5 releases

| Version | Released |
|---|---|
| 0.2.6 | Mar 30, 2021 |
| 0.2.3 | Dec 17, 2020 |
| 0.2.2 | Dec 17, 2020 |
| 0.2.1 | Oct 18, 2020 |
| 0.2.0 | Aug 27, 2020 |
es-public-proxy: simple read-only HTTP reverse-proxy for exposing an Elasticsearch node to the public internet
- type-safe de-serialization and re-serialization of all user data
- single-binary, easy to install
- simple configuration with sane defaults
- low-overhead in network latency and compute resources
- optional CORS headers for direct browser requests
- SSL, transport compression, load-balancing, observability, and rate-limiting are left to other tools like nginx, caddy, or HAProxy
- free software forever: AGPLv3+ license
The Elasticsearch REST API is powerful, well documented, and has client library
implementations for many programming languages. For datasets and services which
contain only public information, it would be convenient to provide direct
access to at least a subset of the API for anybody to take advantage of. The
Elasticsearch maintainers warn against this behavior, on the basis that the API
is not designed for public use. Recent versions of Elasticsearch have an
authentication/authorization subsystem, and there are third-party plugins for
read-only access (such as ReadonlyREST), but these
solutions require careful configuration and knowledge of which endpoints are
"safe" for users. Elasticsearch accepts request bodies on GET requests, and
one proposed solution is to filter to only GET requests using a reverse proxy
like nginx. However, some safe endpoints (such as deleting scroll objects)
require other HTTP verbs, and most browsers do not support GET bodies, so
this is only a partial hack.
es-public-proxy is intended to be a simple and reliable alternative for the
use case of exposing popular search queries on specific indices to the public
web. HTTP requests are parsed and filtered in a safe, compiled language (Rust),
then only safe queries are re-serialized and forwarded to the backend search
instance listening on a different port.
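To make the filtering idea concrete, here is a minimal sketch of an allow-list over HTTP method and URL path. It is not the project's actual code: the `Method`, `Decision`, `filter_request`, and `is_plain_index` names are hypothetical, the set of permitted endpoints is an assumption chosen for illustration, and the type-safe parsing of request bodies described above is left out entirely.

```rust
/// HTTP methods this sketch distinguishes (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    Get,
    Post,
    Delete,
    Other,
}

/// Outcome of filtering a single request (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Forward,
    Reject(&'static str),
}

/// True for a single concrete index name: no leading underscore
/// (reserved for API endpoints), no wildcards, no comma-separated lists.
fn is_plain_index(name: &str) -> bool {
    !name.is_empty()
        && !name.starts_with('_')
        && !name.contains(|c: char| c == '*' || c == ',')
}

/// Decide whether a request may be forwarded to the backend.
/// `path` is the URL path without the query string, e.g. "/my-index/_search".
/// Anything not explicitly matched is rejected.
pub fn filter_request(method: Method, path: &str) -> Decision {
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    match (method, segments.as_slice()) {
        // Searching one index; both GET and POST may carry a query body.
        (Method::Get, [index, "_search"]) | (Method::Post, [index, "_search"])
            if is_plain_index(index) =>
        {
            Decision::Forward
        }
        // Deleting a scroll object is safe but requires a non-GET verb.
        (Method::Delete, ["_search", "scroll"]) => Decision::Forward,
        (Method::Other, _) => Decision::Reject("method not allowed"),
        _ => Decision::Reject("endpoint not allowed"),
    }
}
```

Under these assumptions, `filter_request(Method::Get, "/my-index/_search")` is forwarded, while `filter_request(Method::Delete, "/my-index")` is rejected. Defaulting to rejection means an endpoint must be deliberately added before it becomes reachable.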
Note that clients can still submit "expensive" queries of various kinds which
will slow down the host. Some of these can be disabled in the Elasticsearch
configuration (this disables those queries for all connections, not just those
made via the proxy). Some query types are simply not supported by this proxy.
In the future the proxy could gain configuration parameters and smarter parsing
of some query types (like query_string) to try to prevent even more expensive
queries.
Installation
On Debian/Ubuntu Linux systems, the easiest way to get started is to download
and install an unsigned .deb from
https://archive.org/download/es-public-proxy-deb. This will include a
manpage, configuration file, and systemd unit file. After installing, edit the
configuration file (/etc/es-public-proxy.toml) and start the service like:
sudo systemctl start es-public-proxy
sudo systemctl enable es-public-proxy
On other platforms you can install and run on a per-user basis using the Rust toolchain with:
cargo install es-public-proxy
es-public-proxy --example-config > example.toml
# edit the configuration file
es-public-proxy --config example.toml
There is also a Dockerfile, but it isn't actively used and hasn't been pushed to any image repository; for example, it is not yet clear how best to inject configuration into a Docker image. You can build the image with:
docker build -f extra/Dockerfile .
Configuration
In all cases you will want to explicitly enumerate all of the indices that
should have public access. There is an unsafe_all_indices option intended for
prototyping, but this may allow access to additional non-index API endpoints.
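The enumeration rule can be pictured as an exact-match lookup against a fixed set of names. The sketch below is illustrative only: `IndexPolicy`, its `allowed` field, and the `permits` method are hypothetical and do not reflect the proxy's real configuration schema; only the `unsafe_all_indices` name comes from the text above.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory form of the index access settings.
pub struct IndexPolicy {
    /// Index names that were explicitly enumerated.
    allowed: HashSet<String>,
    /// Prototyping escape hatch: when true, every index name is accepted.
    unsafe_all_indices: bool,
}

impl IndexPolicy {
    /// Build a policy from a list of index names and the escape-hatch flag.
    pub fn new<I, S>(allowed: I, unsafe_all_indices: bool) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        IndexPolicy {
            allowed: allowed.into_iter().map(Into::into).collect(),
            unsafe_all_indices,
        }
    }

    /// Exact-match check; wildcard patterns are deliberately not expanded.
    pub fn permits(&self, index: &str) -> bool {
        self.unsafe_all_indices || self.allowed.contains(index)
    }
}
```

With `IndexPolicy::new(vec!["papers", "authors"], false)`, a request for `papers` passes and a request for any other name, including a pattern such as `paper*`, is refused.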
One simple deployment pattern is to put nginx, es-public-proxy, and
elasticsearch all on the same server. In this configuration, nginx would
listen on all network interfaces on ports 80 and 443, and handle SSL upgrade
redirects from 80 to 443, as well as add transport compression, restrict client
body payload limits, etc. es-public-proxy would listen on localhost port
9292, and connect back to elasticsearch on localhost port 9200.
Limitations
Not all of the Elasticsearch API has been implemented yet. In general, this service is likely to be stricter than Elasticsearch itself in parsing and in corner cases. For example:
- URL query parameters like ?human must be expanded into a boolean like ?human=true
- In some cases where Elasticsearch allows short-cutting a full object into a string, this proxy requires the full object format
- index patterns in configuration are not supported
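For the first limitation, a client can rewrite its query string before sending the request. The helper below is a sketch of that client-side workaround, not part of the proxy: `expand_bare_flags` is a hypothetical name, and the input is assumed to be the query string without the leading `?` and without any percent-decoding applied.

```rust
/// Rewrite bare flags such as `human` into explicit booleans (`human=true`).
/// Parameters that already contain `=` are passed through unchanged, and
/// empty segments (from a doubled or trailing `&`) are dropped.
pub fn expand_bare_flags(query: &str) -> String {
    query
        .split('&')
        .filter(|pair| !pair.is_empty())
        .map(|pair| {
            if pair.contains('=') {
                pair.to_string()
            } else {
                format!("{}=true", pair)
            }
        })
        .collect::<Vec<_>>()
        .join("&")
}
```

For example, `expand_bare_flags("human&size=10")` yields `human=true&size=10`.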
Development
To build this package you need the Rust toolchain installed. We target stable Rust, 2018 edition, version 1.45+.
Re-compiling the manpage requires scdoc.
Building a Debian package (.deb) requires the cargo-deb plugin, which you
can install with: cargo install cargo-deb
A Makefile is included to wrap common development commands, for example:
make test
make lint
make deb
Contributions are welcome! We would prefer to keep the number of dependent
crates low (e.g., the project does not currently use a CLI argument parsing
library), but are open to discussion. When sending patches or merge requests,
it is helpful (but not required) if you can include test coverage, re-run
cargo fmt, and acknowledge the license terms ahead of time.