Version 0.1.0 (1 unstable release), published Jan 31, 2023
Network Partition Simulator
Use cases
- To test the operation of a distributed protocol under unstable (but controlled) network conditions.
- To test the behaviour of a web service under flaky network conditions, for example by disabling and re-enabling network connectivity to the server.
Architecture
Individual nodes run as Docker containers and together form a cluster of test nodes. When a node is ready to serve requests, it registers its address and port with Consul. Each node also registers an HTTP health check with Consul so that it can be monitored periodically.
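The repository's register_service.py is not reproduced here, so the following is only a minimal Python sketch of what such a registration could look like. It assumes Consul's agent HTTP API (PUT /v1/agent/service/register) on its default port 8500 and a Consul host reachable as consul (a hypothetical hostname borrowed from the container name). The function names, node id, health-check port and interval are illustrative assumptions, not the project's actual values.

```python
import json
import urllib.request

# Assumption: the Consul agent is reachable under the container name "consul" on port 8500.
CONSUL_URL = "http://consul:8500"


def registration_payload(name: str, node_id: str, address: str, port: int,
                         health_port: int, interval: str = "10s") -> dict:
    """Build a Consul service definition that includes an HTTP health check."""
    return {
        "ID": node_id,
        "Name": name,
        "Address": address,
        "Port": port,
        "Check": {
            # Consul polls this URL every `interval`; a 2xx response marks the node as passing.
            "HTTP": f"http://{address}:{health_port}/health",
            "Interval": interval,
        },
    }


def register(payload: dict, consul_url: str = CONSUL_URL) -> int:
    """PUT the service definition to the Consul agent and return the HTTP status code."""
    req = urllib.request.Request(
        f"{consul_url}/v1/agent/service/register",
        data=json.dumps(payload).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Keeping the payload construction separate from the network call makes the definition easy to inspect before anything is sent to Consul.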
The supervisor node is the orchestrator: it modifies the iptables rules on every node in the test cluster. It loads the state of the cluster by querying Consul. Once the cluster under test has been loaded (via /api/v1/load_cluster), the supervisor is ready to serve the Partition API.
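As a rough illustration of loading the cluster state from Consul, the sketch below reads Consul's catalog endpoint GET /v1/catalog/service/<name> and maps each service id to an address and port. It assumes the nodes register under the service name test-node (as in the entrypoint script) and that the supervisor's node ids correspond to Consul service ids. The supervisor's actual Rust implementation may work differently, and ClusterNode and fetch_cluster are hypothetical names.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterNode:
    id: str
    address: str
    port: int


def parse_catalog(entries: list) -> dict:
    """Map ServiceID -> ClusterNode from a /v1/catalog/service/<name> response body."""
    nodes = {}
    for e in entries:
        # ServiceAddress is empty when the service registered without one;
        # in that case fall back to the address of the Consul node.
        addr = e.get("ServiceAddress") or e["Address"]
        nodes[e["ServiceID"]] = ClusterNode(e["ServiceID"], addr, e["ServicePort"])
    return nodes


def fetch_cluster(service: str = "test-node",
                  consul_url: str = "http://consul:8500") -> dict:
    """Query the Consul catalog (assumed URL) and return the parsed node map."""
    with urllib.request.urlopen(f"{consul_url}/v1/catalog/service/{service}") as resp:
        return parse_catalog(json.load(resp))
```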
The supervisor understands the following network-related mutations and queries:
- Partition (POST /api/v1/partition/<source_id>/<target_id>): Given the ids of a source and a target node, configure the firewall on the target node so that all packets coming from the source node are dropped.
- Heal (POST /api/v1/heal/<source_id>/<target_id>): Given the ids of a source and a target node, configure the firewall on the target node so that all packets coming from the source node are accepted.
- Restore (POST /api/v1/restore): Clear all firewall rules across the cluster so that every node can communicate with every other node.
- Rules (GET /api/v1/rules/<target_id>): Given the id of a target node, list all INPUT rules currently configured on that node.
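The README does not state which iptables invocations the supervisor issues, so the following sketch shows one plausible mapping of the four operations onto standard iptables commands executed on the target node. It assumes the target's INPUT chain policy is ACCEPT (so deleting the DROP rule is enough to heal the link) and that commands reach the node over SSH as root. The SSH setup in the node image suggests this, but the README does not confirm it. All helper names, including over_ssh, are hypothetical.

```python
import shlex


def partition_cmd(source_ip: str) -> list:
    # Append a rule to the target's INPUT chain that drops every packet from source_ip.
    return ["iptables", "-A", "INPUT", "-s", source_ip, "-j", "DROP"]


def heal_cmd(source_ip: str) -> list:
    # Delete the matching DROP rule; packets from source_ip then fall through
    # to the chain policy (assumed to be ACCEPT).
    return ["iptables", "-D", "INPUT", "-s", source_ip, "-j", "DROP"]


def restore_cmd() -> list:
    # Flush every chain in the filter table.
    return ["iptables", "-F"]


def rules_cmd() -> list:
    # Print the INPUT chain in iptables-save syntax.
    return ["iptables", "-S", "INPUT"]


def over_ssh(host: str, argv: list) -> list:
    # Hypothetical transport: run the command on the target node as root over SSH.
    # ssh passes a single remote command string, so quote the argv safely.
    return ["ssh", f"root@{host}", shlex.join(argv)]
```

Because iptables -D removes only one matching rule, partitioning the same pair twice would need two heals. A more careful implementation might check for an existing rule with iptables -C before appending.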
Usage
Package the system under test as a single-process Docker image that communicates with the other containers as needed. Make sure it answers HTTP health checks at /health.
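A node's /health endpoint only needs to return a 2xx status for Consul's HTTP check to pass. The following is a minimal standard-library sketch. The port it listens on and the response body are assumptions, because the README only fixes the /health path. If your application already runs an HTTP server, adding the route there is simpler.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Consul's HTTP check treats any 2xx response as passing.
        if self.path == "/health":
            status, body = 200, b"ok"
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging; Consul polls this endpoint frequently.
        pass


def serve_health(port: int = 9000) -> ThreadingHTTPServer:
    """Create (but do not start) a health server; the port is an assumption."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

Call serve_forever() on the returned server, typically in a background thread next to the main application.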
Use docker/test-node.Dockerfile as a reference when structuring your Dockerfile:
```dockerfile
FROM rust:1.66 AS build
WORKDIR /app
RUN apt-get update -y && apt-get upgrade -y
RUN apt-get install curl python3-venv openssh-client openssh-server iptables sudo -y
COPY register_service.py /register_service.py
RUN chmod +x /register_service.py
RUN python3 -m venv /var/venv/node
RUN /var/venv/node/bin/python -m pip install requests
RUN mkdir -p /etc/ssh
COPY docker/sshd_config /etc/ssh/sshd_config
# Application related config.
...
# Make sure the `register_service.py` script is called before your application boots up.
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
```
entrypoint.sh (the shebang line is required because the ENTRYPOINT uses exec form):
```sh
#!/bin/sh
usermod --password "$(echo "root" | openssl passwd -1 -stdin)" root
service ssh restart
/var/venv/node/bin/python3 /register_service.py --name "test-node" --port 9000
# Now start your long-running application that uses port 9000 to communicate with its peers.
...
```
Use docker/test-supervisor.Dockerfile as is. Once the node's Dockerfile is set up, start the docker-compose deployment:
```sh
docker-compose up -d
```
The test-supervisor container exposes an HTTP API on port 3000, which can be used to control the network partitions in the test cluster.
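For scripting test scenarios, a thin client over the Partition API might look like the sketch below. It assumes the supervisor is reachable at http://localhost:3000 from where the script runs and that node ids are passed through unchanged. The helper names are hypothetical. The cluster must first be loaded via /api/v1/load_cluster, but the README does not give that endpoint's HTTP method, so it is left out here.

```python
import urllib.request

# Assumption: the supervisor's port 3000 is published on the local host.
SUPERVISOR = "http://localhost:3000"


def _request(method: str, path: str, base: str) -> urllib.request.Request:
    return urllib.request.Request(f"{base}/api/v1/{path}", method=method)


def partition(source_id: str, target_id: str, base: str = SUPERVISOR) -> urllib.request.Request:
    # Target starts dropping packets from source (one direction only).
    return _request("POST", f"partition/{source_id}/{target_id}", base)


def heal(source_id: str, target_id: str, base: str = SUPERVISOR) -> urllib.request.Request:
    # Target accepts packets from source again.
    return _request("POST", f"heal/{source_id}/{target_id}", base)


def restore(base: str = SUPERVISOR) -> urllib.request.Request:
    # Clear all firewall rules across the cluster.
    return _request("POST", "restore", base)


def rules(target_id: str, base: str = SUPERVISOR) -> urllib.request.Request:
    # List the INPUT rules configured on the target node.
    return _request("GET", f"rules/{target_id}", base)


def send(req: urllib.request.Request) -> str:
    """Execute a request against the supervisor and return the response body as text."""
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```

Since a partition only drops packets arriving at the target from the source, isolating two nodes from each other in both directions takes two calls, for example send(partition("node-1", "node-2")) and send(partition("node-2", "node-1")). A single send(restore()) undoes everything.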
The consul container is used solely for service discovery, but its UI is available on port 8500 of the consul container if you want to inspect the registered nodes.