### 5 releases

Uses old Rust 2015

| Version | Released |
|---|---|
| 0.1.4 | Aug 13, 2019 |
| 0.1.3 | Jul 15, 2018 |
| 0.1.2 | Jul 8, 2018 |
| 0.1.1 | Jul 8, 2018 |
| 0.1.0 | Jul 8, 2018 |

#**1257** in Algorithms

**WTFPL** license

27KB

427 lines

# Ergothic

Ergothic is a collection of helpers for setting up and running distributed statistical Monte-Carlo simulations written in Rust. It is multi-purpose and will work for any statistical simulation based on Monte-Carlo or its variations (Metropolis-Hastings, etc.). However, its primary purpose is to ease the toil of simulating Quantum Field Theory on the lattice.

Ergothic will perform routine tasks unrelated to the subject of your research for you, allowing you to focus on the code that matters. Simulations written with ergothic can run both in the single-threaded local environment, which is super easy to debug, and on clusters with tens of thousands of nodes. No code changes are required to scale your simulation up to any number of nodes: that technical part has already been taken care of for you!

## Basic tutorial

Ergothic provides a very simple API that you can use to set up your simulation. You only need to know about the following concepts.

### Samples

**A sample** is a point in the configuration space representing a possible configuration of the system.
For example, in lattice QCD a sample constitutes an assignment of SU(3) group elements, called the holonomies, to the links of the lattice.

#### In code

Create a data type representing a sample configuration in your simulation:

```rust
struct MySample {
  ...
}
```

Implement the `ergothic::Sample` trait for your sample. You will need to implement the following 3 methods:

```rust
trait Sample {
  fn prepare() -> Self;
  fn thermalize(&mut self) { ... }
  fn mutate(&mut self);
}
```

The meaning of those methods is discussed in what follows.

### Mutation

**Mutation** is the core operation which drives any simulation in ergothic.
Mutation changes your sample by randomizing its degrees of freedom, such that a crucial property called *ergodicity* holds:

The probability density of observing a system in a particular sample configuration, averaged over time, is equal to the probability density of the statistical ensemble. The latter is a parameter of the simulation, and can usually be inferred from the underlying physics. For example, in lattice QCD this is the exponential of minus the Euclidean (Wick-rotated) Wilson action.

#### In code

Implement the `mutate` method of the `Sample` trait.
It is absolutely crucial that the algorithm that you are using in `mutate` generates samples with the correct probability density.
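To make the requirement concrete, here is a minimal sketch of a Metropolis-style `mutate` for a toy one-dimensional system. Everything in it is an assumption made for illustration: the `GaussianSample` type, the action S(x) = x²/2 (so the target density is proportional to exp(-S)), the step size, and the small built-in xorshift generator used instead of the `rand` crate so that the block has no dependencies. It does not use the ergothic API.

```rust
/// Tiny xorshift64* generator so the sketch needs no external crates.
struct Rng(u64);

impl Rng {
    /// Returns a pseudo-random float in [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        let bits = self.0.wrapping_mul(0x2545F4914F6CDD1D);
        // Keep the top 53 bits, which is what an f64 mantissa can hold.
        (bits >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical sample: a single real degree of freedom.
struct GaussianSample {
    x: f64,
    rng: Rng,
}

impl GaussianSample {
    /// Toy action; the target density is proportional to exp(-action).
    fn action(x: f64) -> f64 {
        0.5 * x * x
    }

    /// One Metropolis step: symmetric proposal, then accept or reject.
    fn mutate(&mut self) {
        // Propose a move drawn uniformly from [x - 1, x + 1).
        let proposal = self.x + 2.0 * (self.rng.next_f64() - 0.5);
        let delta = Self::action(proposal) - Self::action(self.x);
        // Accept with probability min(1, exp(-delta)).
        if delta <= 0.0 || self.rng.next_f64() < (-delta).exp() {
            self.x = proposal;
        }
    }
}

/// Runs the chain and returns the time averages of x and x^2.
fn estimate_moments(steps: usize) -> (f64, f64) {
    let mut sample = GaussianSample { x: 0.0, rng: Rng(0x9E3779B97F4A7C15) };
    let (mut sum, mut sum_sq) = (0.0, 0.0);
    for _ in 0..steps {
        sample.mutate();
        sum += sample.x;
        sum_sq += sample.x * sample.x;
    }
    (sum / steps as f64, sum_sq / steps as f64)
}
```

For this toy action the exact ensemble averages are 0 for x and 1 for x², so comparing the output of `estimate_moments` against those values is a quick sanity check that the time averages reproduce the ensemble averages.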

### Preparation & thermalization

Typically, recently initialized samples will be highly atypical. This is because the initialization logic usually doesn't know about the probability density function. It simply populates the fields of the sample configuration with zeroes or random values.

Getting rid of the initialization bias is known as **thermalization**.
Usually it can be implemented by applying a fixed number (10-20) of mutations to the sample.
However, ergothic lets you implement your own thermalization algorithm.

#### In code

Implement the `prepare` method of the `Sample` trait.

Optionally, you can implement the `thermalize` method. The default implementation applies `mutate` 20 times.
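The relationship between `thermalize` and `mutate` can be sketched with a local stand-in trait. The `Sample` trait below only mirrors the three methods listed in this tutorial and is not the crate's actual source; `Counter` and `LongBurnIn` are hypothetical types, and the figure of 100 mutations is arbitrary.

```rust
/// Local stand-in mirroring the three methods described in the tutorial.
trait Sample: Sized {
    fn prepare() -> Self;

    /// Default thermalization: apply `mutate` a fixed number of times.
    fn thermalize(&mut self) {
        for _ in 0..20 {
            self.mutate();
        }
    }

    fn mutate(&mut self);
}

/// Hypothetical sample that only counts how often it was mutated.
struct Counter {
    mutations: u32,
}

impl Sample for Counter {
    fn prepare() -> Self {
        Counter { mutations: 0 }
    }

    fn mutate(&mut self) {
        self.mutations += 1;
    }
}

/// Hypothetical sample that needs a longer burn-in and overrides the default.
struct LongBurnIn {
    mutations: u32,
}

impl Sample for LongBurnIn {
    fn prepare() -> Self {
        LongBurnIn { mutations: 0 }
    }

    fn thermalize(&mut self) {
        for _ in 0..100 {
            self.mutate();
        }
    }

    fn mutate(&mut self) {
        self.mutations += 1;
    }
}

/// Prepares and thermalizes one sample of each kind, returning the mutation counts.
fn thermalized_counts() -> (u32, u32) {
    let mut default_sample = Counter::prepare();
    default_sample.thermalize();
    let mut custom_sample = LongBurnIn::prepare();
    custom_sample.thermalize();
    (default_sample.mutations, custom_sample.mutations)
}
```

Overriding `thermalize` is worthwhile when the system decorrelates slowly and a couple of dozen mutations would leave a visible trace of the initial configuration.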

### Measures

**Measures** are statistical counters corresponding to the physical observables.
The purpose of any ergothic simulation is to establish expectation values and statistical uncertainties for a given list of measures.

#### In code

Create a `Simulation` and add measures to it. All measures must be given unique human-readable names.

```rust
fn main() {
  let mut simulation = ergothic::Simulation::new("Lattice QCD");
  let ground_state_energy = simulation.add_measure("Energy of the ground state");
  ...
}
```
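As an illustration of what a statistical counter has to track in order to report an expectation value and an uncertainty, here is a minimal sketch. The `RunningStat` type and its method names are hypothetical and are not taken from ergothic; it uses Welford's online algorithm and reports the standard error of the mean, which is only a good uncertainty estimate when successive samples are uncorrelated.

```rust
/// Hypothetical running counter for a single measure (Welford's algorithm).
#[derive(Default)]
struct RunningStat {
    n: u64,    // Number of accumulated values.
    mean: f64, // Running mean.
    m2: f64,   // Sum of squared deviations from the running mean.
}

impl RunningStat {
    /// Folds one measured value into the counter.
    fn accumulate(&mut self, value: f64) {
        self.n += 1;
        let delta = value - self.mean;
        self.mean += delta / self.n as f64;
        self.m2 += delta * (value - self.mean);
    }

    /// Estimated expectation value of the measure.
    fn expectation(&self) -> f64 {
        self.mean
    }

    /// Standard error of the mean; undefined for fewer than two values.
    fn uncertainty(&self) -> f64 {
        if self.n < 2 {
            return f64::NAN;
        }
        let variance = self.m2 / (self.n - 1) as f64;
        (variance / self.n as f64).sqrt()
    }
}
```

Accumulating the values 1, 2, 3 and 4 gives an expectation of 2.5 and an uncertainty of sqrt(5/12), roughly 0.645.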

### Measurement function

When your simulation runs, on each step you have a sample configuration. Measuring the values of physical observables of interest and accumulating those values in the statistical counters is done by the measurement function.

#### In code

Pass a lambda to the entry-point function `ergothic::Simulation::run`.

```rust
simulation.run(|s: &MySample, ms| {
  // Calculate the values of relevant observables in sample `s` and accumulate them in `ms`.
  // Accumulating values is easy: just call
  //   ms.accumulate(ground_state_energy, value);
  // where `value` is computed using the sample configuration `s`.
});
```

## Example

Let's put everything together and write a simple simulation.
Our simulation will compute the mean values of `x` and `x^2` where `x` is uniformly distributed within `[0 .. 1]`.

```rust
extern crate ergothic;
extern crate rand;

struct MySample {
  x: f64,  // Random variable within [0 .. 1].
}

impl ergothic::Sample for MySample {
  fn prepare() -> MySample {
    MySample{x: rand::random()}
  }

  fn mutate(&mut self) {
    self.x = rand::random();
  }
}

fn main() {
  let mut simulation = ergothic::Simulation::new(
    "Computing expectations of random variable and its square");
  let x = simulation.add_measure("Mean X");  // Mean value of the random variable x.
  let x2 = simulation.add_measure("Mean X^2");  // Mean value of the square of x.
  simulation.run(|s: &MySample, ms| {
    ms.accumulate(x, s.x);  // Accumulate the value of x in the statistical counter for the corresponding measure.
    ms.accumulate(x2, s.x.powi(2));  // Accumulate the value of x^2 in the statistical counter for the corresponding measure.
  });
}
```

That's it! That simple code is fully functional, and it can run on clusters with tens of thousands of nodes, too!

### Running the example

To run the example in debug mode, simply run the following command:

```
cargo run
```

That's it!

### Example output

In debug mode, an ergothic simulation will output a table of measured values every 2 seconds. Here's a table for our example from above:

```
Simulation uptime: 5 secs
Samples processed: 4839379
Aggregate values:
+----------+--------------------+------------------------+----------------------+
| MEASURE  | EXPECTATION        | UNCERTAINTY            | RELATIVE UNCERTAINTY |
+----------+--------------------+------------------------+----------------------+
| Mean X   | 0.4999631317520213 | 0.00013121218432651647 | 0.000262443720333356 |
| Mean X^2 | 0.3332809661876769 | 0.0001355082806320386  | 0.0004065887175679   |
+----------+--------------------+------------------------+----------------------+
```

We see that the expectations of X and X^2 are what we would expect from taking the integrals analytically (1/2 and 1/3 respectively). Statistical uncertainties are of order 0.03% and 0.04% respectively after processing ~5 million samples.
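The analytic values can be written out explicitly. The derivation below assumes that successive samples are independent (true here, since `mutate` draws a fresh random value each time) and that the UNCERTAINTY column is the standard error of the mean; the latter is an assumption, but it is consistent with the numbers in the table for N = 4839379.

```latex
\begin{align*}
\langle x \rangle &= \int_0^1 x \, dx = \tfrac{1}{2}, &
\langle x^2 \rangle &= \int_0^1 x^2 \, dx = \tfrac{1}{3}, \\
\operatorname{Var}(x) &= \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12}, &
\operatorname{Var}(x^2) &= \tfrac{1}{5} - \tfrac{1}{9} = \tfrac{4}{45}, \\
\frac{\sigma_x}{\sqrt{N}} &= \sqrt{\frac{1}{12 N}} \approx 1.31 \times 10^{-4}, &
\frac{\sigma_{x^2}}{\sqrt{N}} &= \sqrt{\frac{4}{45 N}} \approx 1.36 \times 10^{-4}.
\end{align*}
```

Dividing each standard error by the corresponding expectation value gives relative uncertainties of about 0.026% and 0.041%, in line with the last column of the table.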

## Scaling up

Now we want to fully take advantage of the huge computational resources which belong to our university / software company / cloud provider / etc.

### Computational model

Ergothic uses the "embarrassingly parallel" computational model. All nodes participating in the simulation are doing the same thing – producing data points and sending those to the storage service. This model has many advantages – it is super simple, scales perfectly, and doesn't have any bottlenecks, because the nodes don't communicate with each other.

You can analyze the intermediate (or final) results of your simulation by querying the database directly using the `ergothic_cli` command line tool.
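The reason this model needs no communication between workers is that partial results can be merged after the fact. The sketch below illustrates the idea with threads standing in for cluster nodes. The `DataPoint` layout (count, sum, sum of squares), the function names, and the seeding scheme are all assumptions made for the example; they do not describe ergothic's actual data points or storage schema.

```rust
use std::thread;

/// Hypothetical data point produced by one node for a single measure.
#[derive(Clone, Copy, Debug, Default)]
struct DataPoint {
    count: u64,
    sum: f64,
    sum_sq: f64,
}

impl DataPoint {
    /// Combining two data points only requires adding their fields.
    fn merge(self, other: DataPoint) -> DataPoint {
        DataPoint {
            count: self.count + other.count,
            sum: self.sum + other.sum,
            sum_sq: self.sum_sq + other.sum_sq,
        }
    }

    fn mean(&self) -> f64 {
        self.sum / self.count as f64
    }

    /// Standard error of the mean, assuming uncorrelated samples.
    fn uncertainty(&self) -> f64 {
        let n = self.count as f64;
        let variance = (self.sum_sq / n - self.mean().powi(2)).max(0.0);
        (variance / n).sqrt()
    }
}

/// One independent worker: draws x uniformly from [0, 1) and accumulates it.
fn run_node(seed: u64, samples: u64) -> DataPoint {
    let mut state = seed; // xorshift64* state; must be non-zero.
    let mut point = DataPoint::default();
    for _ in 0..samples {
        state ^= state >> 12;
        state ^= state << 25;
        state ^= state >> 27;
        let bits = state.wrapping_mul(0x2545F4914F6CDD1D);
        let x = (bits >> 11) as f64 / (1u64 << 53) as f64;
        point.count += 1;
        point.sum += x;
        point.sum_sq += x * x;
    }
    point
}

/// Runs `nodes` workers that never talk to each other, then merges their output.
fn run_cluster(nodes: u64, samples_per_node: u64) -> DataPoint {
    let handles: Vec<_> = (0..nodes)
        .map(|i| {
            // Give every worker its own non-zero seed.
            let seed = (i + 1).wrapping_mul(0x9E3779B97F4A7C15);
            thread::spawn(move || run_node(seed, samples_per_node))
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .fold(DataPoint::default(), DataPoint::merge)
}
```

Because `merge` is associative and commutative, the order in which data points arrive does not matter, which is why adding more nodes changes nothing about the simulation code itself.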

### Setting up MongoDB

Ergothic needs a data sink where the data points will be exported to. Currently, the only supported type of data sink is MongoDB, though implementing other exporters should be easy and is among the next goals for ergothic.

### Shipping the simulation

Build the optimized version of the same code with

```
cargo build --release
```

Now you need a way to distribute the binary to the cluster nodes and run them. For example, you can bundle the binary into a Docker container, and orchestrate the computation with Kubernetes.

Run the containers with the following command line arguments:

```
./my_simulation --production --mongo mongodb://hostname1:port1[,hostname2:port2,...] --mongo_db ergothic_data --mongo_coll my_simulation
```

Where:

- *hostnames* should resolve to the hosts running MongoDB nodes.
- *ports* should correspond to the exposed MongoDB ports.
- *--mongo_db* is the name of the database to send data points to.
- *--mongo_coll* is the name of the collection to send data points to.

In production mode, every node will produce a data point every ~5 min. Data points will get accumulated in the database.

### Analyzing the results

As the simulation runs, data points are accumulated in the database. This section describes how to query the database for aggregate values and uncertainties.

**TODO:** after the `ergothic_cli` tool is implemented, explain how to use it to analyze and manipulate the results.

## Remains to be done

Checklist of the most important features that are currently missing from ergothic:

- Data analyzer tool `ergothic_cli` for querying and aggregating the data points.
- Multithreaded jobs – ability to scale the simulation into NxM threads, where N is the number of nodes and M is the number of threads per node. Threads shouldn't communicate with each other; the computational model remains embarrassingly parallel.
- Exporting to local files and other databases – exporters to other formats.

#### Dependencies

~16MB

~281K SLoC