### 5 releases

0.2.2 | May 12, 2019 |
---|---|

0.2.1 | Mar 6, 2019 |

0.2.0 | Feb 7, 2019 |

0.1.1 | Jan 25, 2019 |

0.1.0 | Dec 16, 2018 |

#**62** in Machine learning

**30** downloads per month

**MIT**license

99KB

1.5K
SLoC

# F-BLEAU

F-BLEAU is a tool for estimating the leakage of a system about its secrets in a black-box manner (i.e., by only looking at examples of secret inputs and respective outputs). It considers a generic system as a black-box, taking secret inputs and returning outputs accordingly, and it measures how much the outputs "leak" about the inputs. It was proposed in [2].

F-BLEAU is based on the equivalence between estimating the error of a Machine Learning model of a specific class and the estimation of information leakage [1,2,3].

This code was also used for the experiments of [2] on the following evaluations: Gowalla, e-passport, and side channel attack to finite field exponentiation.

# Getting started

F-BLEAU takes as input CSV data containing examples of system's inputs
and outputs.
It currently requires two CSV files as input: a *training* file and a
*validation* (or *test*) file, such as:

`0``,` `0.``1``,` `2.``43``,` `1.``1`
`1``,` `0.``0``,` `1.``22``,` `1.``1`
`1``,` `1.``0``,` `1.``02``,` `0.``1`
`...`

where the first column specifies the secret, and the remaining ones indicate the output vector.

It runs a chosen method for estimating the Bayes risk (smallest probability of error of an adversary at predicting a secret given the respective output), and relative security measures.

The general syntax is:

`fbleau ``<`estimate`>` `[`options`]` `<`train`>` `<`test`>`

## Estimates

Currently available estimates:

**log** k-NN estimate, with

, where `k = ln(n)`

`n`

is the number of training
examples.**log 10** k-NN estimate, with

, where `k = log10(n)`

`n`

is the number of
training examples.**frequentist** (or "lookup table") Standard estimate. Note that this
is only applicable when the outputs are finite; also, it does not scale
well to large systems (e.g., large input/output spaces).

Bounds and other estimates:

**nn-bound** Produces a lower bound of R* discovered by Cover and Hard ('67),
which is based on the error of the NN classifier (1-NN).

**--knn** Runs the k-NN classifier for a fixed k to be specified.
Note that this *does not* guarantee convergence to the Bayes risk.

## Example

This example considers 100K observations generated according to a
Geometric distribution with privacy level

(see [2] for details);
the true value of the Bayes risk is `nu =4`

`R``*``=``0.``456`

computed analytically.
The observations are split into training (80%) and test sets
(`examples``/`geometric`-``4.`train`.`csv

and `examples``/`geometric`-``4.`test`.`csv

respectively).One can run

to compute the `fbleau`

estimate as follows:`log`

`$`` fbleau log examples/geometric-4.train.csv examples/geometric-4.test.csv`
`mapped`` vectors into 191 unique IDs`
`mapped`` vectors into 191 unique IDs`
`Random`` guessing error: 0.913`
`Estimating`` leakage measures...`
`Final`` estimate: 0.47475`
`Multiplicative`` Leakage: 6.037356321839082`
`Additive`` Leakage: 0.43825000000000003`
`Bayes`` security measure: 0.5199890470974808`
`Min-entropy`` Leakage: 2.593916950824318`
`Minimum`` estimate: 0.47355`
`Multiplicative`` Leakage: 6.051149425287359`
`Additive`` Leakage: 0.43945`
`Bayes`` security measure: 0.5186746987951807`
`Min-entropy`` Leakage: 2.5972092105949125`

NOTE: depending on your machine's specs this may take a while.

is designed to effectively exploit many CPUs, albeit with low RAM requirements;
further optimisations are in the works.`fbleau`

One should look at the

(i.e., the minimum value that
the Bayes risk estimate took as the size of the training examples increases),
rather than at the `Minimum estimate`

: indeed, the estimates do not guarantee
a monotonically decreasing behaviour.`Final estimate`

For the

estimate:`frequentist`

`$`` fbleau frequentist examples/geometric-4.train.csv examples/geometric-4.test.csv`
`mapped`` vectors into 191 unique IDs`
`mapped`` vectors into 191 unique IDs`
`Random`` guessing error: 0.913`
`Estimating`` leakage measures...`
`Final`` estimate: 0.5621`
`Multiplicative`` Leakage: 5.033333333333335`
`Additive`` Leakage: 0.3509`
`Bayes`` security measure: 0.6156626506024097`
`Min-entropy`` Leakage: 2.3315141437165607`
`Minimum`` estimate: 0.56205`
`Multiplicative`` Leakage: 5.033908045977013`
`Additive`` Leakage: 0.35095`
`Bayes`` security measure: 0.6156078860898139`
`Min-entropy`` Leakage: 2.3316788631368333`

## Further options

It is often useful to know the value of an estimate at every step
(i.e., for training size 1, 2, ...).

can output this into a file specified by `fbleau`

.`--verbose= <logfile>`

By default,

runs for all training data.
However, one can specify a stopping criterion, in the form of a
(delta, q)-convergence: `fbleau`

stops when the estimate's value has
not changed more than delta (`fbleau`

), either in relative (default) or
absolute (`--delta`

) sense, for at least q steps (`--absolute`

).`--qstop`

can scale the individual values of the system's output ("features")
in the `fbleau`

interval by specifying the `[``0``,``1``]`

flag.`--scale`

By default,

uses a number of threads equal to the number of CPUs.
To limit this number, you can use `fbleau`

.`--nprocs`

An option

is available to select the desired distance metric
for nearest neighbor methods.`--distance`

Further options are shown in the help page:

`fbleau`` -`h

# Install

The code is written in

, but it is thought to be used as a
standalone command line tool.
Bindings to other programming languages (e.g., Python) may happen in the
future.`Rust`

Install rustup, which will make

available
on your path.
Then run:`cargo`

`cargo`` install fbleau`

You should now find the binary

in your `fbleau`

(if not,
check out rustup again).`$``PATH`

If

is not available on your system (e.g., some *BSD systems),
you should still be able to install `rustup`

with the system's
package manager, and then install `cargo`

as above.
If this doesn't work, please open a ticket.`fbleau`

## TODO

Currently, the code provided here:

- is based on frequentist and nearest neighbor methods; in the future we hope to extend this to other ML methods; note that this does not affect the generality of the results, which hold against any classifier,
- computes one estimate at the time (i.e., to compute multiple estimates one
needs to run

several times); this can change in the future.`fbleau`

### Short term

- return various leakage measures (instead of just R*)
- resubstitution estimate

### Mid term

- predictions for multiple estimators at the same time
- get training data from standard input (on-line mode)

### Maybe

- other ML methods (e.g., SVM, neural network)
- Python and Java bindings

# References

[1] 2017, "Bayes, not Naïve: Security Bounds on Website Fingerprinting Defenses". *Giovanni Cherubin*

[2] 2018, "F-BLEAU: Practical Channel Leakage Estimation". *Giovanni Cherubin, Konstantinos Chatzikokolakis, Catuscia Palamidessi*.

[3] (Blog) "Machine Learning methods for Quantifying the Security of Black-boxes". https://giocher.com/pages/bayes.html

#### Dependencies

~7MB

~121K SLoC