BoolNetEvo
Evolve populations of boolean networks to approximate bitstring functions and their (unknown) inverses.
Machinery
Layer k+1          ●                       ●
     ↑     ┌───────┼───────┐       ┌───────┼───────┐
     ↑     ○       ○       ○       ○       ○       ○
     ↑    /|\     /|\     /|\     /|\     /|\     /|\
Layer k  ● ● ●   ● ● ●   ● ● ●   ● ● ●   ● ● ●   ● ● ●
A boolean network, or boolnet, is a well-known special case of an artificial neural network in which the inputs and outputs of each cell, or boolon, are bits — 0 (false) and 1 (true) — rather than values from a range such as (-∞; +∞). The output of a boolon is a boolean function of its inputs, which are the outputs of boolons from the previous layer.
In this implementation, boolean functions have 3 variables and are encoded by u8 values, which represent the truth tables of such functions. Each boolon has an input tree whose leaves are its input boolons; at each level of that tree, a boolean function of 3 values at the current level gives 1 value at the next level. The single value at the root level is the output of the boolon. In the figure above, the input tree has height 2, so each boolon's output is affected by 3² = 9 outputs of boolons from the previous layer (some may coincide).
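To make the encoding concrete, here is a minimal sketch (not the crate's API; the helper name is made up for illustration) of evaluating a 3-variable boolean function given as a u8 truth table:

// Evaluate a 3-variable boolean function encoded as a u8 truth table
// (illustrative helper, not part of the crate's public API).
fn eval3(table: u8, a: bool, b: bool, c: bool) -> bool {
    // Pack the three input bits into an index 0..=7 and read that bit
    // of the truth table.
    let idx = ((a as u8) << 2) | ((b as u8) << 1) | (c as u8);
    (table >> idx) & 1 == 1
}

fn main() {
    // Under this bit ordering, 0b1110_1000 is the truth table of the
    // 3-input majority function.
    let maj = 0b1110_1000u8;
    assert!(eval3(maj, true, true, false));
    assert!(!eval3(maj, true, false, false));
}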
The number of boolons is the same in all layers, and the signal can propagate through a boolnet several times; these passes are called cycles: at the end of each cycle, the outputs of the last layer become the inputs of the first layer. Under these constraints, layers can be viewed as steps residing in time rather than space, if you identify the i-th boolons of all layers as one boolon that changes its input tree at every step within a cycle.
Layers may have rands boolons whose outputs are random. Enter robustness and free (unpredictable) will... at the cost of slower learning.
An array of the 256 possible 3-variable boolean functions is used to perform simultaneous calculation of 128 data flows in a batch.
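One plausible way to compute 128 flows at once — an assumption about the technique, not the crate's actual code — is to keep one bit per flow in a u128 and apply a truth table with bitwise operations:

// Bit-parallel evaluation sketch: each u128 argument carries one bit
// for each of 128 data flows, so one call processes the whole batch.
fn eval3_batch(table: u8, a: u128, b: u128, c: u128) -> u128 {
    let mut out = 0u128;
    for idx in 0..8u8 {
        if (table >> idx) & 1 == 1 {
            // Select the lanes where (a, b, c) spells out this index.
            let sa = if idx & 4 != 0 { a } else { !a };
            let sb = if idx & 2 != 0 { b } else { !b };
            let sc = if idx & 1 != 0 { c } else { !c };
            out |= sa & sb & sc;
        }
    }
    out
}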
Several boolnets of the same architecture make a population, which evolves in time by means of the genetic algorithm. At each epoch:
- all boolnets are scored w.r.t. how well they calculate some bitstring function or its inverse, on average (the more wrong bits, the lower the score);
- boolnets with the lowest scores are replaced by descendants of boolnets with the highest scores.
Inheritance, as usual, consists of crossover and mutation, in this case applied, for each boolon, to 1) the indices of its input boolons and 2) the boolean functions in its input tree.
Note that a single boolnet does not evolve; a population does.
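Schematically, one epoch might look like the following sketch (the Net stand-in, the closures, and the pairing scheme are illustrative assumptions, not the crate's internals):

// Schematic epoch of the genetic algorithm: score, sort, and replace
// the worst part of the population with offspring of the best part.
// Assumes 0 < replace_ratio < 1.
struct Net; // stand-in for a boolnet

fn epoch(
    population: &mut Vec<Net>,
    score: impl Fn(&Net) -> f64,
    breed: impl Fn(&Net, &Net) -> Net, // crossover + mutation
    replace_ratio: f64,
) {
    // Sort best-first by score.
    population.sort_by(|a, b| score(b).partial_cmp(&score(a)).unwrap());
    let n = population.len();
    let keep = n - (replace_ratio * n as f64) as usize;
    // Breed offspring from pairs of survivors...
    let offspring: Vec<Net> = (keep..n)
        .map(|i| breed(&population[i % keep], &population[(i + 1) % keep]))
        .collect();
    // ...and let them replace the lowest-scoring boolnets.
    population.splice(keep.., offspring);
}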
The purpose of the crate is to provide controls for this process and make it more interactive.
Quick Start
Warning: use the release profile; without optimizations, execution is 20–30 times slower.
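For example, when running via Cargo:
$ cargo run --release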
In your code.rs, bring the BitransformRegister and Evolver structs into scope:
extern crate boolnetevo;
use boolnetevo::{
BitransformRegister,
Evolver
};
and create their default instances:
fn app() -> Result<(), String> {
let register = BitransformRegister::new_default();
let mut evolver = Evolver::new_default(&register)?;
...
}
Now, you can run the built-in shell...
evolver.shell(&register)?;
and get something like this (with colors) on your terminal:
BITRANSFORM: InvXor { direct: Xor }
POPULATION: 250
ARCHITECTURE: height = 1, size = 48, rands = 2, layers = 1, cycles = 1
PARAMETERS: batches = 8, replace_ratio = 0.7, par_ratio = 0.8, parents = 2, par_sw_prob = 0.9, mutat_prob = 0.1
SETTINGS: log off, test new, print improve
Type "?" to see the list of available commands...
$ evol
Press any key to stop...
Epoch 1 ( 41 ms): min = -9.0586, average = -7.9585, max = -6.8008
Epoch 2 ( 29 ms): min = -8.9746, average = -7.7293, max = -6.7021
Epoch 3 ( 29 ms): min = -8.9043, average = -7.5695, max = -6.4512
Epoch 4 ( 18 ms): min = -8.9697, average = -7.4082, max = -5.7324
Epoch 6 ( 17 ms): min = -8.7197, average = -7.1513, max = -5.7129
...
Epoch 58 ( 29 ms): min = -5.0742, average = -1.9552, max = -0.4736
Epoch 64 ( 16 ms): min = -4.5156, average = -1.8883, max = -0.2500
Epoch 65 ( 17 ms): min = -5.0068, average = -1.8879, max = 0.0000
BITRANSFORM: InvXor { direct: Xor }
POPULATION: 250
ARCHITECTURE: height = 1, size = 48, rands = 2, layers = 1, cycles = 1
PARAMETERS: batches = 8, replace_ratio = 0.7, par_ratio = 0.8, parents = 2, par_sw_prob = 0.9, mutat_prob = 0.1
SETTINGS: log off, test new, print improve
$ run 0 a7b8
e9924e2a
(To verify that 0xe992 XOR 0x4e2a = 0xa7b8, run
$ dir e9924e2a
a7b8
or use any calculator in programming mode.)
...OR you can call Evolver's methods directly:
evolver.bitran(&register, "inv-sha1", &[1, 4])?; // 1 round of SHA-1 with 4-byte message
evolver.arch(2, 240, 8, 2, 1)?;                  // set the architecture
evolver.par("mutat_prob", "0.2")?;               // set an evolution parameter
evolver.evol(100)?;                              // evolve for 100 epochs
evolver.par("mutat_prob", "0.1")?;               // lower the mutation probability...
evolver.evol(50)?;                               // ...and evolve for 50 more epochs
evolver.save("evolvers/invsha1-demo")?;          // save the evolver to a file
See src/bin/shell.rs and src/bin/script.rs.
Custom Bitransforms
The evolution optimizes a population towards precise calculation of some bitstring function. The related methods — constructing the function's parameters block, providing the input and output sizes in bits, scoring an input-output batch pair with regard to the function — are gathered in the Bitransform trait. The crate implements it for xor, add, mult, CRC32, MD5, SHA1, SHA2, SHA3/Keccak functions and their inverses, and you can implement it for your own functions.
As an example, see src/bitransform/xor.rs and src/bitransform/sha1.rs. The difference between direct and inverse bitransforms is that in the former case the function of the input is compared bitwise with the output, while in the latter case the same function of the output is compared with the input. In other words, you do not need an explicit inverse function to score something that tries to perform inversion... actually, that "something" should thereby become an inverter.
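A minimal sketch of that scoring idea (hypothetical helpers, not the crate's code), with the score being minus the number of mismatched bits so that a perfect candidate scores 0:

// Direct vs. inverse scoring of a candidate network output against a
// bitstring function f.
fn hamming_score(a: &[bool], b: &[bool]) -> f64 {
    -(a.iter().zip(b).filter(|(x, y)| x != y).count() as f64)
}

fn score_direct(f: impl Fn(&[bool]) -> Vec<bool>, input: &[bool], net_out: &[bool]) -> f64 {
    // Direct: compare f(input) with the network's output.
    hamming_score(&f(input), net_out)
}

fn score_inverse(f: impl Fn(&[bool]) -> Vec<bool>, input: &[bool], net_out: &[bool]) -> f64 {
    // Inverse: compare f(network output) with the input, so no explicit
    // inverse of f is ever needed.
    hamming_score(&f(net_out), input)
}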
Assume you have done this for Func in a func module; then you add it to the register:
mod func;
use func::{Func, InvFunc};
fn app() -> Result<(), String> {
let mut register = BitransformRegister::new_default();
register.add(vec![
Func::reg_entry(), InvFunc::reg_entry()
])?;
...
}
and now you can use it with the evolver:
let mut evolver = Evolver::new_default(&register)?;
evolver.bitran(&register, "inv-func", &[7, 5, 3])?;
evolver.evol(1000)?;
(Here 7, 5, 3 are your function's parameters.)
Modifying the crate
Well, copy its source and do whatever you like: extend the architecture, apply another genetic algorithm, optimize performance, ... It becomes something else then, and you should give it a different name.
Kin
Some resembling frameworks, packages, and projects exist, and there are probably more... When searching for them, keep in mind that in this area the term "evolution" sometimes means the change of a single boolnet's state in time rather than the optimization of a boolnet population.
Ashievements
So far, well... While populations relatively quickly learn to invert simple bitstring functions like bitwise XOR, they struggle with more complex ones, especially those created to prevent inversion, like rounds of the cryptographic hashes SHA-x. Even ADD, due to carry, requires a more "advanced" boolnet architecture (more layers etc.), which makes learning significantly slower. There seem to be non-zero bounds on the best score, and, of course, there is no guarantee of reaching score 0, that is, of breeding a boolnet that always calculates a function precisely.
There may be a proof somewhere that this approach has insurmountable limitations once a function's complexity exceeds a certain threshold, and then applying it would be like trying to build a bridge of ashes. On the other hand, it could become part of some more sophisticated scheme.
A glitch
Case A. Set up the evolver as follows:
- bitransform is inv-sha2 with 64 rounds (i.e. full SHA2-256) and message length 60 bytes (480 bits), OR inv-keccak with hash length 256, 24 rounds (i.e. full SHA3-256), and the same message length;
- population is 500;
- architecture is height = 1, size = 8000 (yes, most will be unused), rands = 0, layers = 1 (single layer...), cycles = 1;
- evolution parameters are default except for batches = 16 (or batches = 32) and mutat_prob = 0.001;
- settings: change test to all and print to all.
Now, start the evolution ($ e in the shell) and watch the average score column. It remains very close to -128.0 however many epochs pass; you should never see more than -127.95 or so (the more batches, the more precision the law of large numbers provides).
This outcome is quite expected: not a single network of such a simple architecture — where each output bit is a function of 3 or fewer hash bits — is able to invert these preimage-resistant cryptographic hashes statistically better than guessing a message (and thus its hash) at random. The population as a whole shows the absence of improvement as well, of course.
Case B. Same as Case A with a single difference: decrease the message length to 40 bytes (320 bits — notice this is still larger than the hash length); do not forget to reset the population afterwards. Now, start the evolution again. After 100 or so epochs, the average score increases to nearly -127.0 and remains there. It seemingly begins to improve around the 20th-30th epoch, when some networks gain (?) a very, very tiny ability to invert hashes better (?) than random guessing; then their descendants spread through the population.
Reset the populations and repeat the evolutions from Cases A and B several times to verify their stability. Or try SHA1 or MD5, with similar results.
What would that 1 bit (which is a >10σ deviation from the mean of the approximating normal distribution of the score) mean, were it not for the header of this section... Hint: consider an inverter that, given a hash, tries several (even 2) fixed messages and selects the one whose hash has the maximum number of coinciding bits.
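A quick Monte-Carlo sketch of the hint (illustrative code, unrelated to the crate's internals; the two fixed messages' hashes are modeled as fixed random 256-bit strings):

// Estimate how many hash bits the "better of two fixed guesses"
// strategy matches on average, against a random 256-bit target.
fn main() {
    // Tiny xorshift64 PRNG so the sketch needs no external crates.
    let mut s: u64 = 0x9e3779b97f4a7c15;
    let mut rng = move || {
        s ^= s << 13;
        s ^= s >> 7;
        s ^= s << 17;
        s
    };

    // Hashes of the two fixed candidate messages, modeled as two fixed
    // random 256-bit strings (4 x u64 each).
    let h1: Vec<u64> = (0..4).map(|_| rng()).collect();
    let h2: Vec<u64> = (0..4).map(|_| rng()).collect();

    let trials = 100_000u64;
    let mut sum_best = 0u64;
    for _ in 0..trials {
        let (mut m1, mut m2) = (0u32, 0u32);
        for i in 0..4 {
            let t = rng(); // one 64-bit word of the random target hash
            m1 += (t ^ h1[i]).count_zeros(); // bits matching h1
            m2 += (t ^ h2[i]).count_zeros(); // bits matching h2
        }
        sum_best += m1.max(m2) as u64;
    }
    // Prints about 132.5: already a few bits better than the 128-bit
    // average of a single blind guess. A boolnet where each output bit
    // sees only 3 hash bits can capture just a shred of this.
    println!("{}", sum_best as f64 / trials as f64);
}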