28 releases (12 breaking)
new 0.13.0 | Sep 11, 2024 |
---|---|
0.11.1 | Sep 7, 2024 |
0.7.2 | Jul 27, 2024 |
0.7.0 | May 25, 2023 |
0.5.0 | Jul 7, 2022 |
#85 in Algorithms
2,295 downloads per month
Used in genetic_algorithm_meta
340KB
7K
SLoC
genetic-algorithm
A genetic algorithm implementation for Rust. Inspired by the book Genetic Algorithms in Elixir
There are three main elements to this approach:
- The Genotype (the search space)
- The Fitness function (the search goal)
- The strategy (the search strategy)
- Evolve (evolution strategy)
- Permutate (for small search spaces, with a 100% guarantee)
- HillClimb (when search space is convex with little local optima or when crossover is impossible/inefficient)
Terminology:
- Population: a population has
population_size
number of individuals (called chromosomes). - Chromosome: a chromosome has
genes_size
number of genes - Allele: alleles are the possible values of the genes
- Gene: a gene is a combination of position in the chromosome and value of the gene (allele)
- Genes: storage trait of the genes for a chromosome, mostly
Vec<Allele>
, but alternatives possible - Genotype: Knows how to generate, mutate and crossover chromosomes efficiently
- Fitness: knows how to determine the fitness of a chromosome
All multithreading mechanisms are implemented using rayon::iter and std::sync::mpsc.
Documentation
See docs.rs
Quick Usage
use genetic_algorithm::strategy::evolve::prelude::*;
// the search space
let genotype = BinaryGenotype::builder() // boolean alleles
.with_genes_size(100) // 100 genes per chromosome
.build()
.unwrap();
println!("{}", genotype);
// the search goal to optimize towards (maximize or minimize)
#[derive(Clone, Debug)]
pub struct CountTrue;
impl Fitness for CountTrue {
type Genotype = BinaryGenotype; // Genes = Vec<bool>
fn calculate_for_chromosome(&mut self, chromosome: &Chromosome<Self::Genotype>) -> Option<FitnessValue> {
Some(chromosome.genes.iter().filter(|&value| *value).count() as FitnessValue)
}
}
}
// the search strategy
let evolve = Evolve::builder()
.with_genotype(genotype)
.with_select(SelectElite::new(0.9)) // sort the chromosomes by fitness to determine crossover order and select 90% of the population for crossover (drop 10% of population)
.with_crossover(CrossoverUniform::new()) // crossover all individual genes between 2 chromosomes for offspring (and restore back to 100% of target population size by keeping the best parents alive)
.with_mutate(MutateSingleGene::new(0.2)) // mutate offspring for a single gene with a 20% probability per chromosome
.with_fitness(CountTrue) // count the number of true values in the chromosomes
.with_target_population_size(100) // evolve with 100 chromosomes
.with_target_fitness_score(100) // goal is 100 times true in the best chromosome
.with_reporter(EvolveReporterSimple::new(100)) // optional builder step, report every 100 generations
.call();
.unwrap()
println!("{}", evolve);
Examples
Run with cargo run --example [EXAMPLE_BASENAME] --release
- N-Queens puzzle https://en.wikipedia.org/wiki/Eight_queens_puzzle.
- See examples/evolve_nqueens
- See examples/hill_climb_nqueens
UniqueGenotype<u8>
with a 64x64 chess board setup- custom
NQueensFitness
fitness
- Knapsack problem: https://en.wikipedia.org/wiki/Knapsack_problem
- See examples/evolve_knapsack
- See examples/permutate_knapsack
BinaryGenotype<Item(weight, value)>
each gene encodes presence in the knapsack- custom
KnapsackFitness(&items, weight_limit)
fitness
- Infinite Monkey theorem: https://en.wikipedia.org/wiki/Infinite_monkey_theorem
- See examples/evolve_monkeys
ListGenotype<char>
100 monkeys randomly typing characters in a loop- custom fitness using hamming distance
- Permutation strategy instead of Evolve strategy for small search spaces, with a 100% guarantee
- HillClimb strategy instead of Evolve strategy, when crossover is impossible or inefficient
- Explore vector genes (BinaryGenotype) versus other storage (BitGenotype)
- Explore internal and external multithreading options
- Custom Fitness function with LRU cache
- See examples/evolve_binary_lru_cache_fitness
- Note: doesn't help performance much in this case...
- Custom Reporting implementation
Performance considerations
For the Evolve strategy:
- Select: no considerations. All selects are basically some form of in-place sorting of some kind. This is relatively fast compared to the rest of the operations.
- Crossover: the workhorse of internal parts. Crossover touches most genes each generation and clones up to the whole population to restore lost population size in selection. See performance tips below.
- Mutate: no considerations. It touches genes like crossover does, but should be used sparingly anyway; with low gene counts (<10%) and low probability (5-20%)
- Fitness: can be anything. This fully depends on the user domain. Parallelize
it using
with_par_fitness()
in the Builder. But beware that parallelization has it's own overhead and is not always faster.
Performance Tips
- Small genes sizes
- It seems that CrossoverMultiGene with
number_of_crossovers = genes_size / 2
andallow_duplicates = true
is the best tradeoff between performance and effect. CrossoverUniform is an alias for the same approach, taking the genes_size from the genotype at runtime. - Restoring the population doesn't matter that much as the cloning is relatively less pronounced (but becomes more prominent for larger population sizes)
- It seems that CrossoverMultiGene with
- Large genes sizes
- It seems that CrossoverMultiPoint with
number_of_crossovers = genes_size / 9
andallow_duplicates = false
is the best tradeoff between performance and effect. - Restoring the population has major performance effects and should be avoided. Use a high selection_rate or even 100%, so there is little parent cloning. Explore non-Vec based genotypes like BitGenotype.
- It seems that CrossoverMultiPoint with
Tests
Run tests with cargo test
Use .with_rng_seed_from_u64(0)
builder step to create deterministic tests results.
Benchmarks
Implemented using criterion. Run benchmarks with cargo bench
Profiling
Implemented using criterion and pprof.
Uncomment in Cargo.toml
[profile.release]
debug = 1
Run with cargo run --example profile_evolve_binary --release -- --bench --profile-time 5
Find the flamegraph in: ./target/criterion/profile_evolve_binary/profile/flamegraph.svg
TODO
MAYBE
- Target cardinality range for Mutate Dynamic to avoid constant switching
- Default max_stale_generations to 1 for SteepestAscent
- Add scaling permutate? Can be done by grid search and then search within last grid with new scale
- Add scaling helper function
- Add simulated annealing strategy
- Add Roulette selection with and without duplicates (with fitness ordering)
- Add OrderOne crossover for UniqueGenotype?
- Add WholeArithmetic crossover for RangeGenotype?
- Add CountTrueWithWork instead of CountTrueWithSleep for better benchmarks?
- Explore more non-Vec genes:
- PackedSimd, ArrayVec, TinyVec?
- Add MatrixGenotype which stores a
population x genes_size
matrix and keeps the genes internal to the Genotype. Chromosomes just reference the ID using reference-id. Population can be handled very lightly as the Chromosome is just a light pointer. Use nalgebra (2D)? Make sure to allow a dealloc call for chromosomes which are dropped from the population (send the population, and crosscheck reference-ids?)
- Maybe use TinyVec for Population? (it us usually less than 1000 anyway), maybe useful paired with MatrixGenotype, where the chromosomes are lightweight (and Copyable)
ISSUES
- permutate (and possibly others) with gene_size 0 panics. Maybe it should just return a empty chromosome?
Dependencies
~4.5MB
~82K SLoC