28 releases (12 breaking)

new 0.13.0 Sep 11, 2024
0.11.1 Sep 7, 2024
0.7.2 Jul 27, 2024
0.7.0 May 25, 2023
0.5.0 Jul 7, 2022

#85 in Algorithms

Download history 105/week @ 2024-07-15 546/week @ 2024-07-22 158/week @ 2024-07-29 400/week @ 2024-08-05 321/week @ 2024-08-12 460/week @ 2024-08-19 754/week @ 2024-08-26 715/week @ 2024-09-02

2,295 downloads per month
Used in genetic_algorithm_meta

MIT/Apache

340KB
7K SLoC

genetic-algorithm

Crates.io MSRV Crates.io Version Rust Crates.io License

A genetic algorithm implementation for Rust. Inspired by the book Genetic Algorithms in Elixir

There are three main elements to this approach:

  • The Genotype (the search space)
  • The Fitness function (the search goal)
  • The strategy (the search strategy)
    • Evolve (evolution strategy)
    • Permutate (for small search spaces, with a 100% guarantee)
    • HillClimb (when search space is convex with little local optima or when crossover is impossible/inefficient)

Terminology:

  • Population: a population has population_size number of individuals (called chromosomes).
  • Chromosome: a chromosome has genes_size number of genes
  • Allele: alleles are the possible values of the genes
  • Gene: a gene is a combination of position in the chromosome and value of the gene (allele)
  • Genes: storage trait of the genes for a chromosome, mostly Vec<Allele>, but alternatives possible
  • Genotype: Knows how to generate, mutate and crossover chromosomes efficiently
  • Fitness: knows how to determine the fitness of a chromosome

All multithreading mechanisms are implemented using rayon::iter and std::sync::mpsc.

Documentation

See docs.rs

Quick Usage

use genetic_algorithm::strategy::evolve::prelude::*;

// the search space
let genotype = BinaryGenotype::builder() // boolean alleles
    .with_genes_size(100)                // 100 genes per chromosome
    .build()
    .unwrap();

println!("{}", genotype);

// the search goal to optimize towards (maximize or minimize)
#[derive(Clone, Debug)]
pub struct CountTrue;
impl Fitness for CountTrue {
   type Genotype = BinaryGenotype; // Genes = Vec<bool>
      fn calculate_for_chromosome(&mut self, chromosome: &Chromosome<Self::Genotype>) -> Option<FitnessValue> {
         Some(chromosome.genes.iter().filter(|&value| *value).count() as FitnessValue)
      }
   }
}

// the search strategy
let evolve = Evolve::builder()
    .with_genotype(genotype)
    .with_select(SelectElite::new(0.9))            // sort the chromosomes by fitness to determine crossover order and select 90% of the population for crossover (drop 10% of population)
    .with_crossover(CrossoverUniform::new())       // crossover all individual genes between 2 chromosomes for offspring (and restore back to 100% of target population size by keeping the best parents alive)
    .with_mutate(MutateSingleGene::new(0.2))       // mutate offspring for a single gene with a 20% probability per chromosome
    .with_fitness(CountTrue)                       // count the number of true values in the chromosomes
    .with_target_population_size(100)              // evolve with 100 chromosomes
    .with_target_fitness_score(100)                // goal is 100 times true in the best chromosome
    .with_reporter(EvolveReporterSimple::new(100)) // optional builder step, report every 100 generations
    .call();
    .unwrap()

println!("{}", evolve);

Examples

Run with cargo run --example [EXAMPLE_BASENAME] --release

Performance considerations

For the Evolve strategy:

  • Select: no considerations. All selects are basically some form of in-place sorting of some kind. This is relatively fast compared to the rest of the operations.
  • Crossover: the workhorse of internal parts. Crossover touches most genes each generation and clones up to the whole population to restore lost population size in selection. See performance tips below.
  • Mutate: no considerations. It touches genes like crossover does, but should be used sparingly anyway; with low gene counts (<10%) and low probability (5-20%)
  • Fitness: can be anything. This fully depends on the user domain. Parallelize it using with_par_fitness() in the Builder. But beware that parallelization has it's own overhead and is not always faster.

Performance Tips

  • Small genes sizes
    • It seems that CrossoverMultiGene with number_of_crossovers = genes_size / 2 and allow_duplicates = true is the best tradeoff between performance and effect. CrossoverUniform is an alias for the same approach, taking the genes_size from the genotype at runtime.
    • Restoring the population doesn't matter that much as the cloning is relatively less pronounced (but becomes more prominent for larger population sizes)
  • Large genes sizes
    • It seems that CrossoverMultiPoint with number_of_crossovers = genes_size / 9 and allow_duplicates = false is the best tradeoff between performance and effect.
    • Restoring the population has major performance effects and should be avoided. Use a high selection_rate or even 100%, so there is little parent cloning. Explore non-Vec based genotypes like BitGenotype.

Tests

Run tests with cargo test

Use .with_rng_seed_from_u64(0) builder step to create deterministic tests results.

Benchmarks

Implemented using criterion. Run benchmarks with cargo bench

Profiling

Implemented using criterion and pprof.

Uncomment in Cargo.toml

[profile.release]
debug = 1

Run with cargo run --example profile_evolve_binary --release -- --bench --profile-time 5

Find the flamegraph in: ./target/criterion/profile_evolve_binary/profile/flamegraph.svg

TODO

MAYBE

  • Target cardinality range for Mutate Dynamic to avoid constant switching
  • Default max_stale_generations to 1 for SteepestAscent
  • Add scaling permutate? Can be done by grid search and then search within last grid with new scale
  • Add scaling helper function
  • Add simulated annealing strategy
  • Add Roulette selection with and without duplicates (with fitness ordering)
  • Add OrderOne crossover for UniqueGenotype?
  • Add WholeArithmetic crossover for RangeGenotype?
  • Add CountTrueWithWork instead of CountTrueWithSleep for better benchmarks?
  • Explore more non-Vec genes:
    • PackedSimd, ArrayVec, TinyVec?
    • Add MatrixGenotype which stores a population x genes_size matrix and keeps the genes internal to the Genotype. Chromosomes just reference the ID using reference-id. Population can be handled very lightly as the Chromosome is just a light pointer. Use nalgebra (2D)? Make sure to allow a dealloc call for chromosomes which are dropped from the population (send the population, and crosscheck reference-ids?)
  • Maybe use TinyVec for Population? (it us usually less than 1000 anyway), maybe useful paired with MatrixGenotype, where the chromosomes are lightweight (and Copyable)

ISSUES

  • permutate (and possibly others) with gene_size 0 panics. Maybe it should just return a empty chromosome?

Dependencies

~4.5MB
~82K SLoC