1 stable release
1.1.0 | May 5, 2021 |
---|
#2063 in Algorithms
24KB
393 lines
LibIsolationForest
Description
This project contains Rust, C++, Julia, and python implementations of the Isolation Forest algorithm. Isolation Forest is an anomaly detection algorithm based around a collection of randomly generated decision trees. For a full description of the algorithm, consult the original paper by the algorithm's creators:
https://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/icdm08b.pdf
Python Example
The python implementation can be installed via pip:
pip install IsolationForest
This is a short code snipet that shows how to use the Python version of the library. You can also read the file test.py
for a complete example. As the library matures, I'll add more test examples to this file.
from isolationforest import IsolationForest
forest = IsolationForest.Forest(num_trees, sub_sampling_size)
sample = IsolationForest.Sample("Training Sample 1")
features = []
features.append({"feature 1": feature_1_value})
# Add more features to the sample...
features.append({"feature N": feature_N_value})
sample.add_features(features)
# Add the features to the sample.
forest.add_sample(sample)
# Add more samples to the forest...
# Create the forest.
forest.create()
sample = IsolationForest.Sample("Test Sample 1")
features = []
features.append({"feature 1": feature_1_value})
# Add more features to the sample...
features.append({"feature N": feature_N_value})
# Add the features to the sample.
sample.add_features(features)
# Score the sample.
score = forest.score(sample)
normalized_score = forest.normalized_score(sample)
Rust Example
More examples of how to use the Rust version of the library can be found in lib.rs
. As the library matures, I'll add more test examples to this file.
let file_path = "../data/iris.data.txt";
let file = match std::fs::File::open(&file_path) {
Err(why) => panic!("Couldn't open {} {}", file_path, why),
Ok(file) => file,
};
let mut reader = csv::Reader::from_reader(file);
let mut forest = crate::isolation_forest::Forest::new(10, 10);
let training_class_name = "Iris-setosa";
let mut training_samples = Vec::new();
let mut test_samples = Vec::new();
let mut avg_control_set_score = 0.0;
let mut avg_outlier_set_score = 0.0;
let mut avg_control_set_normalized_score = 0.0;
let mut avg_outlier_set_normalized_score = 0.0;
let mut num_control_tests = 0;
let mut num_outlier_tests = 0;
let mut rng = rand::thread_rng();
let range = Uniform::from(0..10);
for record in reader.records() {
let record = record.unwrap();
let sepal_length_cm: f64 = record[0].parse().unwrap();
let sepal_width_cm: f64 = record[1].parse().unwrap();
let petal_length_cm: f64 = record[2].parse().unwrap();
let petal_width_cm: f64 = record[3].parse().unwrap();
let name: String = record[4].parse().unwrap();
let mut features = crate::isolation_forest::FeatureList::new();
features.push(crate::isolation_forest::Feature::new("sepal length in cm", (sepal_length_cm * 10.0) as u64));
features.push(crate::isolation_forest::Feature::new("sepal width in cm", (sepal_width_cm * 10.0) as u64));
features.push(crate::isolation_forest::Feature::new("petal length in cm", (petal_length_cm * 10.0) as u64));
features.push(crate::isolation_forest::Feature::new("petal width in cm", (petal_width_cm * 10.0) as u64));
let mut sample = crate::isolation_forest::Sample::new(&name);
sample.add_features(&mut features);
// Randomly split the samples into training and test samples.
let x = range.sample(&mut rng) as u64;
if x > 5 && name == training_class_name {
forest.add_sample(sample.clone());
training_samples.push(sample);
}
else {
test_samples.push(sample);
}
}
// Create the forest.
forest.create();
// Use each test sample.
for test_sample in test_samples {
let score = forest.score(&test_sample);
let normalized_score = forest.normalized_score(&test_sample);
if training_class_name == test_sample.name {
avg_control_set_score = avg_control_set_score + score;
avg_control_set_normalized_score = avg_control_set_normalized_score + normalized_score;
num_control_tests = num_control_tests + 1;
}
else {
avg_outlier_set_score = avg_outlier_set_score + score;
avg_outlier_set_normalized_score = avg_outlier_set_normalized_score + normalized_score;
num_outlier_tests = num_outlier_tests + 1;
}
}
// Compute statistics.
if num_control_tests > 0 {
avg_control_set_score = avg_control_set_score / num_control_tests as f64;
avg_control_set_normalized_score = avg_control_set_normalized_score / num_control_tests as f64;
}
if num_outlier_tests > 0 {
avg_outlier_set_score = avg_outlier_set_score / num_outlier_tests as f64;
avg_outlier_set_normalized_score = avg_outlier_set_normalized_score / num_outlier_tests as f64;
}
println!("Avg Control Score: {}", avg_control_set_score);
println!("Avg Control Normalized Score: {}", avg_control_set_normalized_score);
println!("Avg Outlier Score: {}", avg_outlier_set_score);
println!("Avg Outlier Normalized Score: {}", avg_outlier_set_normalized_score);
C++ Example
An example of how to use the C++ version of the library can be found in main.cpp
. As the library matures, I'll add more test examples to this file.
Julia Example
An example of how to use the Julia version of the library can be found in test.jl
. As the library matures, I'll add more test examples to this file.
Version History
1.0
- Initial version.
1.1
- Added normalized scores.
- Updated random number generation in rust, because it changed again.
License
This library is released under the MIT license, see LICENSE for details.
Dependencies
~2–3MB
~48K SLoC