6 releases
0.1.5 | Dec 8, 2024 |
---|---|
0.1.4 | Dec 7, 2024 |
#78 in Machine learning
573 downloads per month
29KB
510 lines
hallucination-detection
A high-performance Rust library for detecting hallucinations in Large Language Model (LLM) outputs using BERT Named Entity Recognition (NER), proper noun analysis, and numerical comparisons.
Features
- Fast and accurate hallucination detection for RAG (Retrieval-Augmented Generation) systems
- Numerical comparison and validation
- Unknown word detection using comprehensive English word dictionary
- Configurable scoring weights and detection options
- Async/await support with Tokio runtime
- Optional ONNX support for improved performance
- Optional BERT-based Named Entity Recognition for proper noun analysis
Installation
Add this to your Cargo.toml
:
[dependencies]
hallucination-detection = "^0.1.3"
If you want to use NER:
- Download
libtorch
from https://pytorch.org/get-started/locally/. This package requiresv2.4
: if this version is no longer available on the "get started" page, the file should be accessible by modifying the target link, for examplehttps://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.4.0%2Bcpu.zip
for a Linux version with CPU. - Extract the library to a location of your choice
- Set the following environment variables
Linux:
export LIBTORCH=/path/to/libtorch
export LD_LIBRARY_PATH=${LIBTORCH}/lib:$LD_LIBRARY_PATH
Windows
$Env:LIBTORCH = "X:\path\to\libtorch"
$Env:Path += ";X:\path\to\libtorch\lib"
[dependencies]
hallucination-detection = { version = "^0.1.3", features = ["ner"] }
If you want to use ONNX for the NER models, you need to either install the ort runtime or include it in your dependencies:
hallucination-detection = { version = "^0.1.3", features = ["ner", "onnx"] }
ort = { version = "...", features = [ "download-binaries" ] }
Quick Start
use hallucination_detection::{HallucinationDetector, HallucinationOptions};
#[tokio::main]
async fn main() {
// Create detector with default options
let detector = HallucinationDetector::new(Default::default())
.expect("Failed to create detector");
// Example texts
let llm_output = String::from("Tesla sold 500,000 cars in Europe last quarter.");
let references = vec![
String::from("Tesla reported strong sales in European markets."),
String::from("The company's global deliveries increased.")
];
// Detect hallucinations
let score = detector.detect_hallucinations(&llm_output, &references).await;
println!("Hallucination Score: {:#?}", score);
}
Configuration
You can customize the detector's behavior using HallucinationOptions
:
use hallucination_detection::{HallucinationOptions, ScoreWeights};
let options = HallucinationOptions {
weights: ScoreWeights {
proper_noun_weight: 0.4,
unknown_word_weight: 0.1,
number_mismatch_weight: 0.5,
},
use_ner: true,
};
let detector = HallucinationDetector::new(options)
.expect("Failed to create detector");
Output
The detector returns a HallucinationScore
struct containing:
pub struct HallucinationScore {
pub proper_noun_score: f64,
pub unknown_word_score: f64,
pub number_mismatch_score: f64,
pub total_score: f64,
pub detected_hallucinations: Vec<String>,
}
- Scores range from 0.0 (no hallucination) to 1.0 (complete hallucination)
detected_hallucinations
contains specific elements that were flagged
Performance Considerations
- The NER model is loaded once and reused across predictions
- English word dictionary is cached locally for faster subsequent runs
- Async operations allow for non-blocking execution
- ONNX runtime provides optimized model inference
Features Flags
ner
: Enables BERT Named Entity Recognition (default: disabled)onnx
: Uses ONNX runtime for improved performance (default: disabled)
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Authors
Devflow Inc. humans@trieve.ai
Acknowledgments
- Uses rust-bert for NER capabilities
- English word list from dwyl/english-words
Dependencies
~8–26MB
~392K SLoC