2 releases
Uses new Rust 2024
| Version | Released |
|---|---|
| 0.1.0-alpha.2 | Aug 21, 2025 |
| 0.1.0-alpha.1 | Aug 20, 2025 |
#1013 in Math
28 downloads per month
Used in 3 crates
38KB
447 lines
lmonade-models
Core model architectures and serving components for the Lmonade inference engine.
Overview
This crate provides:
- Model architectures (currently TinyLlama)
- Tensor operations and components (attention, feedforward, normalization)
- Serving infrastructure (paged KV cache, block management)
- Weight loading from SafeTensors and GGUF formats
- Batching strategies for inference
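As an illustration of the kind of building block listed above, here is a minimal sketch of RMS normalization, the normalization used by Llama-family models such as TinyLlama. The function name `rms_norm` and its signature are assumptions made for this example; they are not taken from this crate's API.

```rust
/// Root-mean-square normalization over one hidden vector.
/// `weight` is the learned per-channel gain; `eps` guards against division by zero.
/// Hypothetical helper for illustration, not part of lmonade-models.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len(), "input and gain must have equal length");
    if x.is_empty() {
        return Vec::new();
    }
    // Mean of the squared activations.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    // Scale every element by 1 / sqrt(mean_sq + eps), then apply the gain.
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}
```

With a gain of all ones and a negligible `eps`, the output vector has a mean square of approximately 1, which is a quick sanity check for any implementation.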
Key Components
- Models: Architecture implementations (src/models/)
- Components: Building blocks like attention and feedforward layers (src/components/)
- Formats: Weight loading and model configuration (src/formats/)
- Serving: Production serving infrastructure (src/serving/)
  - Paged attention and KV cache management
  - Continuous batching for throughput optimization
  - Memory block management
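To make the idea of a paged KV cache with block management concrete, the sketch below shows one common way to do it: the cache is split into fixed-size blocks, and each sequence keeps a table of the physical blocks it owns. The `BlockManager` type and all of its methods are hypothetical and written only for this example; the crate's actual types in src/serving/ may look quite different.

```rust
use std::collections::HashMap;

/// Hypothetical fixed-size block allocator for a paged KV cache.
struct BlockManager {
    block_size: usize,                      // tokens stored per block
    free_blocks: Vec<usize>,                // physical block ids not in use
    block_tables: HashMap<u64, Vec<usize>>, // sequence id -> owned blocks
    seq_lens: HashMap<u64, usize>,          // sequence id -> tokens stored
}

impl BlockManager {
    fn new(num_blocks: usize, block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be positive");
        Self {
            block_size,
            // Reversed so that `pop` hands out block 0 first.
            free_blocks: (0..num_blocks).rev().collect(),
            block_tables: HashMap::new(),
            seq_lens: HashMap::new(),
        }
    }

    /// Reserve a slot for one more token of `seq_id`.
    /// Returns `(physical_block, offset_in_block)`, or `None` if no block is free.
    fn append_token(&mut self, seq_id: u64) -> Option<(usize, usize)> {
        let len = self.seq_lens.get(&seq_id).copied().unwrap_or(0);
        let offset = len % self.block_size;
        if offset == 0 {
            // The sequence is new or its last block is full: take a fresh block.
            let block = self.free_blocks.pop()?;
            self.block_tables.entry(seq_id).or_default().push(block);
        }
        let block = *self.block_tables.get(&seq_id)?.last()?;
        self.seq_lens.insert(seq_id, len + 1);
        Some((block, offset))
    }

    /// Return all blocks owned by a finished sequence to the free list.
    fn free_sequence(&mut self, seq_id: u64) {
        if let Some(blocks) = self.block_tables.remove(&seq_id) {
            self.free_blocks.extend(blocks);
        }
        self.seq_lens.remove(&seq_id);
    }

    fn num_free_blocks(&self) -> usize {
        self.free_blocks.len()
    }
}
```

Because blocks are allocated only when a sequence crosses a block boundary, memory is reserved on demand rather than for the maximum sequence length up front. A scheduler doing continuous batching could call `free_sequence` as soon as a request finishes and admit a waiting request into the freed blocks.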
Usage
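use lmonade_models::formats::config::ModelConfig;
use lmonade_models::models::tinyllama::TinyLlamaModel;

// Assumes the crate's error types implement std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load model configuration
    let config = ModelConfig::from_file("path/to/config.json")?;
    // Initialize model
    let _model = TinyLlamaModel::new(&config)?;
    Ok(())
}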
Documentation
For detailed API documentation and architectural details, see:
Status
This crate is under active development. TinyLlama inference is partially working; performance and accuracy optimizations are ongoing.
Dependencies
~43MB
~834K SLoC