9 unstable releases (3 breaking)

| Version | Released |
|---|---|
| 0.4.0 (new) | Feb 9, 2025 |
| 0.3.4 | Oct 10, 2024 |
| 0.3.3 | Aug 21, 2024 |
| 0.2.2 | Feb 28, 2024 |
| 0.1.0 | Dec 16, 2023 |
#1071 in Machine learning
300 downloads per month
Used in 2 crates (via kalosm-language)
565KB, 13K SLoC
RLlama
RLlama is a Rust implementation of the quantized Llama 7B language model. Llama 7B is a small yet capable language model that can easily be run on your local machine. This library uses Candle to run Llama.
Usage
```rust
use kalosm_llama::prelude::*;
use std::io::Write;

#[tokio::main]
async fn main() {
    // Load the quantized Llama model (downloads the weights on first run)
    let mut model = Llama::new().await.unwrap();
    let prompt = "The capital of France is ";
    // Calling the model with a prompt returns a stream of generated tokens
    let mut stream = model(prompt);
    print!("{prompt}");
    while let Some(token) = stream.next().await {
        // Print each token as it arrives; flush so output appears immediately
        print!("{token}");
        std::io::stdout().flush().unwrap();
    }
}
```
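To build the usage example above you need the crate and an async runtime. A minimal `Cargo.toml` sketch follows; the version numbers are assumptions, and the dependency name `kalosm-llama` is inferred from the `use kalosm_llama` path (Cargo maps hyphens to underscores), so check crates.io for the current release:

```toml
[package]
name = "rllama-demo"
version = "0.1.0"
edition = "2021"

[dependencies]
# Versions are illustrative; pin to the latest releases on crates.io
kalosm-llama = "0.4"
tokio = { version = "1", features = ["full"] }
```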
Dependencies
~33–53MB, ~1M SLoC