9 releases

0.3.3 Aug 28, 2024
0.3.2 Aug 14, 2024
0.2.3 Jun 30, 2024
0.2.2 Feb 28, 2024
0.1.1 Dec 21, 2023

#408 in Machine learning

Download history 66/week @ 2024-10-06 88/week @ 2024-10-13 38/week @ 2024-10-20 64/week @ 2024-10-27 49/week @ 2024-11-03 34/week @ 2024-11-10 55/week @ 2024-11-17 47/week @ 2024-11-24 48/week @ 2024-12-01 122/week @ 2024-12-08 69/week @ 2024-12-15 27/week @ 2024-12-22 41/week @ 2024-12-29 147/week @ 2025-01-05 118/week @ 2025-01-12 46/week @ 2025-01-19

356 downloads per month
Used in 4 crates (via kalosm)

MIT/Apache

675KB
15K SLoC

Kalosm Language

Language processing utilities for the Kalosm framework.

The language part of Kalosm has a few core parts:

Text Generation Models

Model and ModelExt are the core traits for text generation models. Any model that implements these traits can be used with Kalosm.

The simplest way to use a model is to create a llama model and call stream_text on it:

use kalosm::language::*;
#[tokio::main]
async fn main() {
    let mut llm = Llama::new().await.unwrap();
    let prompt = "The following is a 300 word essay about why the capital of France is Paris:";
    print!("{prompt}");
    let mut stream = llm
        // Any model that implements the Model trait can be used to stream text
        .stream_text(prompt)
        // You can pass parameters to the model to control the output
        .with_max_length(300)
        // And run .await to start streaming
        .await
        .unwrap();
    // You can then use the stream however you need. to_std_out will print the text to the console as it is generated
    stream.to_std_out().await.unwrap();
}

Tasks

You can define a Task with a description and then run it with an input. The task will cache the description to repeated calls faster. Tasks work with both chat and non-chat models, but they tend to perform significantly better with chat models.

use kalosm::language::*;
#[tokio::main]
async fn main() {
    // Create a new model
    let model = Llama::new_chat().await.unwrap();
    // Create a new task that summarizes text
    let task = Task::new("You take a long description and summarize it into a single short sentence");
    let mut output = task.run("You can define a Task with a description then run it with an input. The task will cache the description to repeated calls faster. Tasks work with both chat and non-chat models, but they tend to perform significantly better with chat models.", &model);
    // Then stream the output to the console
    output.to_std_out().await.unwrap();
}

Structured Generation

Structured generation gives you more control over the output of the text generation. You can derive a parser for your data to easily get structured data out of an LLM:

use kalosm::language::*;
#[derive(Parse, Clone)]
struct Pet {
    name: String,
    age: u32,
    description: String,
}

Then you can generate text that works with the parser in a Task:

use kalosm::language::*;
#[derive(Parse, Debug, Clone)]
struct Pet {
    name: String,
    age: u32,
    description: String,
}

#[tokio::main]
async fn main() {
    // First create a model. Chat models tend to work best with structured generation
    let model = Llama::new_chat().await.unwrap();
    // Then create a parser for your data. Any type that implements the `Parse` trait has the `new_parser` method
    let parser = Pet::new_parser();
    // Then create a task with the parser as constraints
    let task = Task::builder("You generate realistic JSON placeholders")
        .with_constraints(parser)
        .build();
    // Finally, run the task
    let pet: Pet = task.run("Generate a pet in the form {\"name\": \"Pet name\", \"age\": 0, \"description\": \"Pet description\"}", &model).await.unwrap();
    println!("{pet:?}");
}

Embedding Models

Embedder and EmbedderExt are the core traits for text embedding models. Any model that implements these traits can be used with Kalosm.

The simplest way to use an embedding model is to create a bert model and call embed on it. The Embedding you get back represents the meaning of the text in a numerical format:

use kalosm::language::*;
#[tokio::main]
async fn main() {
    // First create a model. Bert::new() is a good default embedding model for general tasks
    let model = Bert::new().await.unwrap();
    // Then embed some text into the vector space
    let embedding = model.embed("Kalosm is a library for building AI applications").await.unwrap();
    // And some more text
    let embedding = model.embed(prompt_input("Text: ").unwrap()).await.unwrap();
    // You can compare the cosine similarity of the two embeddings to see how similar they are
    println!("cosine similarity: {}", embedding.cosine_similarity(&embedding));
}

Context

Gathering context is a key part of building LLM applications. Providing the right context to the model makes the output more relevant and useful. It can also help to prevent hallucinations.

Kalosm provides tools to generate gather, and process context from a variety of sources.

Gathering context

Kalosm provides utilities for collecting context from a variety of sources:

  • Local files (.txt, .md, .html, .docx, .pdf)
  • RSS feeds
  • Websites
  • Search engines
  • Microphone input and audio input through whisper transcriptions

Each of these sources implements either IntoDocument or IntoDocuments to convert the data into a Document with the contents and metadata about the document.

use kalosm::language::*;
use std::convert::TryFrom;
use std::path::PathBuf;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Try to extract an article from a URL
    let page = Url::parse("https://www.nytimes.com/live/2023/09/21/world/zelensky-russia-ukraine-news")?;
    let document = page.into_document().await?;
    println!("Title: {}", document.title());
    println!("Body: {}", document.body());

    Ok(())
}

Chunking context

After you have gathered context, it is often useful to chunk it into smaller pieces for search. Kalosm provides utilities for chunking context into documents, sentences, paragraphs, or semantic chunks. Kalosm will embed each chunk as it splits the document into smaller pieces. One of the most powerful chunker is the semantic chunker, which lets you chunk documents into semantically similar chunks without explicitly setting the size of the chunks:

use kalosm::language::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // First, create an embedding model for semantic chunking
    let model = Bert::new().await?;
    // Then create a document folder with some documents
    let documents = DocumentFolder::new("./documents")?.into_documents().await?;
    // Then chunk the documents into sentences
    let chunked = SemanticChunker::new().chunk_batch(&documents, &model).await?;
    println!("{:?}", chunked);
    Ok(())
}

After you have chunked your context, you can use the embeddings for search or retrieval augmented generation. Embedding-based search lets you find documents that are semantically similar to a specific word or phrase even if no words are an exact match:

use kalosm::language::*;
use surrealdb::{engine::local::RocksDb, Surreal};

#[tokio::main]
async fn main() {
    // Create database connection
    let db = Surreal::new::<RocksDb>(std::env::temp_dir().join("temp.db")).await.unwrap();

    // Select a specific namespace / database
    db.use_ns("search").use_db("documents").await.unwrap();

    // Create a table in the surreal database to store the embeddings
    let document_table = db
        .document_table_builder("documents")
        .build::<Document>()
        .await
        .unwrap();

    // Add documents to the database
    document_table.add_context(DocumentFolder::new("./documents").unwrap()).await.unwrap();

    loop {
        // Get the user's question
        let user_question = prompt_input("Query: ").unwrap();

        let nearest_5 = document_table
            .select_nearest(user_question, 5)
            .await
            .unwrap();

        println!("{:?}", nearest_5);
    }
}

Dependencies

~53–75MB
~1.5M SLoC