9 releases
Version | Released |
---|---|
0.3.3 | Aug 28, 2024 |
0.3.2 | Aug 14, 2024 |
0.2.3 | Jun 30, 2024 |
0.2.2 | Feb 28, 2024 |
0.1.1 | Dec 21, 2023 |
#326 in Machine learning
216 downloads per month
Used in 4 crates (via kalosm)
675KB, 15K SLoC
Kalosm Language
Language processing utilities for the Kalosm framework.
The language part of Kalosm has a few core parts:
- Models: Text generation and embedding models
- Context: Document collection, format support, search and chunking
- Integrations: SurrealDB, Serper, and other integrations
Text Generation Models
Model and ModelExt are the core traits for text generation models. Any model that implements these traits can be used with Kalosm.
The simplest way to use a model is to create a llama model and call stream_text on it:
use kalosm::language::*;
#[tokio::main]
async fn main() {
    let mut llm = Llama::new().await.unwrap();
    let prompt = "The following is a 300 word essay about why the capital of France is Paris:";
    print!("{prompt}");
    let mut stream = llm
        // Any model that implements the Model trait can be used to stream text
        .stream_text(prompt)
        // You can pass parameters to the model to control the output
        .with_max_length(300)
        // And run .await to start streaming
        .await
        .unwrap();
    // You can then use the stream however you need. to_std_out will print the text to the console as it is generated
    stream.to_std_out().await.unwrap();
}
Tasks
You can define a Task with a description and then run it with an input. The task caches the description to make repeated calls faster. Tasks work with both chat and non-chat models, but they tend to perform significantly better with chat models.
use kalosm::language::*;
#[tokio::main]
async fn main() {
    // Create a new model
    let model = Llama::new_chat().await.unwrap();
    // Create a new task that summarizes text
    let task = Task::new("You take a long description and summarize it into a single short sentence");
    let mut output = task.run("You can define a Task with a description then run it with an input. The task will cache the description to make repeated calls faster. Tasks work with both chat and non-chat models, but they tend to perform significantly better with chat models.", &model);
    // Then stream the output to the console
    output.to_std_out().await.unwrap();
}
Structured Generation
Structured generation gives you more control over the output of the text generation. You can derive a parser for your data to easily get structured data out of an LLM:
use kalosm::language::*;
#[derive(Parse, Clone)]
struct Pet {
    name: String,
    age: u32,
    description: String,
}
Then you can generate text that works with the parser in a Task:
use kalosm::language::*;
#[derive(Parse, Debug, Clone)]
struct Pet {
    name: String,
    age: u32,
    description: String,
}
#[tokio::main]
async fn main() {
    // First create a model. Chat models tend to work best with structured generation
    let model = Llama::new_chat().await.unwrap();
    // Then create a parser for your data. Any type that implements the `Parse` trait has the `new_parser` method
    let parser = Pet::new_parser();
    // Then create a task with the parser as constraints
    let task = Task::builder("You generate realistic JSON placeholders")
        .with_constraints(parser)
        .build();
    // Finally, run the task
    let pet: Pet = task.run("Generate a pet in the form {\"name\": \"Pet name\", \"age\": 0, \"description\": \"Pet description\"}", &model).await.unwrap();
    println!("{pet:?}");
}
Embedding Models
Embedder and EmbedderExt are the core traits for text embedding models. Any model that implements these traits can be used with Kalosm.
The simplest way to use an embedding model is to create a Bert model and call embed on it. The Embedding you get back represents the meaning of the text in a numerical format:
use kalosm::language::*;
#[tokio::main]
async fn main() {
    // First create a model. Bert::new() is a good default embedding model for general tasks
    let model = Bert::new().await.unwrap();
    // Then embed some text into the vector space
    let first_embedding = model.embed("Kalosm is a library for building AI applications").await.unwrap();
    // And some more text, this time read from the user
    let second_embedding = model.embed(prompt_input("Text: ").unwrap()).await.unwrap();
    // You can compare the cosine similarity of the two embeddings to see how similar they are
    println!("cosine similarity: {}", first_embedding.cosine_similarity(&second_embedding));
}
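For readers who want to see what the similarity score measures, here is a minimal sketch of cosine similarity in plain Rust. It uses the standard definition (dot product divided by the product of the vector lengths) on toy vectors; the free function cosine_similarity and the sample vectors are illustrative assumptions and are not taken from Kalosm's implementation.

```rust
/// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
/// Returns 0.0 if either vector has zero length, to avoid dividing by zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

fn main() {
    // Toy 3-dimensional vectors standing in for real embeddings (assumed data).
    let cat = [1.0_f32, 2.0, 0.0];
    let kitten = [2.0_f32, 4.0, 0.0]; // points in the same direction as `cat`
    let car = [0.0_f32, 0.0, 3.0]; // orthogonal to `cat`

    println!("cat vs kitten: {}", cosine_similarity(&cat, &kitten));
    println!("cat vs car: {}", cosine_similarity(&cat, &car));
}
```

Vectors that point in the same direction score close to 1 regardless of their length, while orthogonal vectors score 0, which is why the measure suits comparing the meaning of texts of different sizes.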
Context
Gathering context is a key part of building LLM applications. Providing the right context to the model makes the output more relevant and useful. It can also help to prevent hallucinations.
Kalosm provides tools to generate, gather, and process context from a variety of sources.
Gathering context
Kalosm provides utilities for collecting context from a variety of sources:
- Local files (.txt, .md, .html, .docx, .pdf)
- RSS feeds
- Websites
- Search engines
- Microphone input and audio input through Whisper transcriptions
Each of these sources implements either IntoDocument or IntoDocuments to convert the data into a Document with the contents and metadata about the document.
use kalosm::language::*;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Try to extract an article from a URL
    let page = Url::parse("https://www.nytimes.com/live/2023/09/21/world/zelensky-russia-ukraine-news")?;
    let document = page.into_document().await?;
    println!("Title: {}", document.title());
    println!("Body: {}", document.body());
    Ok(())
}
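The conversion pattern described above can be sketched with a simplified, synchronous trait. Everything in this sketch (ToyDocument, IntoToyDocument, RawText, and the rule that the first line is the title) is a hypothetical stand-in chosen for illustration; Kalosm's real IntoDocument trait is asynchronous and its Document type carries more than this.

```rust
/// Hypothetical, simplified document: a title plus body text.
#[derive(Debug, PartialEq)]
struct ToyDocument {
    title: String,
    body: String,
}

/// Hypothetical synchronous analogue of a conversion trait:
/// any source that knows how to turn itself into a document.
trait IntoToyDocument {
    fn into_toy_document(self) -> Result<ToyDocument, String>;
}

/// A source holding raw text whose first line is treated as the title.
struct RawText(String);

impl IntoToyDocument for RawText {
    fn into_toy_document(self) -> Result<ToyDocument, String> {
        let mut lines = self.0.lines();
        // An empty source has no first line, so it cannot become a document.
        let first = lines.next().ok_or("empty source")?;
        // Drop a leading Markdown heading marker if there is one.
        let title = first.strip_prefix("# ").unwrap_or(first).to_string();
        // Everything after the first line becomes the body.
        let body = lines.collect::<Vec<_>>().join("\n").trim().to_string();
        Ok(ToyDocument { title, body })
    }
}

fn main() {
    let source = RawText("# Kalosm notes\n\nContext makes answers more relevant.".to_string());
    match source.into_toy_document() {
        Ok(document) => {
            println!("Title: {}", document.title);
            println!("Body: {}", document.body);
        }
        Err(error) => eprintln!("could not convert source: {error}"),
    }
}
```

Putting the conversion behind a trait means downstream code can accept any source type through one interface, which is the design the source list above relies on.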
Chunking context
After you have gathered context, it is often useful to chunk it into smaller pieces for search. Kalosm provides utilities for chunking context into documents, sentences, paragraphs, or semantic chunks. Kalosm will embed each chunk as it splits the document into smaller pieces. One of the most powerful chunkers is the semantic chunker, which lets you chunk documents into semantically similar chunks without explicitly setting the size of the chunks:
use kalosm::language::*;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // First, create an embedding model for semantic chunking
    let model = Bert::new().await?;
    // Then create a document folder with some documents
    let documents = DocumentFolder::new("./documents")?.into_documents().await?;
    // Then chunk the documents into semantically similar chunks
    let chunked = SemanticChunker::new().chunk_batch(&documents, &model).await?;
    println!("{:?}", chunked);
    Ok(())
}
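To give a feel for how chunk boundaries can come from meaning rather than from a fixed size, here is a toy sketch in plain Rust. It starts a new chunk whenever the cosine similarity between neighbouring sentences drops below a threshold. The function chunk_by_similarity, the threshold of 0.8, and the hand-written 2-dimensional vectors are all assumptions made for illustration; this is one simple strategy and not a description of how SemanticChunker works internally.

```rust
/// Cosine similarity of two equal-length vectors (0.0 if either has zero length).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Groups consecutive sentences into chunks, starting a new chunk whenever the
/// similarity between a sentence and the one before it falls below `threshold`.
fn chunk_by_similarity<'a>(
    sentences: &[(&'a str, Vec<f32>)],
    threshold: f32,
) -> Vec<Vec<&'a str>> {
    let mut chunks: Vec<Vec<&'a str>> = Vec::new();
    for (i, (text, embedding)) in sentences.iter().enumerate() {
        let continues_previous =
            i > 0 && cosine_similarity(&sentences[i - 1].1, embedding) >= threshold;
        if continues_previous {
            // The first sentence always opens a chunk, so one exists here.
            chunks
                .last_mut()
                .expect("a chunk exists after the first sentence")
                .push(*text);
        } else {
            chunks.push(vec![*text]);
        }
    }
    chunks
}

fn main() {
    // Hand-written vectors stand in for real sentence embeddings (assumed data).
    let sentences = vec![
        ("Cats purr when they are content.", vec![1.0_f32, 0.1]),
        ("Kittens sleep most of the day.", vec![0.9, 0.2]),
        ("Rust has no garbage collector.", vec![0.1, 1.0]),
        ("The borrow checker enforces ownership.", vec![0.2, 0.9]),
    ];
    for chunk in chunk_by_similarity(&sentences, 0.8) {
        println!("{chunk:?}");
    }
}
```

With this data the two sentences about cats land in one chunk and the two about Rust in another, even though no chunk size was specified.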
Embedding-powered search
After you have chunked your context, you can use the embeddings for search or retrieval-augmented generation. Embedding-based search lets you find documents that are semantically similar to a specific word or phrase even if no words are an exact match:
use kalosm::language::*;
use surrealdb::{engine::local::RocksDb, Surreal};
#[tokio::main]
async fn main() {
    // Create database connection
    let db = Surreal::new::<RocksDb>(std::env::temp_dir().join("temp.db")).await.unwrap();
    // Select a specific namespace / database
    db.use_ns("search").use_db("documents").await.unwrap();
    // Create a table in the surreal database to store the embeddings
    let document_table = db
        .document_table_builder("documents")
        .build::<Document>()
        .await
        .unwrap();
    // Add documents to the database
    document_table.add_context(DocumentFolder::new("./documents").unwrap()).await.unwrap();
    loop {
        // Get the user's question
        let user_question = prompt_input("Query: ").unwrap();
        let nearest_5 = document_table
            .select_nearest(user_question, 5)
            .await
            .unwrap();
        println!("{:?}", nearest_5);
    }
}
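Conceptually, a nearest-neighbour query ranks stored embeddings by their similarity to the query embedding and keeps the best few. The sketch below does this by brute force in plain Rust. The function nearest_k, the choice of cosine similarity as the metric, and the toy vectors are assumptions for illustration; a database-backed table may use an index and a different metric.

```rust
/// Cosine similarity of two equal-length vectors (0.0 if either has zero length).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Returns up to `k` documents most similar to `query`, best match first.
fn nearest_k<'a>(
    query: &[f32],
    documents: &'a [(String, Vec<f32>)],
    k: usize,
) -> Vec<(&'a str, f32)> {
    // Score every document against the query (brute force, O(n) comparisons).
    let mut scored: Vec<(&'a str, f32)> = documents
        .iter()
        .map(|(text, embedding)| (text.as_str(), cosine_similarity(query, embedding)))
        .collect();
    // Sort by descending similarity; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    // Toy 2-dimensional embeddings standing in for real ones (assumed data).
    let documents = vec![
        ("A guide to caring for cats".to_string(), vec![0.9_f32, 0.1]),
        ("Ownership and borrowing in Rust".to_string(), vec![0.1, 0.9]),
        ("Training your dog at home".to_string(), vec![0.8, 0.3]),
    ];
    let query = [1.0_f32, 0.0]; // pretend this embeds the question "pets?"
    for (text, score) in nearest_k(&query, &documents, 2) {
        println!("{score:.3}  {text}");
    }
}
```

Because the ranking depends only on vector direction, the pet-related documents come out ahead of the Rust one even though the query shares no words with them.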
Dependencies
~53–74MB
~1.5M SLoC