#llama #cpp-bindings #bindings #llm #llama-cpp #generate

rs-llama-cpp

Automated Rust bindings generation for LLaMA.cpp

60 releases

0.1.67 Aug 9, 2023
0.1.66 Aug 8, 2023
0.1.58 Jul 31, 2023
0.1.28 Jun 30, 2023

#352 in Science

MIT license

2MB
44K SLoC

C 18K SLoC // 0.1% comments
C++ 15K SLoC // 0.2% comments
CUDA 4.5K SLoC // 0.0% comments
Python 2K SLoC // 0.1% comments
Metal Shading Language 1.5K SLoC // 0.0% comments
Shell 1K SLoC // 0.1% comments
Objective-C 899 SLoC // 0.1% comments
Rust 727 SLoC // 0.0% comments
JavaScript 262 SLoC // 0.2% comments
Vim Script 131 SLoC // 0.1% comments
Zig 74 SLoC // 0.0% comments
Batch 48 SLoC
Swift 21 SLoC

Description

LLaMA.cpp is under heavy development, with contributions pouring in from numerous individuals every day. Its C API is currently very low-level, and given how fast the project is evolving, keeping up with the changes and porting the examples to a higher-level API proves difficult. As a trade-off, this project prioritizes automation over flexibility: it automatically generates Rust bindings for the main example of LLaMA.cpp.

Limitations

The main design goal of this project is to minimize the effort of updating LLaMA.cpp by automating as many steps as possible. However, this approach does have some limitations:

  1. The API is very high-level: it resembles calling the main function of LLaMA.cpp, with generated tokens delivered through a callback function.
  2. Currently, the project does not expose parameters whose types are harder to convert to Rust, such as std::unordered_map and std::vector.
  3. Some of the parameters exposed via the Rust API are only relevant to a CLI.
  4. The generated C++ library writes a significant amount of debug information to stderr and stdout. This is not currently configurable.
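Because generation is driven entirely by the callback's return value, stop conditions such as a token budget or a stop string have to live inside the closure. The sketch below illustrates that pattern without loading a model: `mock_run_inference` and `generate_until` are hypothetical helpers, not part of rs-llama-cpp, and the `&str` token type is an assumption (the real callback may receive a `String` instead).

```rust
/// Hypothetical stand-in for `run_inference`: instead of running a model,
/// it feeds pre-baked tokens to the callback and stops as soon as the
/// callback returns `false`.
fn mock_run_inference<F>(tokens: &[&str], mut on_token: F)
where
    F: FnMut(&str) -> bool,
{
    for &token in tokens {
        if !on_token(token) {
            break; // the callback asked to stop inference
        }
    }
}

/// Hypothetical helper: collects generated text until either `max_tokens`
/// tokens have been seen or the accumulated output contains `stop`.
fn generate_until(tokens: &[&str], max_tokens: usize, stop: &str) -> String {
    let mut output = String::new();
    let mut count = 0;
    mock_run_inference(tokens, |token| {
        output.push_str(token);
        count += 1;
        // Continue only while under budget and the stop string is absent.
        count < max_tokens && !output.contains(stop)
    });
    output
}

fn main() {
    let tokens = ["Hello", ",", " world", "!", "\n", "ignored"];
    let text = generate_until(&tokens, 100, "\n");
    println!("{:?}", text);
}
```

Keeping the state (`output`, `count`) in variables captured by an `FnMut` closure is what lets a single callback accumulate text across calls.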

Usage

use rs_llama_cpp::{gpt_params_c, run_inference, str_to_mut_i8};

fn main() {
    // Point the bindings at a model file and an initial prompt;
    // every other parameter keeps its default value.
    let params = gpt_params_c {
        model: str_to_mut_i8("/path/to/model.bin"),
        prompt: str_to_mut_i8("Hello "),
        ..Default::default()
    };

    // The callback is invoked once per generated token.
    // Return `false` to stop inference, `true` to continue.
    run_inference(params, |token| {
        println!("Token: {}", token);

        // Stop as soon as a token ends with a newline.
        !token.ends_with('\n')
    });
}
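`str_to_mut_i8` turns a Rust string into the raw C string pointer that the `gpt_params_c` fields expect. Its implementation is not shown here, so the following is only a minimal sketch, assuming the conversion goes through `std::ffi::CString`. `str_to_c_ptr` and `free_c_ptr` are hypothetical names. The sketch uses `c_char`, which is `i8` on most x86 targets but `u8` on some others, such as Linux on AArch64.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical re-implementation of a `str_to_mut_i8`-style helper:
/// copies `s` into a NUL-terminated heap buffer and hands ownership of
/// that buffer to the caller as a raw pointer.
fn str_to_c_ptr(s: &str) -> *mut c_char {
    CString::new(s)
        .expect("string must not contain interior NUL bytes")
        .into_raw()
}

/// Reclaims a pointer produced by `str_to_c_ptr` so the buffer is freed.
///
/// # Safety
/// `ptr` must come from `str_to_c_ptr` and must not be used afterwards.
unsafe fn free_c_ptr(ptr: *mut c_char) {
    if !ptr.is_null() {
        // Rebuilding the CString and dropping it releases the allocation.
        unsafe { drop(CString::from_raw(ptr)) };
    }
}

fn main() {
    let ptr = str_to_c_ptr("/path/to/model.bin");
    // Read the string back through the raw pointer, as C code would.
    let round_trip = unsafe { CStr::from_ptr(ptr) }.to_str().unwrap().to_owned();
    println!("{}", round_trip);
    unsafe { free_c_ptr(ptr) };
}
```

With this approach, each converted string stays allocated until its pointer is passed back to `CString::from_raw`, and a string containing an interior NUL byte cannot be converted at all.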

No runtime deps