6 releases

0.8.1 Apr 2, 2026
0.8.0 Apr 2, 2026
0.7.0 Jan 28, 2026
0.7.0-alpha.4 Jan 27, 2026

#1046 in Machine learning

Download history 135/week @ 2026-01-24 348/week @ 2026-01-31 404/week @ 2026-02-07 602/week @ 2026-02-14 1159/week @ 2026-02-21 1647/week @ 2026-02-28 2619/week @ 2026-03-07 2861/week @ 2026-03-14 2671/week @ 2026-03-21 1887/week @ 2026-03-28 2379/week @ 2026-04-04 1539/week @ 2026-04-11

9,074 downloads per month
Used in 26 crates (via mistralrs-core)

MIT license

2MB
42K SLoC

Rust 28K SLoC // 0.0% comments
CUDA 7K SLoC // 0.1% comments
Metal Shading Language 7K SLoC // 0.1% comments

mistralrs-quant

An advanced and highly diverse set of quantization techniques. This crate supports both quantization and optimized inference.

It has grown beyond simple quantization and is used by mistral.rs to power:

  • ISQ
  • Imatrix collection
  • General quantization features
  • Specific CUDA and Metal features
  • cuBLASlt integration

Currently supported:

  • AFQ: AfqLayer (2-8 bit quantization optimized for Metal and compatible with MLX)
  • GGUF: GgufMatMul (2-8 bit quantization, with imatrix)
  • Gptq/Awq: GptqAwqLayer (with CUDA marlin kernel)
  • Hqq: HqqLayer (4, 8 bit quantization)
  • FP8: FP8Linear
  • F8Q8: F8Q8Linear
  • Unquantized (used for ISQ): UnquantLinear
  • Bnb: BnbLinear (int8, fp4, nf4)
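
To make the idea behind the block-quantized formats above more concrete, here is a minimal sketch of symmetric 8-bit blockwise quantization, loosely modeled on GGUF's Q8_0 layout (blocks of 32 weights sharing one scale). It is an illustration only and does not use or reproduce this crate's API: `BlockQ8`, `quantize_q8`, and `dequantize_q8` are hypothetical names, and the scale is kept as `f32` rather than the half-precision value real formats store.

```rust
/// Number of weights sharing one scale (GGUF's Q8_0 format uses 32).
const BLOCK_SIZE: usize = 32;

/// One quantized block: a per-block scale plus BLOCK_SIZE signed 8-bit values.
/// Hypothetical type for illustration; real formats store the scale as f16.
#[derive(Debug, Clone, PartialEq)]
pub struct BlockQ8 {
    pub scale: f32,
    pub quants: [i8; BLOCK_SIZE],
}

/// Quantize a weight slice block by block.
/// Each block's scale maps its largest magnitude onto the i8 value 127.
pub fn quantize_q8(weights: &[f32]) -> Vec<BlockQ8> {
    assert!(
        weights.len() % BLOCK_SIZE == 0,
        "weight count must be a multiple of the block size"
    );
    weights
        .chunks_exact(BLOCK_SIZE)
        .map(|chunk| {
            let max_abs = chunk.iter().fold(0.0f32, |m, w| m.max(w.abs()));
            let scale = max_abs / 127.0;
            // An all-zero block has scale 0; avoid dividing by it.
            let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
            let mut quants = [0i8; BLOCK_SIZE];
            for (q, w) in quants.iter_mut().zip(chunk) {
                *q = (w * inv).round().clamp(-127.0, 127.0) as i8;
            }
            BlockQ8 { scale, quants }
        })
        .collect()
}

/// Reconstruct approximate f32 weights from the quantized blocks.
pub fn dequantize_q8(blocks: &[BlockQ8]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|b| b.quants.iter().map(move |&q| q as f32 * b.scale))
        .collect()
}
```

Calling `dequantize_q8(&quantize_q8(&weights))` returns values that differ from the originals by at most about half of the owning block's scale. Lower-bit formats follow the same pattern with fewer quantization levels and, in some cases, an additional per-block offset.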

Some kernels are copied or based on implementations in:

Dependencies

~26–53MB
~763K SLoC