2 unstable releases
| Version | Released |
|---|---|
| 0.2.0 | Jan 10, 2023 |
| 0.1.0 | Dec 16, 2022 |
pyke Diffusers is a modular Rust library for pretrained diffusion model inference to generate images, videos, or audio, using ONNX Runtime as a backend for extremely optimized generation on both CPU & GPU.
Features
- Text-to-image for Stable Diffusion v1 & v2
- Optimized for both CPU and GPU inference
- Memory-efficient pipelines to run with <2GB of RAM!
- >77 token prompts
- Prompt weighting, e.g. a (((house:1.3))) [on] a (hill:0.5), sun, (((sky))).
- Implements DDIM, DDPM, DPM/DPM++, Euler & Euler a, LMS schedulers
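To make the prompt weighting syntax above more concrete, here is a minimal sketch of how such a prompt could be split into weighted spans. It is not pyke Diffusers' parser. It assumes the convention popularized by Stable Diffusion web UIs: each level of `( )` multiplies the weight by 1.1, each level of `[ ]` divides it by 1.1, and `(text:w)` multiplies by `w` instead. The names `WeightedSpan`, `parse_weighted_prompt`, and `flush` are hypothetical.

```rust
/// A run of prompt text and the emphasis weight applied to it.
#[derive(Debug, Clone, PartialEq)]
pub struct WeightedSpan {
    pub text: String,
    pub weight: f32,
}

/// Splits a prompt into weighted spans.
/// ASSUMPTION: 1.1 per bracket level, as in common web UIs; the crate may differ.
/// Unmatched closing brackets are kept as literal text; unclosed openers are ignored.
pub fn parse_weighted_prompt(prompt: &str) -> Vec<WeightedSpan> {
    const STEP: f32 = 1.1;
    let mut spans: Vec<WeightedSpan> = Vec::new();
    // For every open bracket: its character and the index of the first span inside it.
    let mut open: Vec<(char, usize)> = Vec::new();
    let mut buf = String::new();

    for ch in prompt.chars() {
        match ch {
            '(' | '[' => {
                flush(&mut buf, &mut spans);
                open.push((ch, spans.len()));
            }
            ')' | ']' => {
                let opener = if ch == ')' { '(' } else { '[' };
                match open.last().copied() {
                    Some((c, start)) if c == opener => {
                        open.pop();
                        let mut factor = if ch == ')' { STEP } else { 1.0 / STEP };
                        if ch == ')' {
                            // `(text:1.3)` carries an explicit weight after the last colon.
                            let explicit = buf.rsplit_once(':').and_then(|(text, w)| {
                                w.trim().parse::<f32>().ok().map(|w| (text.to_string(), w))
                            });
                            if let Some((text, w)) = explicit {
                                buf = text;
                                factor = w;
                            }
                        }
                        flush(&mut buf, &mut spans);
                        // Scale everything that was written inside this bracket pair.
                        for span in &mut spans[start..] {
                            span.weight *= factor;
                        }
                    }
                    _ => buf.push(ch),
                }
            }
            _ => buf.push(ch),
        }
    }
    flush(&mut buf, &mut spans);
    spans
}

/// Moves any pending text into `spans` with a neutral weight of 1.0.
fn flush(buf: &mut String, spans: &mut Vec<WeightedSpan>) {
    if !buf.is_empty() {
        spans.push(WeightedSpan { text: std::mem::take(buf), weight: 1.0 });
    }
}
```

Under these assumptions, `(hill:0.5)` yields a span with weight 0.5 and `(((sky)))` yields one with weight 1.1 × 1.1 × 1.1. How the library itself maps such weights onto text embeddings is not covered here.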
Prerequisites
You'll need Rust v1.62.1+ to use pyke Diffusers.
- If using CPU: recent (no earlier than Haswell/Zen) x86-64 CPU for best results. ARM64 is supported but not recommended. For acceleration, see the notes for OpenVINO, oneDNN, ACL, and SNPE.
- If using CUDA: CUDA v11.x, cuDNN v8.2.x
- If using TensorRT: CUDA v11.x, TensorRT v8.4
- If using ROCm: ROCm v5.2
- If using DirectML: DirectX 12 compatible GPU, Windows 10 v1903+
Only generic CPU, CUDA, and TensorRT have prebuilt binaries available. Other execution providers will require you to manually build them; see the ONNX Runtime docs for more info. Additionally, you'll need to make ort link to your custom-built binaries.
LMS notes
Note: By default, the LMS scheduler is not enabled, and this section can simply be skipped.
If you plan to enable the all-schedulers or scheduler-lms feature, you will need to install binaries for the GNU Scientific Library. See the installation instructions for rust-GSL to set up GSL.
Installation
[dependencies]
pyke-diffusers = "0.1"
# if you'd like to use CUDA:
pyke-diffusers = { version = "0.1", features = [ "ort-cuda" ] }
The default features enable some commonly used schedulers and pipelines.
Usage
use std::sync::Arc;

use pyke_diffusers::{
	Environment, EulerDiscreteScheduler, SchedulerOptimizedDefaults, StableDiffusionOptions, StableDiffusionPipeline,
	StableDiffusionTxt2ImgOptions
};

// The `?` operators below assume an enclosing function that returns a compatible `Result`.
let environment = Arc::new(Environment::builder().build()?);
// Euler scheduler with the optimized defaults for Stable Diffusion v1.
let mut scheduler = EulerDiscreteScheduler::stable_diffusion_v1_optimized_default()?;
// Load the pipeline from the model directory.
let pipeline = StableDiffusionPipeline::new(&environment, "./stable-diffusion-v1-5", &StableDiffusionOptions::default())?;
// Generate with default options and save the first image.
let imgs = pipeline.txt2img("photo of a red fox", &mut scheduler, &StableDiffusionTxt2ImgOptions::default())?;
imgs[0].clone().into_rgb8().save("result.png")?;
Examples
pyke-diffusers includes an interactive Stable Diffusion demo. Run it with:
$ cargo run --example stable-diffusion-interactive --features ort-cuda -- ~/path/to/stable-diffusion/
See examples/ for more examples and the docs for more detailed information.
Converting models
pyke Diffusers currently supports Stable Diffusion v1, v2, and their derivatives.
To convert a model from a HuggingFace diffusers model:
- Create and activate a virtual environment.
- Install Python requirements:
  - install torch with CUDA:
    python3 -m pip install torch --extra-index-url https://download.pytorch.org/whl/cu116
  - install dependencies:
    python3 -m pip install -r requirements.txt
- If you are converting a model directly from HuggingFace, log in to HuggingFace Hub with huggingface-cli login (this can be skipped if you have the model on disk).
- Convert your model with scripts/hf2pyke.py:
  - To convert a float32 model from HF (recommended for CPU inference):
    python3 scripts/hf2pyke.py runwayml/stable-diffusion-v1-5 ~/pyke-diffusers-sd15/
  - To convert a float32 model from disk:
    python3 scripts/hf2pyke.py ~/stable-diffusion-v1-5/ ~/pyke-diffusers-sd15/
  - To convert a float16 model from HF (recommended for GPU inference):
    python3 scripts/hf2pyke.py --fp16 runwayml/stable-diffusion-v1-5@fp16 ~/pyke-diffusers-sd15-fp16/
  - To convert a float16 model from disk:
    python3 scripts/hf2pyke.py --fp16 ~/stable-diffusion-v1-5-fp16/ ~/pyke-diffusers-sd15-fp16/
float16 models are faster on some GPUs and use less memory. However, due to an ONNX Runtime bug, float16 models used for GPU inference must be converted on the hardware they will run on. CPU inference with float16 models should not have this issue.
hf2pyke supports a few options to improve performance or ORT execution provider compatibility. See python3 scripts/hf2pyke.py --help.
ONNX Runtime binaries
When running the examples in this repo on Windows, you'll need to copy the onnxruntime* dylibs from target/debug/ to target/debug/examples/ on first run. You'll also need to copy the dylibs to target/debug/deps/ if your project uses pyke Diffusers in a Cargo test.
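The copy step above can also be scripted. The following is a minimal sketch using only the standard library; `copy_onnxruntime_libs` is a hypothetical helper, not something provided by pyke Diffusers or ort, and it assumes the libraries can be recognized by the `onnxruntime` file name prefix.

```rust
use std::{
    fs, io,
    path::{Path, PathBuf},
};

/// Copies every regular file whose name starts with `onnxruntime` from `from`
/// into `to`, creating `to` if it does not exist yet.
/// Returns the destination paths that were written.
pub fn copy_onnxruntime_libs(from: &Path, to: &Path) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(to)?;
    let mut copied = Vec::new();
    for entry in fs::read_dir(from)? {
        let entry = entry?;
        let name = entry.file_name();
        // Match on the file name prefix only; subdirectories are skipped.
        let is_ort_lib = name.to_string_lossy().starts_with("onnxruntime");
        if is_ort_lib && entry.file_type()?.is_file() {
            let dest = to.join(&name);
            fs::copy(entry.path(), &dest)?;
            copied.push(dest);
        }
    }
    Ok(copied)
}
```

For the layout described above, this would be called as `copy_onnxruntime_libs(Path::new("target/debug"), Path::new("target/debug/examples"))`, and again with `target/debug/deps` when running Cargo tests.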
CUDA and other execution providers
CUDA is the only alternative execution provider available with no setup required. Simply enable pyke Diffusers' ort-cuda feature and enable DiffusionDevice::CUDA; see the docs or the stable-diffusion example for more info. You may need to rebuild your project for ort to copy the libraries again.
For other EPs like DirectML or oneDNN, you'll need to build ONNX Runtime from source. See ort's notes on execution providers.
Low memory usage
Lower-resolution generations require less memory.
A StableDiffusionMemoryOptimizedPipeline exists for environments with low memory. This pipeline removes the safety checker, loads each model only when it is required, and unloads it immediately afterwards. This heavily impacts performance and should only be used in extreme cases.
Quantization
In extremely constrained environments (e.g. <= 4GB RAM), it is also possible to produce a quantized int8 model. Quantization heavily degrades output quality, but the resulting model is faster and less memory intensive on CPUs.
To convert a model to int8:
$ python3 scripts/hf2pyke.py --quantize=ut ~/stable-diffusion-v1-5/ ~/pyke-diffusers-sd15-quantized/
--quantize=ut will quantize only the UNet and text encoder using uint8 mode for best quality and performance. You can choose to convert the other models using the following format:
- each model is assigned a letter: u for UNet, v for VAE, and t for text encoder.
- a lowercase letter means the model will be quantized to uint8
- an uppercase letter means the model will be quantized to int8
Typically, uint8 is higher quality and faster, but you can play around with the settings to see if quality or speed improves.
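As an illustration of the letter format described above, here is a minimal sketch that decodes a --quantize value into a per-model plan. It only mirrors the rules listed here and is not how hf2pyke.py is implemented; `QuantMode`, `QuantPlan`, and `parse_quantize_spec` are hypothetical names.

```rust
/// Integer format requested for one model (hypothetical naming).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantMode {
    Uint8,
    Int8,
}

/// Quantization choice per model; `None` means the model is left unquantized.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct QuantPlan {
    pub unet: Option<QuantMode>,
    pub vae: Option<QuantMode>,
    pub text_encoder: Option<QuantMode>,
}

/// Decodes a spec such as "ut" or "Uv": `u` = UNet, `v` = VAE, `t` = text encoder;
/// lowercase selects uint8, uppercase selects int8.
pub fn parse_quantize_spec(spec: &str) -> Result<QuantPlan, String> {
    let mut plan = QuantPlan::default();
    for letter in spec.chars() {
        let mode = if letter.is_ascii_lowercase() {
            QuantMode::Uint8
        } else {
            QuantMode::Int8
        };
        // Pick the field this letter refers to, ignoring case.
        let slot = match letter.to_ascii_lowercase() {
            'u' => &mut plan.unet,
            'v' => &mut plan.vae,
            't' => &mut plan.text_encoder,
            _ => return Err(format!("unknown model letter '{letter}' in quantize spec")),
        };
        *slot = Some(mode);
    }
    Ok(plan)
}
```

With this sketch, "ut" selects uint8 for the UNet and text encoder and leaves the VAE untouched, while "Uv" would select int8 for the UNet and uint8 for the VAE.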
A combination of 256x256 image generation via StableDiffusionMemoryOptimizedPipeline with a uint8 UNet requires only 1.3 GB of memory.