Version 0.1.0 (unstable), released Oct 12, 2024
Used in 2 crates (via turbo-metrics)
ssimulacra2-cuda
An implementation of ssimulacra2 using CUDA.
Features
- Stays close to the original reference implementation and produces closely matching results.
- Leverages many custom kernels written in Rust and a few CUDA NPP primitives.
- Uses CUDA graphs to alleviate the cost of launching the 200+ kernels per image pair.
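
The CUDA graph point above amounts to recording a fixed sequence of launches once and replaying it for every image pair, so the work is submitted with one call per replay instead of one call per kernel. The sketch below is a CPU-only analogy in plain Rust. It is not this crate's code and does not use the CUDA API: `Graph`, `record`, `replay`, and `build_graph` are hypothetical names, and closures stand in for kernels.

```rust
/// CPU-only stand-in for a captured graph: an ordered list of steps
/// ("kernels") operating on a shared buffer. All names are hypothetical.
struct Graph {
    steps: Vec<Box<dyn Fn(&mut [f32])>>,
}

impl Graph {
    fn new() -> Self {
        Graph { steps: Vec::new() }
    }

    /// Record a step once; nothing is executed at this point.
    fn record(&mut self, step: impl Fn(&mut [f32]) + 'static) {
        self.steps.push(Box::new(step));
    }

    /// Replay every recorded step, in order, with a single call.
    fn replay(&self, buf: &mut [f32]) {
        for step in &self.steps {
            step(buf);
        }
    }
}

/// Build the (hypothetical) two-step pipeline once.
fn build_graph() -> Graph {
    let mut g = Graph::new();
    g.record(|b| b.iter_mut().for_each(|x| *x *= 2.0)); // stand-in kernel 1
    g.record(|b| b.iter_mut().for_each(|x| *x += 1.0)); // stand-in kernel 2
    g
}

fn main() {
    let graph = build_graph(); // recorded once
    let mut a = [1.0f32, 2.0];
    let mut b = [10.0f32, 20.0];
    graph.replay(&mut a); // replayed per input
    graph.replay(&mut b);
    println!("{:?} {:?}", a, b);
}
```

On a GPU the benefit comes from the driver validating and scheduling the whole sequence up front. This analogy only illustrates the record-once, replay-many structure.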
TODO
- Investigate whether some computations can be changed to speed up processing without deviating too far from the original implementation, possibly as a configurable option.
- Write more custom kernels. Is it possible to run the whole computation in a single fused kernel launch?
- Use less memory (currently 500MB for 1080p). A single fused kernel might make this possible.
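
To put the 500MB figure in perspective, here is a rough back-of-the-envelope sketch. It assumes working buffers are three-channel planar `f32` images at full 1080p resolution and reads "500MB" as decimal megabytes. Neither is stated in this README, and `buffer_bytes` is a hypothetical helper.

```rust
/// Bytes for one image buffer, assuming `channels` planes of `f32` samples.
/// (Assumption: the actual buffer layout used by the crate may differ.)
fn buffer_bytes(width: u64, height: u64, channels: u64) -> u64 {
    width * height * channels * std::mem::size_of::<f32>() as u64
}

fn main() {
    let one = buffer_bytes(1920, 1080, 3);
    let budget: u64 = 500 * 1_000_000; // "500MB", read as decimal megabytes
    println!("one 1080p three-channel f32 buffer: {} bytes", one);
    println!("full-resolution buffers in 500MB: {}", budget / one);
}
```

Under these assumptions one buffer is about 24.9MB, so 500MB corresponds to roughly 20 full-resolution buffers' worth of intermediate data.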
Credits
Original reference implementation: https://github.com/cloudinary/ssimulacra2
With inspiration from: https://github.com/rust-av/ssimulacra2