#srgb #simd #color-conversion #graphics #convert #rgb #simd-accelerated

no-std fast-srgb8

Very fast conversions between linear float and 8-bit sRGB (with no_std support)

2 releases (1 stable)

1.0.0 May 9, 2021
0.1.0 May 6, 2021

#43 in Rendering

Download history 19480/week @ 2024-09-16 21283/week @ 2024-09-23 21023/week @ 2024-09-30 19630/week @ 2024-10-07 20187/week @ 2024-10-14 17074/week @ 2024-10-21 19065/week @ 2024-10-28 18387/week @ 2024-11-04 19079/week @ 2024-11-11 24135/week @ 2024-11-18 18489/week @ 2024-11-25 20374/week @ 2024-12-02 18595/week @ 2024-12-09 21642/week @ 2024-12-16 18124/week @ 2024-12-23 16639/week @ 2024-12-30

76,660 downloads per month
Used in 321 crates (6 directly)

MIT OR Apache-2.0 OR CC0-1.0

28KB
344 lines

fast-srgb8

Build Status Docs Latest Version Minimum Rust Version

Small crate implementing fast conversion between linear float and 8-bit sRGB. Includes API for performing 4 simultaneous conversions, which are SIMD accelerated using SSE2 if available. Supports no_std (doesn't need libm either).

Features

  • f32_to_srgb8: converting a linear f32 to sRGB u8. Compliant with the most relevent public spec for this conversion (correct to ULP of 0.6, monotonic over range, etc)
  • f32x4_to_srgb8: Produces results identical to calling f32_to_srgb8 4 times in a row, but uses SSE2 to SIMD accelerate on x86 and x86_64 where SSE2 is known to be present. Otherwise, it just returns the results of calling f32_to_srgb8 (the scalar equivalent) 4 times.
  • srgb8_to_f32: Inverse operation of f32_to_srgb8. Uses the standard technique of a 256-item lookup table.

Benefits

  • Huge performance improvments over the naive implementation — ~5x for conversion to f32->srgb8, ~20x for srgb8->f32.
  • Supports no_std — normally this is tricky, as these operations require powf naively, which is not available to libcore.
  • No dependencies.
  • SIMD support for conversion to sRGB (conversion from sRGB is already ~20x faster than naive impl, and would probably be slower in SIMD, so for now it's not implemented).
  • Consistent and correct (according to at least one relevant spec) handling of edge cases, such as NaN/Inf/etc.
  • Exhaustive checking of all inputs for correctness (in tests).

Benchmarks

# Measures `fast_srgb8::f32_to_srgb8` vs ref impl
test tests::bench::fast_scalar       ... bench:         144 ns/iter (+/- 11)
test tests::bench::naive_scalar      ... bench:         971 ns/iter (+/- 48)

# Measures `fast_srgb8::f32x4_to_srgb8` vs calling reference impl 4 times
test tests::bench::fast_f32x4        ... bench:         440 ns/iter (+/- 29)
test tests::bench::naive_f32x4       ... bench:       3,625 ns/iter (+/- 282)
test tests::bench::fast_f32x4_nosimd ... bench:         482 ns/iter (+/- 27)

# Measures `fast_srgb8::srgb8_to_f32` vs ref impl
test tests::bench::fast_from_srgb8   ... bench:          81 ns/iter (+/- 6)
test tests::bench::naive_from_srgb8  ... bench:       4,026 ns/iter (+/- 282)

(Note that the ns/iter time is not for a single invocation of these function, it's for several)

License

Public domain, as explained here. If that's unacceptable, it's also available under either the Apache-2.0 or MIT licenses, at your option.

The float->srgb code is originally¹ based on public domain routines by Fabien "ryg" Giesen, although I'm no longer sure where these are available.

¹ (Well, specifically: The Rust code in this crate is ported from code in a C++ game engine of mine, which in turn, was based on the code from ryg. This doesn't make a difference, but increases the likelihood that any errors are solely my responsibility).

No runtime deps