2 releases

0.1.1	Sep 13, 2023
0.1.0	Sep 10, 2023

#820 in Rust patterns

3,581 downloads per month

Apache-2.0 WITH LLVM-exception

28KB
424 lines

This is IEEE APSqRt, a square root implementation for use with rustc_apfloat's IEEE 754 floats.

(APFloat is short for Arbitrary Precision Floating Point. IEEE APSqRt is short for IEEE 754 APFloat Square Root. Unlike APFloat, it is only designed to work with IEEE 754 single-, double-, and quad-precision floats; thus, it does not provide arbitrary precision itself.)

Why

rustc_apfloat is a software floating point library. It's a Rust conversion of "APFloat" from the LLVM project. These libraries emphasize clarity and correctness over speed, but are still pretty fast. They're used, respectively, in the rustc and clang compilers to do compile-time calculation of floating point operations. These compilers don't use the host's floating point because, when cross compiling, host and target behavior may be difficult or impossible to reconcile. Emulators often have the same problem. I was writing a deterministic RV32-G emulator, and as part of that, I needed to emulate the RISC-V standard's floating point specifications specifically. rustc_apfloat to the rescue!

Because rustc_apfloat implements almost every operation needed for RISC-V floating point emulation, it made my life a lot easier—but that last operation, square root, is important, so I had to implement it myself. It would have been faster and easier to use the host's sqrt functions for these, and to accept the inconsistencies in emulation, but my emulator is meant to fit in a multiplayer game's logic loop (?!!?!) so it has to be deterministic. A bad algorithm that is wrong in exactly the same ways on any platform is, therefore, better than a good algorithm that has even the slightest disagreement between platforms.

I originally implemented a very naive guess and check algorithm (see Bad APSqRt), but once I'd done that, I had everything I needed to implement and understand Newton-Raphson instead. This crate is the result.

Algorithm

IEEE APSqRt implements the Newton-Raphson method, a.k.a. the Babylonian method. It forms an initial guess of 2^(exponent/2), and repeatedly refines it using the formula: new_guess = (old_guess + square / old_guess) / 2. After just 5-7 iterations, this gets as close as it's going to get to an answer.

Accuracy

If there is an exact solution, either form of IEEE APSqRt always finds it. For sqrt_fast, roughly 70% of inexact solutions will be correct, roughly 30% of solutions will be off by one ulp, and a tiny percentage of solutions will be off by two ulps. For sqrt_slow, which performs the operations at higher precision, it will usually take only one more iteration and the answer will be 100% correct. (sqrt_slow is, unfortunately, due to limitations of rustc_apfloat's external interface, only available for 32- and 64-bit floats.)

All NaNs produced by IEEE APSqRt are "canon NaNs" according to the RISC-V standard.

How to use

Call ieee_apsqrt::sqrt_fast with a u32/u64/u128, or ieee_apsqrt::sqrt_accurate with a u32/u64. The former function is a little less accurate, but about twice as fast. The latter function is as accurate as possible, but about half as fast. Both functions also require a rounding mode.

Both functions return a tuple of (rustc_apfloat::StatusAnd<uXX>, u32), where the first value is the result of the calculation, and the second value is how many iterations of Newton-Raphson were required.

If you're counting clock cycles for emulation purposes, consider the square root to have a base cost of a single multiply, and an iteration cost of one division, one addition, and one multiply. Consider sqrt_fast to be performing operations at the requested precision, and sqrt_accurate at twice that precision (e.g. single -> double, double -> quad).

Legalese

IEEE APSqRt is licensed under the Apache 2 with LLVM exception license. This is the same license as the rustc_apfloat crate. This is the simplest way to guarantee that IEEE APSqRt can be used anywhere anywhere rustc_apfloat can.

Dependencies

~345KB