11 releases

0.1.10 Jan 7, 2024
0.1.9 Jan 28, 2023
0.1.1 Dec 31, 2022

Apache-2.0

Arbitrary-Precision Floating-Point Library  

ARPFloat is an implementation of arbitrary-precision floating-point data structures and utilities. The library can be used to emulate existing floating-point types, such as FP16, and to create new floating-point types. Floating-point types can scale to hundreds of digits and perform very accurate calculations. In ARPFloat the rounding mode is part of the type system, which defines away a number of problems that show up when using fenv.h.

no_std environments are supported by disabling the std feature.

Example

  use arpfloat::Float;
  use arpfloat::FP128;

  // Create the number '5' in FP128 format.
  let n = Float::from_f64(5.).cast(FP128);

  // Use Newton-Raphson to find the square root of 5.
  // Each step computes x = (x + n/x) / 2.
  let mut x = n.clone();
  for _ in 0..20 {
      x += &n / &x;
      x = x / 2;
  }

  println!("fp128: {}", x);
  println!("fp64:  {}", x.as_f64());

The program above will print this output:

fp128: 2.2360679774997896964091736687312763
fp64:  2.23606797749979
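
For readers who want to try the iteration without the crate, here is a minimal sketch of the same Newton-Raphson update, x ← (x + n/x) / 2, using only the standard `f64` type. The function name `newton_sqrt` is hypothetical and not part of ARPFloat; the sketch assumes `n > 0`.

```rust
/// Approximates sqrt(n) with a fixed number of Newton-Raphson steps.
/// Hypothetical helper for illustration; assumes n > 0.
fn newton_sqrt(n: f64, iterations: u32) -> f64 {
    let mut x = n; // initial guess
    for _ in 0..iterations {
        // Newton step for f(x) = x^2 - n.
        x = (x + n / x) / 2.0;
    }
    x
}

fn main() {
    let approx = newton_sqrt(5.0, 20);
    println!("newton: {}", approx);
    println!("std:    {}", 5f64.sqrt());
}
```

With `f64` the result is limited to roughly 16 significant digits, which is why the FP128 run above shows more digits than the fp64 line.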

The library also provides an API that exposes rounding modes and low-level operations.

    use arpfloat::FP128;
    use arpfloat::RoundingMode::NearestTiesToEven;
    use arpfloat::Float;

    let x = Float::from_u64(FP128, 1<<53);
    let y = Float::from_f64(1000.0).cast(FP128);

    let val = Float::mul_with_rm(&x, &y, NearestTiesToEven);

View the internal representation of numbers:

   use arpfloat::Float;
   use arpfloat::FP16;

   let fp = Float::from_i64(FP16, 15);

   fp.dump(); // Prints FP[+ E=+3 M=11110000000]

   let m = fp.get_mantissa();
   m.dump(); // Prints 11110000000
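
The dump above can be read as 15 = 1.111b × 2^3, stored with an 11-bit mantissa. As a rough illustration of that normalization (not the library's implementation), the sketch below decomposes a positive integer into an exponent and a fixed-width binary mantissa using only the standard library. The function `decompose` is hypothetical, and it truncates when the value needs more bits than the precision allows.

```rust
/// Splits a positive integer into (exponent, mantissa bits) so that the
/// leading 1 of the value sits at the top of a `precision`-bit mantissa.
/// Hypothetical helper for illustration; extra low bits are truncated.
fn decompose(value: u64, precision: u32) -> (u32, String) {
    assert!(value > 0 && precision >= 1 && precision <= 64);
    // Position of the highest set bit, i.e. floor(log2(value)).
    let exponent = 63 - value.leading_zeros();
    let mantissa = if exponent < precision {
        // Shift left so the leading 1 lands in the top mantissa bit.
        value << (precision - 1 - exponent)
    } else {
        // Too many bits: drop the low ones.
        value >> (exponent + 1 - precision)
    };
    let bits = format!("{:0width$b}", mantissa, width = precision as usize);
    (exponent, bits)
}

fn main() {
    let (e, m) = decompose(15, 11);
    println!("E=+{} M={}", e, m); // prints E=+3 M=11110000000
}
```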

Control the rounding mode for type conversion:

    use arpfloat::{FP16, FP32, RoundingMode, Float};

    let x = Float::from_u64(FP32, 2649);
    let b = x.cast_with_rm(FP16, RoundingMode::Zero);
    println!("{}", b); // Prints 2648!
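
Why 2648? FP16 carries 11 bits of precision, while 2649 needs 12 bits, so the lowest bit has to be dropped, and rounding toward zero simply discards it. The sketch below reproduces that arithmetic on plain integers with the standard library. The function `truncate_to_precision` is hypothetical and only models the round-toward-zero case.

```rust
/// Keeps only the top `precision` significant bits of `value`,
/// discarding the rest (round toward zero).
/// Hypothetical helper for illustration.
fn truncate_to_precision(value: u64, precision: u32) -> u64 {
    assert!(precision >= 1 && precision <= 64);
    if value == 0 {
        return 0;
    }
    // Number of significant bits in `value`.
    let bits = 64 - value.leading_zeros();
    if bits <= precision {
        return value; // already representable
    }
    let dropped = bits - precision;
    (value >> dropped) << dropped
}

fn main() {
    // 2649 needs 12 bits; with 11 bits of precision the spacing is 2.
    println!("{}", truncate_to_precision(2649, 11)); // prints 2648
    // 2047 fits in 11 bits and is unchanged.
    println!("{}", truncate_to_precision(2047, 11)); // prints 2047
}
```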

Define new float formats and use high-precision transcendental functions:

  use arpfloat::{Float, Semantics};
  // Define a new float format with 120 bits of accuracy, and
  // dynamic range of 2^10.
  let sem = Semantics::new(10, 120);

  let pi = Float::pi(sem);
  let x = Float::exp(&pi);
  println!("e^pi = {}", x); // Prints 23.1406926327792....
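
To give a feel for how an exponential can be evaluated from basic arithmetic alone, here is a small `f64` sketch that combines range reduction (halving the argument) with a Taylor series and repeated squaring. This is a generic textbook technique and is not claimed to be how ARPFloat computes `exp`. The function name `exp_sketch` and the fixed term count of 20 are assumptions made for illustration.

```rust
/// Approximates e^x via range reduction plus a Taylor series.
/// Hypothetical helper for illustration, limited to f64 accuracy.
fn exp_sketch(x: f64) -> f64 {
    // Range reduction: halve x until |r| <= 0.5, remembering how often.
    let mut halvings = 0u32;
    let mut r = x;
    while r.abs() > 0.5 {
        r /= 2.0;
        halvings += 1;
    }
    // Taylor series: e^r = sum over i of r^i / i!
    let mut term = 1.0;
    let mut sum = 1.0;
    for i in 1..=20 {
        term *= r / i as f64;
        sum += term;
    }
    // Undo the reduction: e^x = (e^r)^(2^halvings).
    for _ in 0..halvings {
        sum *= sum;
    }
    sum
}

fn main() {
    let pi = std::f64::consts::PI;
    println!("sketch: {}", exp_sketch(pi));
    println!("std:    {}", pi.exp());
}
```

An arbitrary-precision implementation would pick the number of terms from the requested precision rather than hard-coding it.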

Floating-point numbers can be converted to continued fractions that approximate the value.

 use arpfloat::{Float, FP256};

 let ln = Float::ln2(FP256);
 println!("ln(2) = {}", ln);
 for i in 1..20 {
   let (p,q) = ln.as_fraction(i);
   println!("{}/{}", p.as_decimal(), q.as_decimal());
 }

The program above will print this output:

  ln(2) = .6931471805599453094172321214581765680755001343602552.....
  0/1
  1/1
  2/3
  7/10
  9/13
  61/88
  192/277
  253/365
  445/642
  1143/1649
  1588/2291
  2731/3940
  ....
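
The fractions listed above are the convergents of the continued fraction expansion of ln(2). As a sketch of the standard recurrence behind them (p_k = a_k·p_{k-1} + p_{k-2}, and likewise for q_k), the code below computes the first few convergents from an `f64`. The function `convergents` is hypothetical and not part of ARPFloat; it assumes a non-negative input, and with `f64` only the first dozen or so terms can be trusted.

```rust
/// Returns up to `count` continued-fraction convergents (p, q) of `value`.
/// Hypothetical helper for illustration; assumes value >= 0.
fn convergents(value: f64, count: usize) -> Vec<(u64, u64)> {
    let mut out = Vec::with_capacity(count);
    // Seeds for the recurrence: p_{-2}/q_{-2} = 0/1, p_{-1}/q_{-1} = 1/0.
    let (mut p_prev, mut q_prev) = (0u64, 1u64);
    let (mut p, mut q) = (1u64, 0u64);
    let mut x = value;
    for _ in 0..count {
        let a = x.floor();
        let a_int = a as u64;
        let p_next = a_int * p + p_prev;
        let q_next = a_int * q + q_prev;
        out.push((p_next, q_next));
        p_prev = p;
        q_prev = q;
        p = p_next;
        q = q_next;
        let frac = x - a;
        if frac == 0.0 {
            break; // the expansion terminated exactly
        }
        x = 1.0 / frac;
    }
    out
}

fn main() {
    for (p, q) in convergents(std::f64::consts::LN_2, 8) {
        println!("{}/{}", p, q);
    }
}
```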

The examples directory contains a few programs that demonstrate the use of this library.

Resources

There are excellent resources out there, some of which are referenced in the code:

  • Books:
    • Handbook of Floating-Point Arithmetic (2010 edition) by Jean-Michel Muller et al.
    • Elementary Functions: Algorithms and Implementation by Jean-Michel Muller.
    • Modern Computer Arithmetic by Brent and Zimmermann.
  • Papers:
    • An Accurate Elementary Mathematical Library for the IEEE Floating Point Standard by Gal and Bachelis.
    • How to Print Floating-Point Numbers Accurately by Steele and White.
    • What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg.
    • Fast Multiple-Precision Evaluation of Elementary Functions by Richard Brent.
    • Fast Trigonometric functions for Arbitrary Precision number by Henrik Vestermark.
  • Other excellent software implementations: APFloat, RYU, libBF, newlib, musl, etc.

License

Licensed under Apache-2.0

No runtime deps