1 unstable release

Uses new Rust 2024

new 0.1.0 Mar 22, 2025
0.0.26 Mar 22, 2025

#431 in Database interfaces

MIT license

110KB
1K SLoC

Regular subset of TOML

Nuget Crates.io

Features and non-features:

  • Simple key-value parser based on TOML spec (v1.0.0)
  • Streaming: performs actions during parsing
  • Zero-copy: does not allocate memory unless requested
  • Stackless: collects absolutely nothing on the stack
  • Automata based: reliable and optimal (see benchmarks in F# and benchmarks in Rust)
  • Inlining: inlines lambdas down at the call site, no indirection or virt-calls, no need to implement any interfaces.
  • Single file with no dependencies: no bloat, no enterprise coding practices, no folder hierarchies with circular references, no third party supply chain attacks.
  • Raw UTF8: no .NET char conversion, runs on bytes directly
  • Implementations in F# and Rust

F# benchmarks

Rust benchmarks

Spec and description:

I very much like the TOML format for flat key-value storage and for the features that are supported, it is fully compatible with TOML. This means you don't need to learn anything new and can use existing tooling like strongly typed schemas in the editor.

The TOML format has a beautiful property that nested types can be expressed in a regular grammar, and this library takes advantage of that. It's been an experimental project on my shelf for a bit and there's still more things to add, but if you don't use any fancier features then it is usable already. Of course TOML was never meant to be a data storage format and this is misusing the original purpose but perhaps the fact that regular-toml can be parsed much faster may change your mind.

basic usage (F#)

let toml : byte[] = "
[server]
port = 8080
hostname = 'abc'
"B
let dictionary = RToml.toDictionary(toml)
dictionary["server.port"].kind          // INT
dictionary["server.port"].ToInt(toml)   // 8080

// or any of the other formats
let array = RToml.toArray(toml)
let array2 = RToml.toStructArray(toml)
let valuelist = 
    use vlist = RToml.toValueList(toml)
    for v in vlist do () //.. do something
// or iterate over the key-value pairs
RToml.stream (
    toml,
    (fun key value ->
        if value.kind = Token.TRUE then
            let keystr = key.ToString toml // struct to string
            printfn $"{keystr} at pos:{key.key_begin} is set to true"
    )
)

basic usage (Rust)

fn main() {
    let toml = b"
[server]
port = 8080
hostname = 'abc'
";
    let map = r_toml::to_map(toml).unwrap();
    dbg!(&map["server.port"].kind); // INT
    dbg!(&map["server.port"].to_int(toml)); // Ok(8080)

    // or iterate over key-value pairs
    let mut key_buf = Vec::new();
    r_toml::stream(toml, |k, v| {
        println!("{} = {:?}", k.to_str(&mut key_buf, toml), v.kind);
        key_buf.clear();
    });
}

Supported types:

  • keys and basic primitives: bool/int/float/string (including basic, literal and multiline strings) true/false,10,0.005, 'string'
  • datetime/datetimeoffset 1979-05-27T07:32:00Z
  • tables [entry], [entry.inner]
  • arrays of tables [[products]]
  • typed arrays of basic primitives: bool[]/int[]/float[]/string[] [1, 2, 3]
  • comments (only at beginning of line for now) # comment

Unsupported TOML types and features:

inline tables:
person = { address = { postcode = 123, street = "abc" } }

however you can do any of these

person.address.postcode = 123
person.address.street = "abc"

[person]
address.postcode = 123
address.street = "abc"

[person.address]
postcode = 123
street = "abc"
deep and/or mixed arrays:
datapoints = [[[0,1,3],"abc"],{ x = 1, y = 2}]
Key types

There is one particular type of TOML key that is not supported:

"person"."address"."name"."this"."can"."be"."very"."long"
["person"."address"."name"]
[["person"."address"."name"]]

As it's unlikely to see quoted keys in most toml files, there is no support for quoted keys.

The reason is that to parse this key there needs to be some form of conversion, collection or book-keeping to extract the values inside the quotes. While this is entirely regular and there's no problem actually reading this, this violates the Streaming, Zero-Copy, Stackless property as you need to separate the keys from the quotes somehow. Just a single quoted key is technically fine, that may be added some time.

String types

Whether a string contains an escape sequence is detected and added as a flag to the string, however there is no implementation to do the actual escaping right now, so if you use escape sequences you need to perform escaping yourself.

The string kinds after parsing are labeled as follows:

  • VALID_STR : valid string, no post-processing needed
    • the span contains a valid string without any escape sequences
    • for multiline strings the optional leading newline has already been trimmed
  • ESC_STR : escape sequence detected
    • the span contains a valid string but contains escape sequences
    • for multiline strings the optional leading newline has already been trimmed
  • EMPTY_STR: the string is empty, this is just a special case to prevent redundant work

Array types

Based on an arbitrary decision, the only arrays supported are all of single type. The parser returns the region where the values are, which have to be post-processed after. The region itself is already validated to meet TOML spec, that is a token of one of the following types:

  • ARR1_STR:
    • valid comma separated input, all entries are strings
    • eg., "asd", 'ghi', """123"""
  • ARR1_INT:
    • valid comma separated input, all entries are integers
    • eg., 1, -2, +3
  • ARR1_FLOAT:
    • valid comma separated input, all entries are floats
    • eg., 1.0, inf, nan
  • ARR1_BOOL:
    • valid comma separated input, all entries are bools
    • eg., true, false

there is no iterator for the array values right now but in the future maybe there will

--

as a visual example to how the input is processed into tokens see:

Nice to haves for the future:

  • proper benchmark data extracted from Cargo.toml or pyproject.toml files
  • iterator for array types
  • depth 2 arrays of basic primitives: bool[][],int[][],float[][],string[][]
  • toml compliance tests
  • code-gen for other languages
  • array of tables deserialization into an iterator for convenience
  • string to tagged union deserialization
  • perf. comparisons with other serialization formats (json)
  • avx2/avx512 intrinsics for strings

No runtime deps