#combinator #parser-combinator #parser

combi

A combinator library with an included Rust token parsing implementation

2 releases (1 stable)

1.0.2 Mar 16, 2025
0.1.0 Nov 1, 2023

#37 in Parser tooling


117 downloads per month
Used in enumtrait

Custom license

99KB
2.5K SLoC

Combi

What is this?

A simple combinator library, allowing the composition of functions that can succeed, fail, or be continued/ignored.


Before you ask "isn't this just functions with extra steps?": no! It's monads with extra steps.

Different specialised combinators can be built atop combi, typically parsing logic.

  • Combis are similar to typical parser combinators, but the addition of continuations enables them to support multiple syntax errors without requiring error nodes in a more complex AST
  • Combis make no assumptions about input structure, allowing them to operate on buffers (like regular parsers), on streams (with no backtracking), or on trees (such as Rust proc_macro2::TokenTrees)
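To make the success/failure/continuation idea concrete, here is a minimal sketch. The names (Outcome, then) are invented for illustration and are not combi's actual API; the point is that sequencing threads recorded errors through rather than stopping at the first one.

```rust
// Illustrative sketch only (not combi's actual API): a combinator result that
// can succeed, fail, or *continue* - record errors yet still yield a value so
// later combinators keep running and further errors can be collected.
enum Outcome<O, E> {
    Success(O),
    Continue(O, Vec<E>), // a value was produced, but errors were recorded
    Failure(Vec<E>),     // no value can be handed to the next combinator
}

// Sequencing two combinators threads errors through rather than aborting at
// the first one - this is how several syntax errors can be reported at once.
fn then<I, A, B, E>(
    first: impl Fn(I) -> (I, Outcome<A, E>),
    second: impl Fn(I) -> (I, Outcome<B, E>),
) -> impl Fn(I) -> (I, Outcome<(A, B), E>) {
    move |input| {
        let (rest, a) = first(input);
        let (a, mut errs) = match a {
            Outcome::Success(a) => (a, Vec::new()),
            Outcome::Continue(a, errs) => (a, errs),
            Outcome::Failure(errs) => return (rest, Outcome::Failure(errs)),
        };
        let (rest, b) = second(rest);
        let out = match b {
            Outcome::Success(b) if errs.is_empty() => Outcome::Success((a, b)),
            Outcome::Success(b) => Outcome::Continue((a, b), errs),
            Outcome::Continue(b, more) => {
                errs.extend(more);
                Outcome::Continue((a, b), errs)
            }
            Outcome::Failure(more) => {
                errs.extend(more);
                Outcome::Failure(errs)
            }
        };
        (rest, out)
    }
}
```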

This was built as part of the emDB project.

Included Specialisations

Tokens

Parser combinators for token streams, supporting LL(1) grammars (recursive descent, no backtracking).

  • Produce compiler diagnostics using proc_macro_error2
  • Can recover from failures easily, with custom recovery logic

It is implemented simply (explicit peeking for lookahead, explicit recovery), and is intended to be as close to zero-cost as possible compared to a comparably featured hand-rolled parser.
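As a point of reference for that comparison, the sketch below shows the hand-rolled style in question over proc_macro2 tokens: a single explicit peek provides the LL(1) lookahead, and recovery is an explicit skip to the next separator so further errors can still be reported. The function (parse_ident_list) and its error handling are illustrative only, not taken from this crate.

```rust
use std::iter::Peekable;

use proc_macro2::{token_stream::IntoIter, Ident, TokenStream, TokenTree};

// Hypothetical hand-rolled parser for a comma-separated list of identifiers.
// On an unexpected token it records an error and skips ahead rather than
// aborting, so several errors can be reported from one pass.
fn parse_ident_list(tokens: TokenStream) -> (Vec<Ident>, Vec<String>) {
    let mut iter: Peekable<IntoIter> = tokens.into_iter().peekable();
    let (mut idents, mut errors) = (Vec::new(), Vec::new());
    while let Some(tt) = iter.next() {
        match tt {
            TokenTree::Ident(id) => idents.push(id),
            other => {
                errors.push(format!("expected identifier, found `{other}`"));
                // explicit recovery: skip ahead to the next `,` (or end of input)
                while let Some(next) = iter.peek() {
                    if matches!(next, TokenTree::Punct(p) if p.as_char() == ',') {
                        break;
                    }
                    iter.next();
                }
            }
        }
        // LL(1): a single explicit peek decides whether the list continues
        match iter.peek() {
            Some(TokenTree::Punct(p)) if p.as_char() == ',' => {
                iter.next(); // consume the separator
            }
            Some(other) => errors.push(format!("expected `,`, found `{other}`")),
            None => break,
        }
    }
    (idents, errors)
}
```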

For the maximum-size cases of the tokens benchmarks:

Timer precision: 15 ns
tokens              fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ large_groups                   │               │               │               │         │
│  ├─ ChumskyProc   268.3 ms      │ 336.1 ms      │ 284.2 ms      │ 287.1 ms      │ 100     │ 100
│  ├─ CombiParser   42.38 ms      │ 54.22 ms      │ 45.18 ms      │ 45.44 ms      │ 100     │ 100
│  ╰─ HandRolled    13.54 ms      │ 18.11 ms      │ 15.16 ms      │ 15.49 ms      │ 100     │ 100
├─ long_sequence                  │               │               │               │         │
│  ├─ ChumskyProc   109.2 ms      │ 162.8 ms      │ 120.3 ms      │ 123 ms        │ 100     │ 100
│  ├─ CombiParser   36.48 ms      │ 55.93 ms      │ 42.9 ms       │ 43.44 ms      │ 100     │ 100
│  ╰─ HandRolled    27.22 ms      │ 54.56 ms      │ 32.25 ms      │ 33.85 ms      │ 100     │ 100
├─ parse_nothing                  │               │               │               │         │
│  ├─ ChumskyProc   64.8 ns       │ 12.59 µs      │ 121.6 ns      │ 238.6 ns      │ 100     │ 1600
│  ├─ CombiParser   28.82 ns      │ 28.97 ns      │ 28.9 ns       │ 28.9 ns       │ 100     │ 6400
│  ╰─ HandRolled    7.177 ns      │ 7.22 ns       │ 7.199 ns      │ 7.199 ns      │ 100     │ 25600
╰─ recursive_ident                │               │               │               │         │
   ├─ ChumskyProc   83.55 µs      │ 254.3 µs      │ 87.62 µs      │ 93.95 µs      │ 100     │ 100
   ├─ CombiParser   9.704 µs      │ 11.2 µs       │ 10.17 µs      │ 10.21 µs      │ 100     │ 100
   ╰─ HandRolled    4.883 µs      │ 5.794 µs      │ 5.122 µs      │ 5.127 µs      │ 100     │ 100

Potential Improvements

  • Parse seq(recovgroup(P1), P2) in parallel (the tree-based data structure is naturally parallel) as treepar(P1, P2); this requires working around TokenStream being !Send and !Sync
  • Optimise the recursive parser to remove heap allocation

Inlining

When aggressively inlining, but not optimising, stack frames can get huge.

  • This can result in stack overflows during compilation (as the proc macro is run after being dynamically linked with rustc).
  • scripts/extra.gdb has some additional helpers for debugging this
  • Can also set relevant flags in ../.cargo/config.toml

Dependencies

~245–710KB
~16K SLoC