5 releases (3 breaking)

Uses new Rust 2021

0.4.0 May 25, 2022
0.3.0 Apr 13, 2022
0.2.1 Mar 18, 2022
0.1.1 Mar 10, 2022
0.1.0 Mar 10, 2022

#193 in Procedural macros

476 downloads per month
Used in structstruck

MIT license

115KB
2.5K SLoC

Lightweight parsing for Rust proc macros

Venial is a WIP parser for Rust proc macros.

When writing proc macros that need to parse Rust code (such as attribute and derive macros), the most common solution is to use the syn crate. Syn can parse arbitrary valid Rust code, and even Rust-based DSLs, and return versatile data structures that can be inspected and mutated in powerful ways.

It's also extremely heavy. In one analysis of lqd's early 2022 benchmark collection, the author estimates that syn is responsible for 8% of the benchmark's compile time, and that benchmark covers Rust's most popular crates. There are subtleties (eg this isn't necessarily critical path time, but syn is often in the critical path anyway), but the overall takeaway is clear: syn is expensive.

And yet, a lot of the power of syn is often unneeded. If we look at the crates that depend on syn, we can see that the 5 most downloaded are:

  • serde_derive
  • proc-macro-hack
  • pin-project-internal
  • anyhow
  • thiserror-impl

Of these, proc-macro-hack is deprecated, and the other four only need to parse basic information on a type.

Other popular reverse-dependencies of syn (such as futures-macro, tokio-macros, async-trait, etc) do use syn's more advanced features, but there's still room for a lightweight parser in proc-macros.

Venial is that parser.

Design

Venial is extremely simple. Most of its implementation is in the parse.rs file, which is about 350 lines at the time I'm writing this README. This is because the Rust language has a very clean syntax, especially for type declarations.

Venial has no dependency besides proc-macro2 and quote.

To achieve this simplicity, venial makes several trade-offs:

  • It can only parse declarations (eg struct MyStruct {}). It can't parse expressions or statements. For now, only types and functions are supported.
  • It doesn't try to parse inside type expressions. For instance, if your struct includes a field like foo_bar: &mut Foo<Bar, dyn Foobariser>, venial will dutifully give you this type as a sequence of tokens and let you interpret it.
  • It doesn't attempt to recover gracefully from errors. Venial assumes you're running inside a derive or attribute macro, and thus that your input is statically guaranteed to be a valid type declaration. If it isn't, venial will summarily panic.

Note though that venial will accept any syntactically valid declaration, even if it isn't semantically valid. The rule of thumb is "if it compiles under a #[cfg(FALSE)], venial will parse it without panicking".
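To make the rule of thumb concrete, here is a small sketch in plain Rust (no venial involved). The struct and type names are made up for illustration. The declaration refers to a type that doesn't exist, so it would fail type checking, but it is syntactically fine and the compiler strips it out because the FALSE cfg flag is never set:

```rust
// Syntactically valid, semantically invalid: `TypeThatDoesNotExist` is not
// defined anywhere. The compiler still has to parse this item, but it is
// removed before name resolution because the `FALSE` cfg is not set.
#[cfg(FALSE)]
struct Broken {
    value: TypeThatDoesNotExist,
}

/// Returns true when the `FALSE` cfg is unset, ie when `Broken` above
/// has been compiled out.
pub fn broken_is_compiled_out() -> bool {
    !cfg!(FALSE)
}
```

Declarations of this kind are the ones the rule of thumb says venial should parse without panicking.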

The only exception is enum discriminants. Venial only supports enum discriminants with a single token, or a token-group. Eg:

enum MyEnum {
    A = 42,           // Ok
    B = "hello",      // Ok
    C = CONSTANT,     // Ok
    D = FOO + BAR,    // MACRO ERROR
    E = (FOO + BAR),  // Ok
}

This is because parsing complex discriminants requires arbitrary expression parsing, which is beyond the scope of this crate.
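The rule can be pictured with a toy model. This is not venial's actual code: ToyToken and is_supported_discriminant are hypothetical names, and the enum only imitates the "single token or group" shape of a token tree. Under that assumption, a discriminant is accepted exactly when it is one token tree long:

```rust
/// Toy stand-in for a token tree: a single token or a delimited group.
/// (Hypothetical type, used only to illustrate the rule.)
#[derive(Debug, Clone, PartialEq)]
pub enum ToyToken {
    Single(&'static str),
    Group(Vec<ToyToken>),
}

/// A discriminant is accepted only if it consists of exactly one token tree.
pub fn is_supported_discriminant(tokens: &[ToyToken]) -> bool {
    tokens.len() == 1
}

/// `42`: one token, accepted.
pub fn literal() -> Vec<ToyToken> {
    vec![ToyToken::Single("42")]
}

/// `FOO + BAR`: three tokens, rejected.
pub fn bare_sum() -> Vec<ToyToken> {
    vec![
        ToyToken::Single("FOO"),
        ToyToken::Single("+"),
        ToyToken::Single("BAR"),
    ]
}

/// `(FOO + BAR)`: one parenthesized group, accepted.
pub fn grouped_sum() -> Vec<ToyToken> {
    vec![ToyToken::Group(bare_sum())]
}
```

Wrapping the expression in parentheses turns three tokens into a single group, which is why E is fine and D is not.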

(Note: venial currently panics on unsupported declarations, eg traits, aliases, etc. Also, function support is incomplete.)

Example

use venial::{parse_declaration, Declaration};
use quote::quote;

let enum_type = parse_declaration(quote!(
    enum Shape {
        Square(Square),
        Circle(Circle),
        Triangle(Triangle),
    }
));

let enum_type = match enum_type {
    Declaration::Enum(enum_type) => enum_type,
    _ => unreachable!(),
};

assert_eq!(enum_type.variants[0].0.name, "Square");
assert_eq!(enum_type.variants[1].0.name, "Circle");
assert_eq!(enum_type.variants[2].0.name, "Triangle");

Performance

I haven't performed any kind of formal benchmark yet. That said, I compared this fork of miniserde using venial to the equivalent miniserde commit, and got the following results:

$ cargo check -j1 # miniserde-venial, clean build
    Finished dev [unoptimized + debuginfo] target(s) in 6.30s
$ cargo check -j1 # miniserde, clean build
    Finished dev [unoptimized + debuginfo] target(s) in 9.52s

$ cargo check -j4 # miniserde-venial, clean build
    Finished dev [unoptimized + debuginfo] target(s) in 3.17s
$ cargo check -j4 # miniserde, clean build
    Finished dev [unoptimized + debuginfo] target(s) in 4.79s

My machine is a desktop computer with an AMD Ryzen 7 1800x (8 cores, 16 threads), 32GB of RAM and a 2.5TB SSD.

As we can see, using venial instead of syn shaves about 3.2s off total build times in single-threaded builds, and 1.6s in 4-threaded builds.

Most of the difference comes from syn and venial themselves: cargo check --timings shows that syn takes 2.11s to compile and venial takes 0.58s in 4-threaded builds.

I'm not showing codegen builds, release mode builds, 16-threads builds and the like, but the trend stays roughly the same: for the miniserde project, switching to venial removes ~30% of the build time.
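For reference, the arithmetic behind those figures can be reproduced from the four timings listed above. The savings helper is just a name picked for this sketch:

```rust
/// Returns (seconds saved, fraction of the baseline build time saved).
pub fn savings(baseline_secs: f64, venial_secs: f64) -> (f64, f64) {
    let saved = baseline_secs - venial_secs;
    (saved, saved / baseline_secs)
}

/// Single-threaded build: 9.52s with syn, 6.30s with venial.
pub fn single_threaded() -> (f64, f64) {
    savings(9.52, 6.30) // about 3.22s saved, roughly 34% of the baseline
}

/// 4-threaded build: 4.79s with syn, 3.17s with venial.
pub fn four_threaded() -> (f64, f64) {
    savings(4.79, 3.17) // about 1.62s saved, roughly 34% of the baseline
}
```

Both cases come out at roughly a third of the baseline, in line with the ~30% quoted above.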

So... Is it worth it?

That's a fairly complicated question to answer. At the time I'm writing this section my answer is "Probably, but I'm less enthusiastic than when I started the project".

If you take the most optimistic interpretation, this is great! On a single-threaded machine, switching shaves three seconds off, a whole third of the build time!

In reality, there are a lot of complicating factors:

  • Venial never improves incremental build times at all (since dependencies are cached, even when incremental compilation is off).
  • The gap between syn and venial is smaller with any amount of multithreading.
  • I have a fairly powerful computer. Laptops might get more of a benefit from venial.
  • In projects bigger than miniserde, syn is usually one of many libraries being compiled at the same time. In some cases that means the build time of syn doesn't matter that much since it's compiled in parallel with other libraries. In other cases syn is on the critical path.
  • In practice, most clean builds are run by CI servers. To measure the usefulness of venial, you'd need to analyze the specs of the servers used in GitHub Actions / GitLab CI / whatever crater uses.
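The critical-path point can be illustrated with a toy model. Everything here is an assumption made for the sketch: the four-crate graph, the 1.0s times for "derive" and "app", and unlimited parallelism. Only the 2.11s (syn) and 0.58s (venial) figures come from the measurements above.

```rust
/// A crate in a toy build graph: its compile time and its dependencies
/// (as indices into the same slice).
pub struct CrateNode {
    pub name: &'static str,
    pub seconds: f64,
    pub deps: Vec<usize>,
}

/// Earliest finish time of crate `i`, assuming unlimited parallelism:
/// its own compile time plus the finish time of its slowest dependency.
pub fn finish_time(graph: &[CrateNode], i: usize) -> f64 {
    let deps_done = graph[i]
        .deps
        .iter()
        .map(|&d| finish_time(graph, d))
        .fold(0.0, f64::max);
    deps_done + graph[i].seconds
}

/// Hypothetical graph: app -> derive -> parser, and app -> other_dep.
pub fn toy_graph(parser_secs: f64, other_secs: f64) -> Vec<CrateNode> {
    vec![
        CrateNode { name: "parser", seconds: parser_secs, deps: vec![] },
        CrateNode { name: "derive", seconds: 1.0, deps: vec![0] },
        CrateNode { name: "other_dep", seconds: other_secs, deps: vec![] },
        CrateNode { name: "app", seconds: 1.0, deps: vec![1, 2] },
    ]
}

/// Seconds saved on the whole build by swapping a 2.11s parser for a 0.58s one.
pub fn saving(other_secs: f64) -> f64 {
    let app = 3; // index of "app" in toy_graph
    finish_time(&toy_graph(2.11, other_secs), app)
        - finish_time(&toy_graph(0.58, other_secs), app)
}
```

In this model, saving(0.5) gives the full 1.53s difference because the parser is on the critical path, while saving(5.0) gives nothing because the build is waiting on other_dep either way.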

All in all, it's questionable whether the benefits are worth porting your derive crate from syn to venial (though my experience so far has been that porting isn't that hard).

Another thing to keep in mind is that this is a very young library. There has been very little effort to optimize it or profile it so far, and future versions may give a better build time reduction.

tl;dr: You can probably shave a few seconds off your clean builds with venial. Incremental builds see no benefits.

Contributions

Pull requests are welcome.

I have no intention to work on venial in the near future myself, but I will still merge PRs.

Some possible improvements:

  • Fixing the function declaration parser.
  • Finding and fixing any bugs that turn up.
  • Porting other projects from syn to venial and comparing compile times.
  • Parsing traits.
  • Parsing all possible declarations.
