#router #path #tree #match #url

matchit

A blazing fast URL router and path matcher

10 unstable releases (3 breaking)

0.4.4 Oct 25, 2021
0.4.3 Sep 15, 2021
0.4.2 Jun 8, 2021
0.4.1 May 18, 2021
0.1.0 Dec 21, 2020

#69 in Network programming

969 downloads per month
Used in 29 crates (6 directly)

MIT license

57KB
892 lines

MatchIt

A blazing fast URL router and path matcher.

use matchit::Node;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut matcher = Node::new();
    matcher.insert("/home", "Welcome!")?;
    matcher.insert("/users/:id", "A User")?;

    let matched = matcher.at("/users/978")?;
    assert_eq!(matched.params.get("id"), Some("978"));
    assert_eq!(*matched.value, "A User");

    Ok(())
}

matchit relies on a tree structure which makes heavy use of common prefixes, effectively a radix tree. This makes lookups extremely fast. See below for technical details.

The tree is optimized for high performance and a small memory footprint, and it scales well even with very long paths and a large number of routes.

Parameters

In the example above, :id is a named parameter. Parameter values are accessible via Params, which stores a list of keys and values. You can get the value of a parameter by name, params.get("id"), or by iterating through the list.
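To make the "list of keys and values" idea concrete, here is a minimal stand-alone sketch of such a list with lookup by name and iteration. `ParamList` and its methods are hypothetical stand-ins written for illustration; they are not matchit's `Params` type and may differ from it in detail.

```rust
/// Hypothetical stand-in for a parameter list: ordered key/value pairs.
#[derive(Debug, Default)]
pub struct ParamList {
    pairs: Vec<(String, String)>,
}

impl ParamList {
    /// Appends a key/value pair, preserving insertion order.
    pub fn push(&mut self, key: &str, value: &str) {
        self.pairs.push((key.to_string(), value.to_string()));
    }

    /// Returns the value of the first parameter named `key`, if any.
    pub fn get(&self, key: &str) -> Option<&str> {
        self.pairs
            .iter()
            .find(|(k, _)| k.as_str() == key)
            .map(|(_, v)| v.as_str())
    }

    /// Iterates over all pairs in insertion order.
    pub fn iter(&self) -> impl Iterator<Item = (&str, &str)> + '_ {
        self.pairs.iter().map(|(k, v)| (k.as_str(), v.as_str()))
    }
}
```

A plain list with a linear scan is a reasonable choice here because a route rarely has more than a handful of parameters.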

The registered path can contain two types of parameters:

Syntax    Type
:name     named parameter
*name     catch-all parameter

Named Parameters

Named parameters are dynamic route segments. They match anything until the next / or the path end:

Route: /user/:user

 /user/gordon              match: user = "gordon"
 /user/you                 match: user = "you"
 /user/gordon/profile      no match
 /user/                    no match
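The matching rule for named parameters can be sketched as a segment-by-segment comparison. This is only an illustration of the rule; `match_named` is a hypothetical helper, and it compares whole segments rather than walking a radix tree as matchit does.

```rust
/// Matches `path` against `route`, where a route segment starting with ':'
/// is a named parameter. Returns the captured (name, value) pairs on success.
/// Illustrative only: not how matchit performs lookups internally.
pub fn match_named<'r, 'p>(route: &'r str, path: &'p str) -> Option<Vec<(&'r str, &'p str)>> {
    let mut params = Vec::new();
    let mut route_segs = route.split('/');
    let mut path_segs = path.split('/');
    loop {
        match (route_segs.next(), path_segs.next()) {
            // Both exhausted at the same time: every segment matched.
            (None, None) => return Some(params),
            (Some(r), Some(p)) => {
                if let Some(name) = r.strip_prefix(':') {
                    // A named parameter must capture at least one character.
                    if p.is_empty() {
                        return None;
                    }
                    params.push((name, p));
                } else if r != p {
                    // Static segments must be identical.
                    return None;
                }
            }
            // The route and the path have a different number of segments.
            _ => return None,
        }
    }
}
```

Called with the route `/user/:user`, this returns the pair `("user", "gordon")` for `/user/gordon` and `None` for both `/user/gordon/profile` and `/user/`.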

Catch-All Parameters

The second type is the catch-all parameter, which has the form *name. As the name suggests, it matches everything, so it must always be at the end of the pattern:

Route: /src/*filepath

 /src/                       match: filepath = "/"
 /src/somefile.html          match: filepath = "/somefile.html"
 /src/subdir/somefile.html   match: filepath = "/subdir/somefile.html"
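A catch-all can be sketched in the same spirit. The hypothetical helper `match_catch_all` below assumes the route is a static prefix followed by a single `*name` and keeps the leading `/` in the captured value, as in the examples above. It is an illustration, not matchit's code.

```rust
/// Matches `path` against a route ending in a catch-all such as "/src/*filepath".
/// Returns (name, value), where the value keeps its leading '/'.
/// Illustrative only; assumes the part before '*' contains no parameters.
pub fn match_catch_all<'r, 'p>(route: &'r str, path: &'p str) -> Option<(&'r str, &'p str)> {
    // Split "/src/*filepath" into the static prefix "/src/" and the name "filepath".
    let star = route.find('*')?;
    let (prefix, name) = (&route[..star], &route[star + 1..]);
    // The prefix must end with '/', which becomes the first character of the value.
    let stem = prefix.strip_suffix('/')?;
    let rest = path.strip_prefix(stem)?;
    if rest.starts_with('/') {
        Some((name, rest))
    } else {
        None
    }
}
```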

Priority

Static and dynamic route segments are allowed to overlap. If they do, static segments will be given higher priority:

/:page
/posts/:year/:month/:post
/posts/:year/:month/index
/posts/:year/top
/static/*path
/favicon.ico

The following routes will be matched:

/about                => /:page
/posts/2021/01/rust   => /posts/:year/:month/:post
/posts/2021/01/index  => /posts/:year/:month/index
/posts/2021/top       => /posts/:year/top
/static/foo.png       => /static/*path
/favicon.ico          => /favicon.ico
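The priority rule can be illustrated with a small stand-alone resolver. This is a sketch under stated assumptions: it scans every route linearly and prefers the candidate whose segments are "most static" when compared left to right. The function names are hypothetical, and the linear scan is not how matchit resolves overlaps internally.

```rust
/// Rank of a route segment: lower ranks win when routes overlap.
fn rank(segment: &str) -> u8 {
    if segment.starts_with('*') {
        2 // catch-all
    } else if segment.starts_with(':') {
        1 // named parameter
    } else {
        0 // static text
    }
}

/// Returns true if `route` matches `path`, segment by segment.
fn route_matches(route: &str, path: &str) -> bool {
    let mut path_segs = path.split('/');
    for r in route.split('/') {
        if r.starts_with('*') {
            // A catch-all consumes the rest of the path, which may be empty.
            return path_segs.next().is_some();
        }
        match path_segs.next() {
            Some(p) if r.starts_with(':') => {
                // A named parameter must capture at least one character.
                if p.is_empty() {
                    return false;
                }
            }
            Some(p) if p == r => {}
            _ => return false,
        }
    }
    // The route is exhausted, so the path must be too.
    path_segs.next().is_none()
}

/// Picks the matching route with the lowest segment ranks, compared left to right.
pub fn resolve<'a>(routes: &[&'a str], path: &str) -> Option<&'a str> {
    routes
        .iter()
        .copied()
        .filter(|route| route_matches(route, path))
        .min_by_key(|route| route.split('/').map(rank).collect::<Vec<u8>>())
}
```

With the routes listed above, `/favicon.ico` matches both `/:page` and `/favicon.ico`; the static route wins because its last segment has rank 0 rather than 1.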

How does it work?

The matcher relies on a tree structure which makes heavy use of common prefixes; it is basically a compact prefix tree (or radix tree). Nodes with a common prefix share a parent. Here is a short example of what the routing tree for the GET request method could look like:

Priority   Path             Handle
9          \                *<1>
3          ├s               None
2          |├earch\         *<2>
1          |└upport\        *<3>
2          ├blog\           *<4>
1          |    └:post      None
1          |         └\     *<5>
2          ├about-us\       *<6>
1          |        └team\  *<7>
1          └contact\        *<8>

Every *<num> represents the memory address of a handler function (a pointer). If you follow a path through the tree from the root to a leaf, you get the complete route path, e.g. /blog/:post, where :post is just a placeholder (parameter) for an actual post name. Unlike hash maps, a tree structure also allows us to use dynamic parts like the :post parameter, since we actually match against the routing patterns instead of just comparing hashes. This works very efficiently.

Because URL paths have a hierarchical structure and make use only of a limited set of characters (byte values), it is very likely that there are a lot of common prefixes. Storing the routes in this structure allows us to easily reduce the routing into a very small number of branches.
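The prefix sharing described above can be sketched with a toy radix tree. The version below handles static ASCII paths only (no parameters), stores a `u32` in place of a handler, and uses hypothetical names such as `RadixNode`. It is a simplified illustration of the data structure, not matchit's implementation.

```rust
/// A toy radix-tree node for static paths only (no parameters).
/// Field and method names are illustrative, not matchit's internals.
#[derive(Debug, Default)]
pub struct RadixNode {
    prefix: String,
    value: Option<u32>,
    children: Vec<RadixNode>,
}

/// Length in bytes of the longest common prefix of `a` and `b`.
/// Assumes ASCII paths, so byte offsets are valid string boundaries.
fn common_prefix_len(a: &str, b: &str) -> usize {
    a.bytes().zip(b.bytes()).take_while(|(x, y)| x == y).count()
}

impl RadixNode {
    /// Inserts `path` below this node, splitting nodes where prefixes diverge.
    pub fn insert(&mut self, path: &str, value: u32) {
        let common = common_prefix_len(&self.prefix, path);
        if common < self.prefix.len() {
            // Split this node: the unshared tail moves into a new child.
            let tail = RadixNode {
                prefix: self.prefix.split_off(common),
                value: self.value.take(),
                children: std::mem::take(&mut self.children),
            };
            self.children.push(tail);
        }
        let rest = &path[common..];
        if rest.is_empty() {
            self.value = Some(value);
            return;
        }
        // Descend into the child that shares the next byte, or add a new leaf.
        let first = rest.as_bytes()[0];
        let pos = self.children.iter().position(|c| c.prefix.as_bytes()[0] == first);
        match pos {
            Some(i) => self.children[i].insert(rest, value),
            None => self.children.push(RadixNode {
                prefix: rest.to_string(),
                value: Some(value),
                children: Vec::new(),
            }),
        }
    }

    /// Looks up the value stored for exactly `path`, if any.
    pub fn get(&self, path: &str) -> Option<u32> {
        let rest = path.strip_prefix(self.prefix.as_str())?;
        if rest.is_empty() {
            return self.value;
        }
        self.children.iter().find_map(|c| c.get(rest))
    }
}
```

Starting from `RadixNode::default()` as the root, inserting `/search` and then `/support` produces one node for the shared `/s` with the two children `earch` and `upport`, mirroring the branch shown in the tree above.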

For even better scalability, the child nodes on each tree level are ordered by priority, where the priority is just the number of handles registered in child nodes. This means that nodes that are part of the most routing paths are always evaluated first, increasing the chance of reaching the correct route on our first try.
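One way to keep children ordered by priority is to bump a child's counter whenever a new handle is registered beneath it and then move it towards the front. The sketch below assumes a hypothetical `Child` type and `bump_priority` helper; it illustrates the ordering idea only and is not taken from matchit's source.

```rust
/// A child entry: the path fragment it covers and how many handles are
/// registered at or below it (its priority). Hypothetical type.
#[derive(Debug, Clone, PartialEq)]
pub struct Child {
    pub prefix: &'static str,
    pub priority: u32,
}

/// Records one more handle under `children[index]` and moves that child
/// forward so the list stays sorted by descending priority.
/// Returns the child's new position.
pub fn bump_priority(children: &mut [Child], index: usize) -> usize {
    children[index].priority += 1;
    let mut pos = index;
    // Move the child towards the front while it outranks its left neighbour.
    while pos > 0 && children[pos - 1].priority < children[pos].priority {
        children.swap(pos - 1, pos);
        pos -= 1;
    }
    pos
}
```

Because only one counter changes per insertion, a single pass towards the front is enough to restore the order; a full sort is not needed.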

-----------------------------------

Benchmarks

As it turns out, this method of routing is extremely fast. In fact, matchit is one of the fastest routers out there, if not the fastest. Here is a simple benchmark matching 4 paths against 130 registered routes. matchit finds the correct route in under 250 nanoseconds, roughly 20 times faster than route-recognizer and about 100 to 145 times faster than the regex and actix routers in the results below. You can view the benchmark code in the bench.rs file.

Compare Routers/matchit 
time:   [216.85 ns 217.63 ns 218.44 ns]

Compare Routers/actix   
time:   [31.629 us 31.664 us 31.701 us]

Compare Routers/regex   
time:   [21.995 us 22.144 us 22.319 us]

Compare Routers/route-recognizer
time:   [4.2389 us 4.2434 us 4.2482 us]

No runtime deps
