#graphviz #graph #dot #graph-node

rust_dot

RustDOT is mostly the Graphviz DOT language, lightly rustified

6 releases (breaking)

0.6.0 May 3, 2024
0.5.1 Apr 8, 2024
0.4.0 Apr 2, 2024
0.3.0 Mar 27, 2024
0.1.0 Feb 18, 2024

#589 in Parser implementations


555 downloads per month

MIT/Apache

42KB
795 lines

RustDOT is mostly the Graphviz DOT language, lightly rustified. It can be embedded as a macro or parsed from a string or file. The purpose is extracting the structure. Layout hints are currently out of scope.

let g1 = rust_dot! {
    graph {
        A -- B -- C; /* semicolon is optional */
        "B" -- D // quotes not needed here
    }
};
println!("{} {} \"{}\" {:?} {:?}", g1.strict, g1.directed, g1.name, g1.nodes, g1.edges);
// false false "" ["A", "B", "C", "D"] [(0, 1), (1, 2), (1, 3)]

let g2 = parse_string("digraph Didi { -1 -> 2 -> .3  2 -> 4.2 }");
println!("{} {} \"{}\" {:?} {:?}", g2.strict, g2.directed, g2.name, g2.nodes, g2.edges);
// false true "Didi" ["-1", "2", ".3", "4.2"] [(0, 1), (1, 2), (1, 3)]
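
Judging by the output above, edges are pairs of indices into the node list. As a minimal sketch, the helper below turns them back into name pairs. It assumes the two fields behave like a `Vec<String>` and a `Vec<(usize, usize)>`, which is inferred from the printed values and not confirmed by the crate's documentation; `named_edges` is a hypothetical helper, not part of RustDOT.

```rust
/// Hypothetical helper: resolves index-based edges to pairs of node names.
/// Assumes `nodes` and `edges` have the shapes printed in the examples above.
fn named_edges<'a>(nodes: &'a [String], edges: &[(usize, usize)]) -> Vec<(&'a str, &'a str)> {
    edges
        .iter()
        .map(|&(from, to)| (nodes[from].as_str(), nodes[to].as_str()))
        .collect()
}

fn main() {
    // Values copied from the g1 output shown above.
    let nodes: Vec<String> = ["A", "B", "C", "D"].iter().map(|s| s.to_string()).collect();
    let edges = vec![(0, 1), (1, 2), (1, 3)];
    for (from, to) in named_edges(&nodes, &edges) {
        println!("{from} -- {to}");
    }
}
```

With the g1 values this lists the edges A -- B, B -- C and B -- D.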

The return values can be fed to crates such as petgraph:

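let mut petgraph = petgraph::graph::Graph::new();
// Add every node, remembering the index petgraph assigns to it.
let nodes: Vec<_> = rust_dot_graph.nodes
    .iter()
    .map(|node| petgraph.add_node(node))
    .collect();
// Edges are index pairs into the node list; they carry no weight, hence `()`.
for edge in rust_dot_graph.edges {
    petgraph.add_edge(nodes[edge.0], nodes[edge.1], ());
}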

or graph/graph_builder:

use graph::prelude::*;

let graph: DirectedCsrGraph<usize> = GraphBuilder::new()
    .csr_layout(CsrLayout::Sorted)
    .edges(rust_dot_graph.edges)
    .build();
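
If neither crate is wanted, the same index pairs can be turned into a plain adjacency list with the standard library alone. This is only a sketch: it assumes the edges are available as `(usize, usize)` pairs, as the printed output suggests, and `adjacency` is a hypothetical helper rather than something RustDOT provides.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: builds an adjacency list from index-based edges.
/// For undirected graphs each edge is recorded in both directions.
fn adjacency(edges: &[(usize, usize)], directed: bool) -> BTreeMap<usize, Vec<usize>> {
    let mut adj: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    for &(from, to) in edges {
        adj.entry(from).or_default().push(to);
        // Skip the reverse entry for self-loops so they are not listed twice.
        if !directed && from != to {
            adj.entry(to).or_default().push(from);
        }
    }
    adj
}

fn main() {
    // Edge list copied from the g1 example (an undirected graph).
    let edges = [(0, 1), (1, 2), (1, 3)];
    for (node, neighbours) in adjacency(&edges, false) {
        println!("{node}: {neighbours:?}");
    }
}
```

Nodes without any edge do not appear in the map, so callers that need them should iterate over the node list as well.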

This is work in progress. Nothing is stabilised!

Todo

  • Implement strict, it is currently ignored/skipped

  • Return Err instead of panicking on wrong input

  • Put Spans on Lexemes, based on their input, maybe using crate macroex

  • Separate return type (currently Parser, which should be internal)

  • Implement node attributes, they are currently ignored/skipped

  • Implement node defaults

  • Implement edge attributes, they are currently ignored/skipped

  • Implement edge defaults

  • Deal with graph attributes, with and without keyword graph

  • Reimplement rust_dot as a proc-macro, transforming its input as const at compile time

  • As an extension to DOT, allow label or weight to come from a Rust expression

  • As an extension to DOT, allow label or weight to come from invoking a closure
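
Until errors are returned as `Err`, a caller that cannot afford a panic could wrap the parse call in `std::panic::catch_unwind`. The sketch below shows the idea with a hypothetical stand-in, `parse_or_panic`, in place of the real parser; both it and `try_parse` are illustrative names, not RustDOT API. This only works when the program is built with unwinding panics (the default), not with `panic = "abort"`, and it is a stopgap rather than proper error handling.

```rust
use std::panic;

/// Hypothetical stand-in for a parser that panics on malformed input.
/// It merely counts edge operators; it is not the RustDOT parser.
fn parse_or_panic(input: &str) -> usize {
    let trimmed = input.trim_start();
    if !trimmed.starts_with("graph") && !trimmed.starts_with("digraph") {
        panic!("expected `graph` or `digraph`");
    }
    input.matches("--").count() + input.matches("->").count()
}

/// Converts a panic in the parser into an `Err` carrying the panic message.
fn try_parse(input: &str) -> Result<usize, String> {
    panic::catch_unwind(|| parse_or_panic(input)).map_err(|payload| {
        // Panic payloads are usually a `&str` or a `String`.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}

fn main() {
    println!("{:?}", try_parse("graph { A -- B }"));
    println!("{:?}", try_parse("not a graph"));
}
```

The default panic hook still prints the panic message to stderr before `try_parse` returns.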

Limitations

Rust macros are tokenised by the Rust lexer, which is subtly different from Graphviz. For consistency (and ease of implementation) the parse_* functions use the same lexer. These are the consequences:

  • Macros must be in UTF-8, while the input to the parse_* functions may also be UTF-16 or Latin-1. You must deal with other encodings yourself.
  • Double quotes, parentheses, braces and brackets must be balanced and some characters are not allowed. As a workaround you can change something like the following first line into the second. The commented quotes are seen by Rust, but ignored as HTML (once that is implemented):
    <<I>"</I> <B> )}] [{( </B> \\>
    <<I>"<!--"--></I> <B><!--"--> )}]  [{( <!--"--></B> <!--"-->\\<!--"-->>
    
  • HTML is partly whitespace-sensitive, whereas Rust is not. So on the macro side it’s impossible to get spacing right, and for run-time input it would take considerable effort. Instead RustDOT uses a heuristic: a space between everything, except inside tags and entities and before [,;.:!?] (incomplete, and wrong for some languages).
  • Strings are not yet unescaped when RustDOT gets them, yet the Rust lexer still validates them. The parse_* functions work around this, but in rust_dot! you must use raw strings like r"\N" when they contain backslash sequences that Rust does not accept.
  • Comments are exactly Rust comments. They differ from DOT in that block comments can nest.
  • Everything after # on the same line is also discarded, although these are not officially comments. Unlike real comments, they are handled by RustDOT after lexical analysis. This means that the rest of the line must be balanced, as in the point about delimiters above. The discarded text only ends after the closing delimiter, so you should put that on the same line. In rust_dot! you must use // instead. (Only the nightly compiler gives access to line numbers in macros.)
  • Valid identifiers should be accepted by Rust, though (only in rust_dot!) confusable letters like Cyrillic ‘о’ or rare scripts like Runic give warnings.
  • Valid numbers should be accepted by Rust, and floats do not need a leading zero before the decimal point.
  • RustDOT returns one graph, so it wants exactly one in the input. The DOT grammar doesn’t clarify multiple graphs per file, but Graphviz accepts them. However, they lead to two SVGs invalidly concatenated in one file, or a PNG displaying only the first. Graphviz likewise accepts an empty document; RustDOT does not.
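
The first limitation leaves encoding conversion to the caller. As a rough sketch using only the standard library, Latin-1 and UTF-16 bytes can be converted to a Rust `String` before handing the text to parse_string. The function names are hypothetical, and the UTF-16 variant assumes little-endian input without a byte order mark.

```rust
/// Latin-1 (ISO-8859-1) bytes map one-to-one onto the first 256 Unicode
/// code points, so each byte can be cast to a `char` directly.
fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Decodes UTF-16 little-endian bytes (assumed to have no byte order mark).
fn utf16le_to_string(bytes: &[u8]) -> Result<String, String> {
    if bytes.len() % 2 != 0 {
        return Err("odd number of bytes".to_string());
    }
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).map_err(|e| e.to_string())
}

fn main() {
    // 0xE9 is 'é' in Latin-1.
    let dot = latin1_to_string(b"graph { caf\xE9 -- bar }");
    println!("{dot}");
    // The resulting String is UTF-8 and could then be passed to parse_string.
}
```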

Dependencies

~77KB