#syntax-tree #language #shading #web-gpu #parser #zero-copy #recursive-descent

wgsl-parser

A zero-copy recursive-descent parser for WebGPU shading language

5 releases (breaking)

0.5.0 Jun 1, 2024
0.4.0 May 25, 2024
0.3.0 May 11, 2024
0.2.0 May 8, 2024
0.1.0 May 6, 2024

#1 in #shading

Download history 408/week @ 2024-05-05 40/week @ 2024-05-12 175/week @ 2024-05-19 93/week @ 2024-05-26 161/week @ 2024-06-02 22/week @ 2024-06-09 7/week @ 2024-06-16

453 downloads per month

MIT/Apache

175KB
5.5K SLoC

WGSL Parser

A hand-rolled, zero-copy recursive-descent parser for WebGPU shading language, written with Gramatika.

API

Parsing a source file

use parser::{Parse, ParseStream, SyntaxTree};

const SOURCE_TEXT: &str = include_str!("path/to/some/shader.wgsl");

fn main() -> anyhow::Result<()> {
	let mut parser = ParseStream::from(SOURCE_TEXT);
	// The `SyntaxTree`
	let tree = parser.parse::<SyntaxTree>()?;
	// A tuple of `(ArcStr, Vec<Token>)`
	let (source, tokens) = parser.into_inner();

	Ok(())
}

Tokenizing a source file without doing a full parse

use gramatika::Lexer as _;
use parser::Lexer;

const SOURCE_TEXT: &str = include_str!("path/to/some/shader.wgsl");

fn main() {
	let mut lexer = Lexer::new(SOURCE_TEXT.into());
	// `Vec<Token>`
	let tokens = lexer.scan();
}

Syntax tree representation

A SyntaxTree contains a vector of Decls representing the top-level syntax types defined by the WGSL grammar, e.g.:

  • Decl::Var(VarDecl { .. })

     @group(1) @binding(2)
     var<uniform> uniforms: Uniforms;
    
  • Decl::Const(VarDecl { .. })

     const FOO: u32 = 1u;
    
  • Decl::Struct(StructDecl { .. })

     struct Foo {
     	foo: mat3x4<f32>,
     	bar: vec2<u32>,
     	baz: array<mat4x4<f32>, 256u>,
     }
    
  • Decl::Function(FunctionDecl { .. })

     fn sum(a: f32, b: f32) -> f32 {
     	return a + b;
     }
    

The structures wrapped by those declarations can contain sub-declarations, e.g.:

  • Decl::Field(FieldDecl { .. }) inside of a StructDecl
  • Decl::Param(ParamDecl { .. }) inside of a FunctionDecl

The body of a FunctionDecl contains a vector of Stmts.

Stmt is an enum in a form similar to Decl, with variants indicating the kind of statement it represents, each wrapping an inner structure that describes the syntax in further detail, often recursively, e.g.:

// NOTE: This is not valid Rust code, just a pseudo-code example of what some
//       `Stmt` might look like on the inside

Stmt::If(IfStmt {
	..
	else_branch: Some(ElseStmt {
		..
		body: Arc(Stmt::Block(BlockStmt {
			..
			stmts: Arc<[Stmt]>,
		})),
	}),
})

Finally, Expr is the "lowest" form of syntax in the tree, taking the same general form as Decl and Stmt above.

Inspecting a syntax tree

Each node of the syntax tree derives a bespoke Debug implementation, which prints the tree in a format that's a sort of cross between Lisp (a format commonly used for representing syntax trees) and Rust syntax.

That format looks like this:

max(4, 2) // The expression represented by the tree below
(Expr::Primary (PrimaryExpr
	expr: (Expr::FnCall (FnCallExpr
		ident: (IdentExpr::Leaf `max` (Function (1:1...1:4))),
		arguments: (ArgumentList
			brace_open: `(` (Brace (1:4...1:5)),
			arguments: [
				(Expr::Primary (PrimaryExpr
					expr: (Expr::Literal `4` (IntLiteral (1:5...1:6))),
				)),
				(Expr::Primary (PrimaryExpr
					expr: (Expr::Literal `2` (IntLiteral (1:8...1:9))),
				)),
			],
			brace_close: `)` (Brace (1:9...1:10)),
		),
	)),
))

Traversing a syntax tree

The package exports a Visitor trait which can be implemented to efficiently traverse the tree. Visitor defines a visit_ method for each type of syntax represented by the tree. visit_ methods for nodes that contain child nodes must return either FlowControl::Continue to traverse their children, or FlowControl::Break to stop traversing the current branch.

The default Visitor implementation returns FlowControl::Continue for every node, so you only need to actually implement the visit_ methods that your particular use case calls for:

use std::collections::HashMap;

use gramatika::{Substr, Token as _};
use parser::{
	decl::VarDecl,
	expr::{IdentExpr, NamespacedIdent},
	traversal::{FlowControl, Visitor, Walk},
	ParseStream, SyntaxTree,
};

#[derive(Default)]
struct ReferenceCounter {
	counts: HashMap<Substr, usize>,
}

impl Visitor for ReferenceCounter {
	fn visit_var_decl(&mut self, decl: &VarDecl) -> FlowControl {
		self.counts.insert(decl.name.lexeme(), 0);

		if let Some(ref expr) = decl.assignment {
			expr.walk(self);
		}

		FlowControl::Break
	}

	fn visit_ident_expr(&mut self, mut expr: &IdentExpr) {
		if let IdentExpr::Leaf(name) = expr {
			if let Some(count) = self.counts.get_mut(&name.lexeme()) {
				*count += 1;
			}
		}
	}
}

// Note: Not actually a robust implementation of a reference-counter,
//       but good enough for this toy example
pub fn count_references(tree: &SyntaxTree) -> HashMap<Substr, usize> {
	let mut visitor = ReferenceCounter::default();
	tree.walk(&mut visitor);

	visitor.counts
}

Dependencies

~1–7MB
~38K SLoC