#wasm-module #ir #ssa #wasm-bytecode #waffle #optimization #analysis

bin+lib portal-pc-waffle

Wasm Analysis Framework For Lightweight Experiments

25 releases

new 0.3.4 Nov 30, 2024
0.3.3 Nov 29, 2024
0.2.17 Nov 15, 2024
0.1.1+portal Sep 19, 2024
0.0.29+portal Sep 5, 2024

#383 in WebAssembly

Download history 167/week @ 2024-08-29 168/week @ 2024-09-05 88/week @ 2024-09-12 212/week @ 2024-09-19 29/week @ 2024-09-26 13/week @ 2024-10-03 15/week @ 2024-10-10 19/week @ 2024-10-17 8/week @ 2024-10-24 1227/week @ 2024-10-31 366/week @ 2024-11-07 153/week @ 2024-11-14 422/week @ 2024-11-21 380/week @ 2024-11-28

1,405 downloads per month
Used in 12 crates (10 directly)

Apache-2.0 WITH LLVM-exception

3MB
33K SLoC

WAFFLE: Wasm Analysis Framework for Lightweight Experimentation

Synopsis: an SSA IR compiler framework for Wasm-to-Wasm transforms, in Rust.

Status: working for Wasm MVP; roundtrips complex modules successfully

The transforms from Wasm to IR and from IR to Wasm work well, and has been fuzzed in various ways. In particular, waffle is fuzzed by roundtripping Wasm through SSA IR and back, and differentially executing the original and roundtripped Wasm under Wasmtime (with limits on execution time). At this time, no correctness bugs have been found.

Waffle is able to roundtrip (convert to IR, then compile back to Wasm) complex modules such as the SpiderMonkey JS engine compiled to Wasm.

Waffle has some basic mid-end optimizations working, such as GVN and constant propagation. Much more could be done on this.

There are various ways in which the generated Wasm bytecode could be improved; work is ongoing on this.

Architecture

The IR is a CFG of blocks, containing operators that correspond 1-to-1 to Wasm operators. Dataflow is via SSA, and blocks have blockparams (rather than phi-nodes). Wasm locals are not used in the IR (they are converted to SSA).

The frontend converts Wasm into this IR by building SSA as it goes, inserting blockparams when it discovers multiple reaching definitions for a local. Multivalue Wasm (parameters and results for every control-flow block) is fully supported, and converted to SSA. This process more or less works like Cranelift's does, except that memory, table, etc. operations remain at the Wasm abstraction layer (are not lowered into implementation details), and arithmetic operators mirror Wasm's exactly.

The backend operates in three stages:

  • Structured control flow recovery, which uses Ramsey's algorithm to convert the CFG back into an AST of Wasm control-flow primitives (blocks, loops, and if-then AST nodes).

  • Treeification, which computes whether some SSA values are used only once and can be moved to just before their single consumer, computing the value directly onto the Wasm stack without the need for an intermediate local. This is a very simple form of code scheduling.

  • Localification, which performs a register allocation (using a simple linear-scan algorithm) to assign all SSA values to locals such that no live-ranges overlap in the same local.

  • Like Binaryen but with an SSA IR, rather than an AST-based IR. Dataflow analyses are much easier when one doesn't have to handle arbitrary reads and writes to locals. Binaryen is able to stackify/reloop arbitrary control flow (CFG to Wasm) but does not implement the other direction (Wasm to CFG), and it has only a C/C++ API, not Rust.

  • Like Walrus but also with an SSA IR. Walrus is in Rust and designed for Wasm-to-Wasm transforms as well, but its IR mirrors the Wasm bytecode closely and thus presents the same difficulties as Binaryen for traditional CFG-of-SSA-style compiler analyses and transforms.

  • Halfway like Cranelift, in that the IR is similar to Cranelift's (a CFG of SSA IR with blockparams), but with the Wasm backend as well (Cranelift only does Wasm-to-IR). WAFFLE's IR also deliberately remains at the Wasm abstraction level, maintaining 1-to-1 correspondence with all operators and maintaining the concepts of memories, tables, etc., while Cranelift lowers operations and storage abstractions into runtime/embedding-specific implementation details in the IR.

Dependencies

~10–18MB
~268K SLoC