1 unstable release

| Version | Released |
|---|---|
| 0.1.0 | Jan 29, 2026 |
lua2hcb_compiler
This tool compiles Lua 5.3-style decompiler output back into HCB bytecode.
It targets the practical subset used by the current rfvp/hcb decompiler pipeline:
- Top-level `function NAME(...) ... end`
- Structured control flow: `if/elseif/else/end`, `while/do/end`, `break`
- Function calls and syscalls
- No closures/upvalues, no vararg, no multi-return
In addition to structured if/while, the compiler also supports the decompiler's
__pc-dispatcher form:
    local __pc = 0
    while true do
      if __pc == 0 then
        ...
        if S0 == 0 then
          __pc = 2
        else
          __pc = 1
        end
      elseif __pc == 1 then
        ...
        __pc = 2
      else
        return
      end
    end
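One way to read the dispatcher form is as a flattened control-flow graph: each `__pc == N` arm is a basic block, and assigning `__pc` selects the next block. The Python sketch below only models that reading for the example above; it is not the compiler's implementation, and `run_dispatcher`, `BLOCKS`, and the `block*` helpers are hypothetical names.

```python
# Hypothetical model: each dispatcher arm is a function that returns the
# next __pc value, or None when the arm executes `return`.
def run_dispatcher(blocks, state):
    pc = 0                       # mirrors `local __pc = 0`
    trace = []                   # order in which arms were visited
    while True:                  # mirrors `while true do`
        trace.append(pc)
        next_pc = blocks[pc](state)
        if next_pc is None:      # the arm returned from the function
            return trace
        pc = next_pc

def block0(state):
    # `if S0 == 0 then __pc = 2 else __pc = 1 end`
    return 2 if state["S0"] == 0 else 1

def block1(state):
    return 2                     # `__pc = 2`

def block2(state):
    return None                  # the final `else return` arm

BLOCKS = {0: block0, 1: block1, 2: block2}
```

With `S0 == 0` the model visits arms 0 and 2; with any other value it visits 0, 1, and 2.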
Pure local declarations such as local S0, S1, S2 are accepted and ignored.
Important IR convention
This compiler assumes the input Lua is decompiler IR style (stack-machine reconstruction), e.g.
- `S0 = 123` means “push 123”
- `S1 = (S1 == S2)` means “compare top two values and push the result”
- `__ret = Foo(...)` means “call Foo; return value is available via `__ret`/`push_return`”
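To make the convention concrete, here is a toy Python model of the three statement shapes. It is a sketch under assumptions, not the HCB VM: the class and method names are hypothetical, comparison results are assumed to be 0/1 integers, and argument passing for calls is deliberately not modeled.

```python
class StackModel:
    """Toy model: the S-slots are positions on a value stack."""

    def __init__(self):
        self.stack = []
        self.ret = None              # stands in for `__ret`

    def push(self, value):
        # `S<n> = 123` -> push a literal
        self.stack.append(value)

    def eq(self):
        # `S<n> = (S<n> == S<n+1>)` -> pop two values, push the result
        rhs = self.stack.pop()
        lhs = self.stack.pop()
        self.stack.append(1 if lhs == rhs else 0)   # 0/1 result is an assumption

    def call(self, func, *args):
        # `__ret = Foo(...)` -> keep the result in the return register
        self.ret = func(*args)

    def push_return(self):
        # make the last call's result available on the stack
        self.stack.append(self.ret)
```

For example, pushing `123` twice and calling `eq()` leaves a single `1` on the stack.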
Conditions in `if`/`while` follow the same convention as in the CFG/state-machine form:
- `if S0 ~= 0 then` / `while S0 ~= 0 do`: `jz` is used to branch on zero
- `if S0 == 0 then` / `while S0 == 0 do`: layout is inverted (still uses `jz`)
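The Python sketch below shows one plausible reading of "layout is inverted": both condition shapes emit the same `jz`, and only the order of the two bodies changes. Only `jz` comes from this README; `jmp`, the label names, and `lower_if` itself are hypothetical, and the compiler's actual instruction layout may differ.

```python
def lower_if(cond, then_ops, else_ops):
    """Flatten an if/else into pseudo-instructions.

    cond is "~=0" or "==0", meaning `if S0 ~= 0` or `if S0 == 0`.
    """
    if cond == "~=0":
        # jz skips the then-body when S0 is zero.
        fallthrough, on_zero = then_ops, else_ops
    elif cond == "==0":
        # Inverted layout: jz jumps *to* the then-body when S0 is zero.
        fallthrough, on_zero = else_ops, then_ops
    else:
        raise ValueError("unsupported condition: " + cond)
    return ["jz L_zero", *fallthrough, "jmp L_end",
            "L_zero:", *on_zero, "L_end:"]
```

In this reading no separate "jump if non-zero" instruction is needed, which matches the note that the `== 0` form still uses `jz`.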
Strings / encoding
All C-strings in HCB are stored as:
- 1-byte length including the trailing NUL
- bytes of the string followed by `\0`

This applies to:
- `push_string` immediates inside code
- `game_title` and syscall names inside `sysdesc`
Encoding is controlled by nls in YAML (ShiftJIS, UTF-8, GB18030).
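The layout described above can be sketched in a few lines of Python. This illustrates the byte format only and is not the tool's code; `encode_cstring` and the `CODECS` mapping from `nls` values to Python codec names are assumptions made for the example.

```python
# Assumed mapping from the YAML `nls` value to a Python codec name.
CODECS = {"ShiftJIS": "shift_jis", "UTF-8": "utf-8", "GB18030": "gb18030"}

def encode_cstring(text, nls):
    """Return <u8 length><encoded bytes><NUL>; the length counts the NUL."""
    body = text.encode(CODECS[nls]) + b"\x00"
    if len(body) > 255:
        # A 1-byte length prefix cannot describe more than 255 bytes.
        raise ValueError("string too long for a 1-byte length prefix")
    return bytes([len(body)]) + body
```

For instance, `encode_cstring("AB", "UTF-8")` yields `b"\x03AB\x00"`: length 3 covers the two characters plus the trailing NUL.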
YAML meta format
Example:

    nls: ShiftJIS
    sys_desc_offset: 0 # ignored (recomputed)
    entry_point: 0 # ignored (recomputed from function entry_point())
    non_volatile_global_count: 1915
    volatile_global_count: 1990
    custom_syscall_count: 0
    game_mode: 7
    game_title: "..."
    syscall_count: 148 # optional; validated if present
    syscalls:
      0: { args: 2, name: AudioLoad }
      1: { args: 2, name: AudioPlay }
      ...
      147: { args: 1, name: WindowMode }
syscalls may also be provided as a YAML list (legacy), but the map form is recommended.
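As a rough illustration of how the two `syscalls` forms and the optional `syscall_count` check fit together, here is a Python sketch operating on already-parsed metadata. `normalize_syscalls` is a hypothetical helper, and the shape of the legacy list form (entries in syscall-id order, each with `args` and `name`) is an assumption, since this README does not spell it out.

```python
def normalize_syscalls(meta):
    """Return {syscall_id: {"args": ..., "name": ...}} from parsed YAML."""
    raw = meta["syscalls"]
    if isinstance(raw, list):
        # Assumed legacy shape: position in the list is the syscall id.
        table = dict(enumerate(raw))
    else:
        # Map form: keys are syscall ids.
        table = {int(key): entry for key, entry in raw.items()}

    expected = meta.get("syscall_count")   # optional field
    if expected is not None and expected != len(table):
        raise ValueError(
            "syscall_count is %d but %d syscalls are listed" % (expected, len(table))
        )
    return table
```

Normalizing to a map keyed by id keeps later lookups identical regardless of which form the YAML used.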
Differences from Lua 5.3
| Area | Lua 5.3 (full language) | This project’s supported subset | Practical implication / reminder |
|---|---|---|---|
| Intended scope | General-purpose scripting language | Deterministic “decompiler-friendly” subset for round-tripping to HCB bytecode | Treat this as a compilation format, not a general Lua target |
| Entry point | No fixed name; host decides what to call | `function entry_point()` is required and is used to compute the HCB entry address | Renaming/omitting `entry_point` breaks linkage |
| Program structure | Arbitrary chunk with any order of statements | Multiple `function f_xxxxxxxx(...) ... end` allowed in any order; forward calls allowed | Function definitions and uses do not need to be ordered |
| Lexical / comments | Full Lua comments (`--`, `--[[...]]`), long strings | Same (as tolerated by the parser) | Prefer simple line comments; avoid exotic long-bracket nesting |
| Identifiers | Any valid Lua identifier | Expected conventions: `aN` args, `lN` locals (optional), `S0..` temporaries, `G[i]` globals, `LT[idx][key]` / `GT[idx][key]` tables | Using arbitrary variable names may become unsupported |
| Types / values | nil, boolean, number (int/float), string, table, function, userdata, thread | nil, boolean, number, string; tables only via explicit VM-style access; functions only as named top-level functions | No general table constructors, userdata, threads |
| Truthiness | `false` and `nil` are falsey; everything else truthy (including `0`, `""`) | Control flow is primarily compiled from 0/1-style conditions (e.g., `if S0 == 0 then ...`) | Do not rely on Lua truthiness of non-boolean values; prefer explicit comparisons |
| Expressions (general) | Full expression grammar, precedence, short-circuit, metamethods | Limited expression shapes that map to known HCB ops (arithmetic, comparisons, simple boolean composition patterns) | Complex expressions may fail to compile; keep expressions simple and explicit |
| Arithmetic operators | `+ - * / // % ^` | `+ - * / %` supported; `//` (integer division) and `^` only if explicitly mapped in the bytecode set you use | Avoid `//` unless you confirmed the opcode mapping exists |
| Bitwise operators | `& \| ~ << >>` and unary `~` | Only patterns that map to the VM’s bit ops (commonly `bit_test`-style patterns) | Do not write general bitwise arithmetic unless the compiler explicitly supports it |
| Concatenation | `..` | Only if mapped; otherwise unsupported | Prefer precomputed strings / avoid dynamic concatenation |
| Length operator | `#` | Generally unsupported unless mapped | Avoid `#t` and `#s` unless you confirmed support |
| Relational operators | `== ~= < <= > >=` | Supported when operands are in supported forms (`Sx`, `aN`, `G[i]`, literals) | Keep operands “simple” and VM-addressable |
| Boolean operators | `and`, `or`, `not` with short-circuit semantics | `not` may be supported if mapped; `and`/`or` only in restricted decompiler-style patterns (not general short-circuit truthiness) | Avoid idiomatic Lua `a and b or c` constructs |
| Assignment | Multiple assignment, destructuring, local init | Single-target assignments are expected; multi-assign may be partially supported only for `local S0,S1,...` declarations | Prefer one assignment per line |
| Local declarations | `local x`, `local x = expr`, `local a,b = ...` | `local S0, S1, ...` and similar declarations are accepted and ignored (no codegen) | Declarations exist only to keep Lua syntax valid/readable |
| Control flow | `if/elseif/else/end`, `while`, `repeat/until`, numeric `for`, generic `for`, `goto`/`::label::`, `break` | `if/elseif/else`, `while`, `break` supported; `repeat/until`, `for`, `goto` not supported | Keep loops to `while`; avoid `for` and `repeat` |
| Return statements | `return`, `return a,b,c` (multi-return), tail-call behaviors | `return` and single-value `return S0` supported (as VM `ret`/`retv`) | Do not use multi-return; return at most one value |
| Functions / closures | function statements/expressions, closures, upvalues, recursion, methods (`:`), vararg (`...`) | No closures/upvalues; no vararg; named functions only; recursion is okay if VM supports it | Do not define nested functions; do not capture outer locals |
| Multiple return values | Full multiple return semantics (`return a,b`, assignment from multi-return calls) | Not supported | Every call is treated as producing at most one usable return value |
| Method call syntax | `obj:method(a)` sugar for `obj.method(obj,a)` | Not supported unless emitted in a VM-specific lowered form | Use explicit function names / VM syscall patterns instead |
| Table constructors | `{}`, `{a=1}`, `{[k]=v}`, array parts, mixed | Not supported | Tables are manipulated only via explicit VM table ops (e.g., `LT[idx][key] = v`) |
| Metatables / metamethods | `setmetatable`, operator overloading, `__index`, etc. | Not supported | Operator behavior is VM-defined, not Lua metamethod-driven |
| Modules | `require`, package loaders, environments | Not supported | No module system; single compilation unit |
| Error handling | `error`, `assert`, `pcall`, `xpcall` | Not supported unless present as explicit syscalls | Do not rely on Lua exception mechanisms |
| Coroutines | `coroutine.*`, yielding, resumptions | Not supported | Even if the engine uses “threads,” they are VM/syscall-level, not Lua coroutines |
| Standard library | Full Lua 5.3 base/string/table/math/io/os/debug/utf8 | Not supported as Lua-level libraries; functionality expected via syscalls and VM globals | Use syscalls listed in YAML; do not call `string.*` / `table.*` etc. |
| Special decompiler artifacts | Not part of Lua spec | `__ret` (return register marker), optional `__pc` dispatcher form, `S0..` temporaries, `G[i]` globals | These are compiler contract elements, not standard Lua |
| Strings encoding | Lua source is bytes; conventionally UTF-8, but not required | Encoding controlled by YAML `nls` (e.g., ShiftJIS); all emitted C-strings are NUL-terminated and length-prefixed with u8 including NUL | Keep string literals compatible with the chosen encoding; avoid characters not representable in that code page |
| Determinism | Lua execution depends on runtime semantics, metamethods, libs | Compilation must be deterministic and match the VM opcode model | Avoid any construct whose meaning depends on Lua runtime facilities |
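Before handing a hand-edited script to the compiler, a quick scan for control-flow keywords that the table lists as unsupported can catch mistakes early. The Python sketch below is a deliberately naive pre-check and is not part of lua2hcb: `find_unsupported` is a hypothetical helper, it only strips `--` line comments, and it will report false positives for keywords inside string literals or long-bracket comments.

```python
import re

# Control-flow keywords the table above lists as not supported.
UNSUPPORTED = ("repeat", "until", "for", "goto")
_WORD = re.compile(r"\b(" + "|".join(UNSUPPORTED) + r")\b")

def find_unsupported(source):
    """Return (line_number, keyword) pairs for unsupported keywords."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("--", 1)[0]      # drop a line comment (naive)
        for match in _WORD.finditer(code):
            hits.append((lineno, match.group(1)))
    return hits
```

An empty result does not guarantee the script compiles; it only rules out the most obvious unsupported loop and jump forms.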
Build
cargo build --release
Usage
./target/release/lua2hcb --meta meta.yml --lua script.lua -o script.hcb
The tool always uses function entry_point() as the entry point.