2 releases
0.1.1 | Mar 6, 2024 |
---|---|
0.1.0 | Mar 5, 2024 |
#247 in Programming languages
47 downloads per month
115KB
3.5K
SLoC
PL/0 (aka. PL_0)
❤️ Please give me a
Star
/Follow
if you like this project! ❤️
To begin with
This is the curriculum design
of Compiler Principle
course in Nanjing University of Aeronautics and Astronautics
(aka. NUAA
).
Introduction
PL/0
is a subset language
of Pascal
.
This is a simple Rust
implementation of PL/0
compiler.
BNF
<prog> -> program <id> ; <block>
<block> -> [<const-decl>][<var-decl>][<proc>]<body>
<const-decl> -> const <const> {, <const>} ;
<const> -> <id> := <integer>
<var-decl> -> var <id> {, <id>} ;
<proc> -> procedure <id> ([<id> {, <id>}]) ; <block> {; <proc>}
<body> -> begin <statement> {; <statement>} end
<statement> -> <id> := <exp>
| if <l-exp> then <statement> [else <statement>]
| while <l-exp> do <statement>
| call <id> ([<exp> {, <exp>}])
| <body>
| read (<id> {, <id>})
| write (<exp> {, <exp>})
<l-exp> -> <exp> <lop> <exp> | odd <exp>
<exp> -> [+|-] <term> {<aop> <term>}
<term> -> <factor> {<mop> <factor>}
<factor> -> <id> | <integer> | (<exp>)
<lop> -> = | <> | < | <= | > | >=
<aop> -> + | -
<mop> -> * | /
<id> -> <letter> {<letter> | <digit>}
<integer> -> <digit> {<digit>}
<letter> -> a | b | ... | z | A | B | ... | Z
<digit> -> 0 | 1 | ... | 9
Structure
$$ \set{Source Code} \Longrightarrow \textbf{Lexer} \stackrel{Token}{\Longrightarrow} \textbf{Parser} \stackrel{AST}{\Longrightarrow} \textbf{CodeGen} \Longrightarrow \set{PCode} \longrightarrow \textbf{VM} \longrightarrow \set{Result} $$
Part | Analysis List |
---|---|
Lexer | Lexical Analysis |
Parser | Syntax Analysis |
CodeGen | Semantic Analysis |
Overview
Lexer/Tokenizer
This part is extreme easy, I've implemented it in my own hand without using any other tools.
(However, if you'd love to, you could use tools like flex
or pest
to generate lexer/tokenizer
automatically)
Parser
With the help of Recursive Descent Algorithm
, parser
is also not that hard to implement.
However, it's necessary to prove that the given BNF satisfy the definition of LL(1)
before implementing parser
in Recursive Descent Algorithm
.
Proof will be given later.
Error Handling
I've adopt the welcomed panic-mode-liked
error handling strategy for this part, to make sure that the compiler
could find as many errors as possible in one run, instead of being halted by the first error.
To make sure error could be handled in a synchros
way, FIRST-FOLLOW
table is a must (I've build this manually, which could be further improved by using auto-tools).
Codegen
AST
to PCode
code-generator is the default strategy for this part.
I'm working on a AST
to Lua-Backend-Adapted-Representation
(LBAR) code-generator as well (not implemented yet).
Virtual Machine (aka. VM / Interpreter)
Sense PCode
is the default execution result of codegen
, the Simple-PCode-Interpreter
is the default implementation of Virtual Machine
Still, I'm trying to implement a Lua-VM-Liked-VM
for LBAR
Feasibility Analysis
Proof: BNF is LL(1)
To satisfy this, 3 conditions should be met:
$$
\begin{align*}
\text{Condition 1} &\dots \text{No \textit{left recursion pattern} detected in the \textit{grammar}} \
\text{Condition 2} &\dots \forall A \in V_N (A \rightarrow \alpha_1 | \alpha_2 | \dots | \alpha_n) \Rightarrow First(\alpha_i) \cap First(\alpha_j) = \Phi ~ (i \ne j) \
\text{Condition 3} &\dots \forall A \in V_N (\epsilon \in First(A)) \Rightarrow First(A) \cap Follow(A) = \Phi
\end{align*}
$$
Now, let's prove them one by one!
Condition 1 ~ No left recursion pattern detected in the grammar
After having a glance of the given BNF, we could easily prove that:
$$
\forall A \in V_N (A \rightarrow B \wedge B \in V_N ) \Rightarrow A \ne B
$$
Which means that, there's no left recursion pattern detected in the grammar.
Condition 2
This could be easy, with the reference of BNF and first_follow_table
Condition 3
Just the same as Condition 2
Fibonacci Demo
Source code:
program fibonacci;
const index := 30;
var return,i,a;
procedure fib(a,x);
var sum;
begin
sum := 0;
if x<2 then
return := x
else
begin
call fib(a+1,x-1);
sum := sum+return;
call fib(a+1,x-2);
sum := sum+return;
return := sum
end
end
begin
i := 1;
a := 2;
while i<=index do
begin
call fib(a+1,i);
write(return);
i := i+1
end
end
Result:
- Console
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
6765
10946
17711
28657
46368
75025
121393
196418
317811
514229
832040
- PCode
PCode List:
======================================================================
0| JMP 0 39
1| JMP 0 4
2| STA 1 4
3| STA 2 3
4| INT 0 6
5| LIT 0 0
6| STO 0 5
7| LOD 0 4
8| LIT 0 2
9| OPR 0 10
10| JPC 0 14
11| LOD 0 4
12| STO 1 3
13| JMP 0 38
14| LOD 0 3
15| LIT 0 1
16| OPR 0 2
17| LOD 0 4
18| LIT 0 1
19| OPR 0 3
20| CAL 1 2
21| LOD 0 5
22| LOD 1 3
23| OPR 0 2
24| STO 0 5
25| LOD 0 3
26| LIT 0 1
27| OPR 0 2
28| LOD 0 4
29| LIT 0 2
30| OPR 0 3
31| CAL 1 2
32| LOD 0 5
33| LOD 1 3
34| OPR 0 2
35| STO 0 5
36| LOD 0 5
37| STO 1 3
38| OPR 0 0
39| INT 0 7
40| LIT 0 1
41| STO 0 4
42| LIT 0 2
43| STO 0 5
44| LOD 0 4
45| LIT 0 30
46| OPR 0 13
47| JPC 0 61
48| LOD 0 5
49| LIT 0 1
50| OPR 0 2
51| LOD 0 4
52| CAL 0 2
53| LOD 0 3
54| OPR 0 14
55| OPR 0 15
56| LOD 0 4
57| LIT 0 1
58| OPR 0 2
59| STO 0 4
60| JMP 0 44
61| OPR 0 0
======================================================================
- Symbol Table
Symbol Table:
======================================================================
name | type | val | level | addr | size | scope_list
======================================================================
index | const | 30 | 0 | 3 | 0 | ["#"]
return | var | 0 | 0 | 3 | 0 | ["#"]
i | var | 0 | 0 | 4 | 0 | ["#"]
a | var | 0 | 0 | 5 | 0 | ["#"]
fib | proc | 2 | 0 | 6 | 2 | ["#"]
a | var | 0 | 1 | 3 | 0 | ["#", "fib"]
x | var | 0 | 1 | 4 | 0 | ["#", "fib"]
sum | var | 0 | 1 | 5 | 0 | ["#", "fib"]
======================================================================
Error Handling Demos
As is mentioned follow, this implementation of pl/0 compiler has a complete error handling strategy, which means that it could find as many errors as possible in one run, instead of being halted by the first error.
Here are some simple demos:
Syntax Error (may coexists with Lexical Error
)
- src
program ;
var a, b, c;
begin
a 1;
b := ;
é : 3;
if 1 = 1 then
write(1
else
write 0);
write a + b + c;
wrçte(1)
end
- console
SyntaxError{ Line: 1, Col: 9 }
| ~~ Expected <id> field, but not found!
SyntaxError{ Line: 4, Col: 8 }
| ~~ Expected `:=`, but got `Integer(1)`
SyntaxError{ Line: 5, Col: 9 }
| ~~ Expected `<id>` / `<integer>` / `(<exp>)` field, but got an unmatchable token `;`
LexicalError{ Line: 6, Col: 3 }
| ~~ 'é' is not an ASCII character
LexicalError{ Line: 6, Col: 5 }
| ~~ ':' is an undefined sign, did you mean ':='?
SyntaxError{ Line: 6, Col: 7 }
| ~~ Expected `:=`, but got `Integer(3)`
SyntaxError{ Line: 6, Col: 7 }
| ~~ Expected <statement> field, but not found!
SyntaxError{ Line: 9, Col: 6 }
| ~~ Expected `)`, but got `Else`
SyntaxError{ Line: 10, Col: 11 }
| ~~ Expected `(`, but got `Integer(0)`
SyntaxError{ Line: 11, Col: 9 }
| ~~ Expected `(`, but got `Identifier("a")`
SyntaxError{ Line: 11, Col: 18 }
| ~~ Expected `)`, but got `;`
LexicalError{ Line: 12, Col: 5 }
| ~~ 'ç' is not an ASCII character
SyntaxError{ Line: 12, Col: 7 }
| ~~ Expected `:=`, but got `Identifier("te")`
SyntaxError{ Line: 12, Col: 7 }
| ~~ Expected <statement> field, but not found!
thread 'main' panicked at src/parser/mod.rs:149:7:
|> Errors above occurred (during `parsing`), compiling stopped ... <|
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Semantic Error
Duplicated Definition
- src
program MultiDef;
var a, a, a, a;
procedure proc();
begin
write(1)
end;
procedure proc();
begin
write(2)
end
begin
write(1)
end
- console
SemanticError{ Line: 3, Col: 8 }
| ~~ `a` is defined before
SemanticError{ Line: 3, Col: 11 }
| ~~ `a` is defined before
SemanticError{ Line: 3, Col: 14 }
| ~~ `a` is defined before
SemanticError{ Line: 10, Col: 14 }
| ~~ `proc` is defined before
thread 'main' panicked at src/translator/mod.rs:116:7:
attempt to subtract with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Undefined
- src
program undef;
begin
a := 1;
b := 2;
write(c)
end
- console
SemanticError{ Line: 3, Col: 3 }
| ~~ `a` is undefined
SemanticError{ Line: 4, Col: 3 }
| ~~ `b` is undefined
SemanticError{ Line: 5, Col: 9 }
| ~~ `c` is undefined
thread 'main' panicked at src/translator/mod.rs:73:7:
|> Errors above occurred (during `translation/codegen`), compiling stopped ... <|
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
args_list.length
cannot match with definition(signature)
- src
program WrongArgsListLength;
var a;
procedure proc();
begin
write(1)
end;
procedure procc(x, t, z);
begin
write(1)
end
begin
call proc(1, 1, 1);
call procc(3)
end
- console
SemanticError{ Line: 16, Col: 11 }
| ~~ `proc` expects 0 args, but received 3
SemanticError{ Line: 17, Col: 12 }
| ~~ `procc` expects 3 args, but received 1
thread 'main' panicked at src/translator/mod.rs:73:7:
|> Errors above occurred (during `translation/codegen`), compiling stopped ... <|
Assign to const
/ procedure
- src
program AssignToConstProc;
const i := 1;
procedure proc();
begin
write(i)
end
begin
i := 16;
proc := 16
end
- console
SemanticError{ Line: 10, Col: 3 }
| ~~ `i` is not a variable
SemanticError{ Line: 11, Col: 6 }
| ~~ `proc` is not a variable
thread 'main' panicked at src/translator/mod.rs:73:7:
|> Errors above occurred (during `translation/codegen`), compiling stopped ... <|
Dependencies
~2.2–3MB
~59K SLoC