6 stable releases
1.0.5 | Apr 25, 2024 |
---|---|
1.0.4 | Apr 24, 2024 |
1.0.1 | Apr 22, 2024 |
#45 in Programming languages
60KB
1K
SLoC
rgsm 🦀
An assembler for the Gheith ISA written in Rust.
Contents
Introduction
This is a sibling of the gasm
project.
Note that rgsm
is a contribution for the final project of Dr. Gheith's CS 429H Computer Architecture class. For students completing pipelining, it is recommended to use gasm
and dasm
, as they contain more pertinent features to completion of the project.
rgsm
is built and maintained as a part of the Gheith toolchain.
Getting Started
rgsm
is a simple Rust binary. It is released on Cargo (the official Rust package manager), so install it with the command below:
Installation
cargo install rgsm
Prerequisites
- Rust 2021
Writing Assembly
In this section, you will learn how to write a Gheith assembly program.
Comments
Comments can be placed anywhere in the assembly program with the //
prefix. The parser will ignore the rest of the line after this token is parsed.
Sections
An assembly program is split into two main "sections" -- .text
and .data
. In reality, a valid assembly program can have any number of these sections; however, every program is fundamentally comprised of these parts.
The .text
section is rather straightforward: it contains all instructions to execute.
.text
main:
print #97
print #98
print #99
// ...
end
The .data
section will contain data that will be placed in memory. The Gheith architecture is 16-bit, so each entry can represent
one of $2^{17} - 1$ values: { $0$, $1$, ... $2^{17} - 1$ }.
.data
a:
#97
b:
#98
c:
#99
// ...
Notice that each number is prefixed by an #
in this example and the one above. This will be explained more later, but each number
(and, in fact, most everything) must label itself as what it represents. In these cases, these numbers are immediate decimal values.
Label Definitions
Programs are really just data stored in memory. Thus, in assembly, if we want to reference a set of instructions (or really anything in our program), we must do so by using some unique identifier (a label).
.text
my_program_starts_here:
// ...
These labels are really just numbers -- locations in memory where a specific part of the program will reside. You can use them in any circumstance you would use another number like so:
// @0 -> Memory location word 0
.text
my_program_starts_here:
// ...
// This instruction jumps to the beginning of the program!
j my_program_starts_here
// ...
A label definition is declared like above, with a :
token afterwards to dictate that it is a definition. A label reference will omit this colon. Labels are unique, you cannot have two labels with the same name that reference different parts of the program. For example, the following code would be invalid:
// @12 -> Memory location word 12
label_1:
print #97 // <- label_1 references word 12
label_1:
print #98 // <- label_1 references word 13 (?)
// hopefully it is clear why this makes no sense
// ...
// just in case it isn't clear
j label_1 // where should this jump to?
// word 12 or 13?
Just remember that a label really just represents the memory location of the instruction/data that is directly beneath it.
Instructions
rgsm
supports an extended Gheith ISA. The specific instructions it supports can be found in a supporting document.
Here is the general form of an instruction: <INSTRUCTION NAME> [F1] [F2] [F3]
.
Instructions have up to 3 fields, whose existence and types are dictated by the ISA document. They can be one of three types: register, immediate, or label.
Register References
Take the following add
instruction as an example:
add r3, r4, r5
This operation adds the values of r4
and r5
and stores the result in r3
.
Register references are prefixed with the letter r
. There are 16 registers in the Gheith architecture, r0
-r15
. Their designations are:
Register Num. | Designation | Notes |
---|---|---|
0 | Zero/Print Register | Writing x to this register prints x |
1 | Return/Param Register | N/A |
2 | Two Register | This register stores the value $2$ |
7 | Jump Location Register | Overwritten by jumps to labels |
1-7 | Generic Caller-Saved | N/A |
8-15 | Generic Callee-Saved | N/A |
14 | Link Register | N/A |
15 | SP Register | Initialized to 0xFFFF |
Immediates
Immediates are decimal values with range depending on the instruction. They are prefixed with a #
. That's basically all there is to it.
Label References
Labels can be used any time an immediate is expected (although, it may not always semantically make sense to do so, like in the ldo
and sto
instructions). Labels can be referenced before or after their definitions, as they are preprocessed.
Pseudo-Instructions
Along with the officially-supported extended Gheith ISA instructions, rgsm
also supports a number of "pseudo-instructions", or more human-readable instructions that assemble to simple Gheith instructions. They are listed below:
Instruction Name | Field(s) | Functionality |
---|---|---|
print |
<ra: Register> | add r0, ra, r0 |
movlb |
<rt: Register>, <label: Label> | movl ra, label[7:0] , movh ra, label[15:8] |
j |
<rt: Register> | jz rt, r0 |
j |
<label: Label> | movl r7, label[7:0] , movh r7, label[15:8] , jz r7, r0 |
Entry Point
The Gheith architecture follows Von Neumann architecture principles of text sharing address space with data. The way that rgsm
organizes programs adheres to this.
The program text is placed at memory address 0, the entry point is the first instruction in the first .text
section -- this is the instruction that will be placed at address 0. If you look at the assembled output, you will see:
@0
// first instruction
// ... and so on ...
The @0
dictates that the following block is sequentially ordered starting at address 0.
The .text
sections are fused and placed first in memory, then the .data
sections are fused and placed last. Thus, the resulting machine code will be of the form:
@0
// first instruction
// second instruction
// ...
// end of `.text`
// first data entry
// second data entry
// ...
// end of `.data`
Skeleton
The skeleton of a valid assembly program is:
.data
// Place your relevant data here!
.text
// Place your instructions here!
end
Dependencies
~3–5MB
~85K SLoC