#assembly #byte #computer #virtual #virtual-machine #bit #compiler

vc_8bit

This project is a virtual computer that takes a vector of bytes and runs it as instructions. Also included is a complete assembler and compiler.

6 releases

new 0.1.5 Sep 16, 2024
0.1.4 Sep 16, 2024

#545 in Parser implementations


295 downloads per month

MIT license

1MB
3K SLoC

Rust 2.5K SLoC // 0.0% comments C# 393 SLoC Visual Studio Solution 58 SLoC

Virtual Computer, Assembler, and Compiler


This project is a virtual 8-bit computer that takes a vector of bytes and runs it as instructions. It includes the virtual machine, an assembler for a custom assembly language, and a compiler for a custom high-level language.

I created a blog post about this project and you can find that on Hashnode. It goes deeper into explaining the different parts of this project.

Add to Your Project

vc_8bit can be added to your Rust project with the terminal command:

cargo add vc_8bit

The code shown below assumes you are using the following modules:

use vc_8bit::{assembly, c_lang, vc_8bit::Computer};

Virtual Computer

The virtual computer works by instantiating the Computer and inserting the program into the RAM as bytes.

let bytes = vec![Byte::from_string("11010100"), Byte::from_string("1001100")];
let mut computer: Computer = Computer::new();
computer.ram.insert_bytes(bytes);
computer.run();

The VC (Virtual Computer) is basically a big function that will take an array of bytes and run the instructions associated with the bytes. I have emulated components like a Binary Decoder, RAM, ALU, and CPU.

The VC takes the binary and figures out which instruction goes with it. It uses Binary Decoders in a match statement to decide what instruction it is. Every byte is an instruction. Examples of instructions include moving a value from memory to a register and adding the value of one register to another using the ALU. This is why assembly is basically binary. For example, the binary sequence 01001110 might be the MOV instruction. All an assembler does is convert instructions like CPY, LDR, and ADD to their corresponding bytes. It gets a little more complex than this. For this VC, the first 6 bits are for the instruction, and the last 2 bits are for the register. If this sounds interesting to you, I highly recommend watching Core Dumped and his videos.
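
The 6-bit/2-bit split can be illustrated with a small sketch. This is not the crate's decoder; `split_instruction` is a hypothetical helper that only assumes the layout stated above (upper 6 bits for the instruction, lower 2 bits for the register).

```rust
/// Hypothetical helper (not part of vc_8bit): splits an instruction byte into
/// (opcode, register), assuming the upper 6 bits hold the instruction and the
/// lower 2 bits hold the register number.
fn split_instruction(byte: u8) -> (u8, u8) {
    let opcode = byte >> 2; // keep the top 6 bits
    let register = byte & 0b0000_0011; // keep the bottom 2 bits
    (opcode, register)
}

fn main() {
    let (opcode, register) = split_instruction(0b1101_0100);
    // Prints: opcode = 110101, register = R0
    println!("opcode = {:06b}, register = R{}", opcode, register);
}
```

A decoder built this way could then match on the opcode value to pick the instruction and use the register number as an index.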

When working with the VC, remember that the RAM has a 256-byte limit, because the VC is only an 8-bit computer (modern computers are 64-bit) and an 8-bit address can only point at 256 different locations.
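
To see why 8 bits gives exactly 256 bytes, here is a minimal sketch of a byte-addressed memory. The `Ram` type below is hypothetical and is not the crate's RAM implementation; it only shows that a `u8` address can never point outside a 256-cell array.

```rust
/// Hypothetical RAM model (not the vc_8bit type): 256 cells addressed by one byte.
struct Ram {
    cells: [u8; 256],
}

impl Ram {
    fn new() -> Self {
        Ram { cells: [0; 256] }
    }

    /// A u8 address ranges over 0..=255, so indexing is always in bounds.
    fn read(&self, address: u8) -> u8 {
        self.cells[address as usize]
    }

    fn write(&mut self, address: u8, value: u8) {
        self.cells[address as usize] = value;
    }
}

fn main() {
    let mut ram = Ram::new();
    ram.write(0b1111_1110, 37);
    println!("{}", ram.read(0b1111_1110)); // prints 37
}
```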

Assembler

The assembler works by first assembling the code to binary. It will then turn the binary into an array of bytes.

let value = "MOV R0 50";

// assemble code
let contents = assembly::compile_assembly_to_binary(value.to_string());
let bytes = assembly::string_to_bytes(contents.as_str());

// run on VC
let mut computer: Computer = Computer::new();
computer.ram.insert_bytes(bytes);
computer.run();

The assembler goes through the code line by line and converts each line into binary. I created a custom assembly language to work with the VC.

  • HALT: Stops the program
  • STR R0 #00000000: Stores the value in the register to the address in memory
  • LDR R0 #00000000: Loads the value in the memory address to the register
  • MOV R0 #00000000: Moves a byte value into a register
  • CPY R0 R1: Copies the value of one register to another
  • SHR R0 #00000000: Shifts the register value to the right by the number of places given in the byte
  • SHL R0 #00000000: Shifts the register value to the left by the number of places given in the byte
  • OUT R0: Outputs the value in the register to the console as a byte value
  • MSG R0: Outputs the value in the register to the console as an ASCII character
  • INC R0: Increments the value in the register
  • DEC R0: Decrements the value in the register
  • JMP #00000000: Moves the RAM index to the given address
  • JMP_NEG #00000000: Moves the RAM index if the ALU Negative flag is on
  • JMP_ZRO #00000000: Moves the RAM index if the ALU Zero flag is on
  • JMP_ABV #00000000: Moves the RAM index if neither the ALU Negative flag nor the Zero flag is on
  • CMP_NEG R0: Moves 11111111 to the register if the ALU Negative flag is on. If not, it moves 00000000 to the register
  • CMP_ZRO R0: Moves 11111111 to the register if the ALU Zero flag is on. If not, it moves 00000000 to the register
  • CMP_ABV R0: Moves 11111111 to the register if neither the ALU Negative flag nor the Zero flag is on. If not, it moves 00000000 to the register
  • ADD R0 R1: Adds the byte value from registers 0 and 1 and moves the result to the first register
  • SUB R0 R1: Subtracts the byte value from registers 0 and 1 and moves the result to the first register
  • MUL R0 R1: Multiplies the byte value from registers 0 and 1 and moves the result to the first register
  • DIV R0 R1: Divides the byte value from registers 0 and 1 and moves the result to the first register
  • AND R0 R1: Does an and operation on the bytes from registers 0 and 1 and moves the result to the first register
  • OR R0 R1: Does an or operation on the bytes from registers 0 and 1 and moves the result to the first register
  • XOR R0 R1: Does an exclusive or operation on the bytes from registers 0 and 1 and moves the result to the first register
  • NOT R0: Does a not operation on the byte in register 0 and moves the result back to it
  • RPRT R0 #00000000: Reads the value in the port at the address and writes the value to register 0. The address needs to be 0 through 7.
  • WPRT R0 #00000000: Writes the value in register 0 to the port at the address. The address needs to be 0 through 7.
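
The flag-based instructions in the list can be illustrated with a rough sketch. Everything here is an assumption for illustration: the `Flags` struct and the helper functions are hypothetical, and the sketch assumes the Negative flag means "the top bit of the result is set" after a wrapping subtraction. The real ALU may define the flags differently.

```rust
/// Hypothetical flag pair; the real ALU in vc_8bit may define these differently.
struct Flags {
    negative: bool,
    zero: bool,
}

/// Subtracts b from a with wrap-around and derives the flags.
/// Assumption: "Negative" means the top bit of the result is set.
fn sub_with_flags(a: u8, b: u8) -> (u8, Flags) {
    let result = a.wrapping_sub(b);
    let flags = Flags {
        negative: (result & 0b1000_0000) != 0,
        zero: result == 0,
    };
    (result, flags)
}

/// Mirrors the CMP_* descriptions: all ones when the condition holds, else all zeros.
fn cmp_neg(f: &Flags) -> u8 {
    if f.negative { 0b1111_1111 } else { 0 }
}

fn cmp_zro(f: &Flags) -> u8 {
    if f.zero { 0b1111_1111 } else { 0 }
}

fn cmp_abv(f: &Flags) -> u8 {
    if !f.negative && !f.zero { 0b1111_1111 } else { 0 }
}

fn main() {
    // 3 - 5 wraps to 11111110, so Negative is set under the assumption above.
    let (result, flags) = sub_with_flags(3, 5);
    println!("{:08b} {:08b}", result, cmp_neg(&flags));
    println!("{:08b} {:08b}", cmp_zro(&flags), cmp_abv(&flags));
}
```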

The assembler will identify integers, bytes, and hexadecimals:

MOV R0 5 ; moves 5 into R0
MOV R1 0x3A ; moves the hexadecimal 3A (integer 58) to R1
MOV R2 #00110100 ; moves the byte value 00110100 (integer 52) to R2
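
One way an assembler could recognize these three forms is to look at the prefix of the operand. The sketch below is not the crate's parser; `parse_operand` is a hypothetical function that assumes `#` marks a binary byte, `0x` marks hexadecimal, and anything else is a decimal integer.

```rust
/// Hypothetical operand parser (not part of vc_8bit).
/// Returns None when the token does not fit in a single byte or is malformed.
fn parse_operand(token: &str) -> Option<u8> {
    if let Some(bits) = token.strip_prefix('#') {
        // Binary byte literal, e.g. #00110100
        u8::from_str_radix(bits, 2).ok()
    } else if let Some(hex) = token.strip_prefix("0x") {
        // Hexadecimal literal, e.g. 0x3A
        u8::from_str_radix(hex, 16).ok()
    } else {
        // Plain decimal integer, e.g. 5
        token.parse::<u8>().ok()
    }
}

fn main() {
    for token in ["5", "0x3A", "#00110100"] {
        println!("{} -> {:?}", token, parse_operand(token));
    }
}
```

With the three operands from the example above, this yields 5, 58, and 52.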

There is also the %ASSIGN feature, which defines constants that can be used inside the assembly:

%ASSIGN VARIABLE_ADDRESS #11110100

MOV R0 37
STR R0 VARIABLE_ADDRESS
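
Conceptually, %ASSIGN can be thought of as a text substitution that runs before assembling. The sketch below is only an assumption about how such a step could work, not the crate's implementation; `expand_assigns` is a hypothetical name, and the sketch ignores comments and simply replaces whole tokens.

```rust
use std::collections::HashMap;

/// Hypothetical pre-pass (not part of vc_8bit): collects `%ASSIGN NAME VALUE`
/// lines and replaces later occurrences of NAME with VALUE.
fn expand_assigns(source: &str) -> String {
    let mut constants: HashMap<&str, &str> = HashMap::new();
    let mut output: Vec<String> = Vec::new();

    for line in source.lines() {
        let mut parts = line.split_whitespace();
        if parts.next() == Some("%ASSIGN") {
            // Remember the constant and drop the directive from the output.
            if let (Some(name), Some(value)) = (parts.next(), parts.next()) {
                constants.insert(name, value);
            }
            continue;
        }
        // Replace any token that matches a known constant name.
        let expanded: Vec<&str> = line
            .split_whitespace()
            .map(|token| constants.get(token).copied().unwrap_or(token))
            .collect();
        output.push(expanded.join(" "));
    }
    output.join("\n")
}

fn main() {
    let source = "%ASSIGN VARIABLE_ADDRESS #11110100\nMOV R0 37\nSTR R0 VARIABLE_ADDRESS";
    println!("{}", expand_assigns(source));
}
```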

Compiler

The compiler works by compiling the code into assembly.

let value = "print('a');";

// compile code
let asm = c_lang::compile(value.to_string());

// assemble code
let contents = assembly::compile_assembly_to_binary(&asm);
let bytes = assembly::string_to_bytes(contents.as_str());

// run on VC
let mut computer: Computer = Computer::new();
computer.ram.insert_bytes(bytes);
computer.run();

Language

The language looks like C with a few key distinctions.

There are 3 types:

  • uint8 an unsigned 8 bit integer. Any whole number between 0 and 255.
  • bool a true or false value. Is either 00000000 or 11111111 in binary
  • char a character value. Uses the ASCII codes to convert to binary

There are 5 functions:

  • to_char(uint8) turns a number into a character. This works for numbers 0 - 9. Any number above that will get the ASCII character of that number plus 48. This is because 48 is the ASCII code for the character 0. Here is an example of the function: char c = to_char(7)
  • out(any) will output the byte value to the console of any type. For example, the value 5 will output 00000101.
  • print(char) will output the character to the console.
  • write_port(uint8, any) will write the byte value to the port number 0 - 7. The address needs to be a constant. For example write_port(2, 'a') is fine but write_port(99 - 3, 'a') and write_port(variable, 'a') will not work.
  • read_port(uint8) will read the byte value from the port address 0 - 7. The address needs to be a constant. For example read_port(2) is fine but read_port(99 - 3) and read_port(variable) will not work.
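
The behavior of to_char and out can be mimicked in a few lines of Rust. These are stand-ins for illustration only, not the compiler's code; in particular, wrapping on overflow in `to_char` is an assumption, since the text above does not say what happens for very large inputs.

```rust
/// Illustrative stand-in for the language's to_char: adds 48 (ASCII '0').
/// Assumption: the addition wraps around on overflow.
fn to_char(n: u8) -> char {
    n.wrapping_add(48) as char
}

/// Illustrative stand-in for out: formats a value as its 8-bit binary form.
fn out(value: u8) -> String {
    format!("{:08b}", value)
}

fn main() {
    println!("{}", to_char(7)); // prints 7
    println!("{}", out(5)); // prints 00000101
}
```

For an input above 9, such as 10, this returns the ASCII character with code 58, which is ':'.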

There are only if and while statements. The lines of code inside the statement need to be separated by commas, not semicolons. The last line inside the statement cannot have a comma. The closing bracket of the statement needs to be followed by a semicolon. Here is what it looks like:

if (true && 0 == 0) {
    char c = 'a',
    print(c)
};

All operators work except for the += type of operators. Just use A = A + B instead. For the << and >> operators, the right operand needs to be constant. For example, 'a' >> 3 works but 'a' >> 3 + 1 or 'a' >> variable do not work.

  • ++, -- increment and decrement. Works with uint8.
  • +, -, *, / arithmetic operators. Works with uint8.
  • &&, ||, ==, != logic operators. Works with bool.
  • <<, >> bit shift operators. Shifting amount needs to be constant. Works with any type.
  • &, |, ^, ! and, or, exclusive or, and not operators. Works with any type.
  • = set variable.

There is no way to define functions in this language. Also remember that the VC has only 256 bytes of memory to work with. This means your program must be less than 255 bytes, and each variable you use takes up one of those bytes.

Example

This simple program took me 27 bytes to write in assembly. The compiler's version uses 38 bytes. It's not the smartest compiler, but it works.

uint8 a = 0;

while (a < 5) {
    a = a + 1,
    char c = to_char(a),
    print(c)
};

The assembly output for this code is as follows:

MOV R0 #00000000
STR R0 #11111110 ; store created variable ; BYTE ADDRESS 4
LDR R0 #11111110 ; load variable ; value left
MOV R1 #00000101 ; value right
SUB R0 R1
CMP_NEG R0 ; compare
CPY R3 R0 ; copy value right ; get value for statement
MOV R2 0 ; set R2 to 0
SUB R2 R3 ; check if statement is true
JMP_ZRO 37 ; jump if false
LDR R0 #11111110 ; load variable ; value left
MOV R1 #00000001 ; value right
ADD R0 R1 ; math
STR R0 #11111110 ; store variable
LDR R3 #11111110 ; load variable
MOV R2 48 ; 48 is ascii '0'
ADD R3 R2 ; get ascii
CPY R0 R3 ; move value to correct register
STR R0 #11111101 ; store created variable
LDR R2 #11111101 ; load variable
MSG R2 ; print value
JMP 4 ; jump back to start ; BYTE ADDRESS 38
HALT

The binary code that was assembled from this:

110010000000000011000000111111101100010011111110110010010000010100010001111100001100111100000000110010100000000000011011111010100010010111000100111111101100100100000001000000011100000011111110110001111111111011001010001100000000111011001100110000001100000011111101110001101111110111011110111010000000010011111111
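
To read a binary string like this, it helps to cut it into groups of 8 bits. The sketch below is a hypothetical helper, not the crate's `string_to_bytes`; it assumes the input contains only the characters 0 and 1 and that its length is a multiple of 8.

```rust
/// Hypothetical helper (not the crate's string_to_bytes): splits a string of
/// '0'/'1' characters into bytes. Returns None if the length is not a multiple
/// of 8 or if any chunk is not valid binary.
fn bits_to_bytes(bits: &str) -> Option<Vec<u8>> {
    if bits.len() % 8 != 0 {
        return None;
    }
    bits.as_bytes()
        .chunks(8)
        .map(|chunk| {
            let text = std::str::from_utf8(chunk).ok()?;
            u8::from_str_radix(text, 2).ok()
        })
        .collect()
}

fn main() {
    // The first 16 bits of the binary above, which line up with the first
    // line of the assembly listing (MOV R0 #00000000).
    let bytes = bits_to_bytes("1100100000000000");
    println!("{:?}", bytes);
}
```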

Dependencies

~2–3MB
~53K SLoC