-
Notifications
You must be signed in to change notification settings - Fork 241
Fundamentals of RISC V Assembly
The first tutorial doesn't explain enough of the core concepts needed to move on to more complicated programs. This page will dig deeper into RISC-V, and explain the fundamentals required for you to write your own "hello world" program.
Registers are the variables of assembly. However, unlike in high level languages, we have a limited number (32) of registers. Registers are used to store information that will be needed quickly; long-lasting values generally are stored in RAM.
In RISC-V, we can only directly manipulate registers. If you want to modify something in RAM you must load it into a register, modify it, and then write the value back to RAM.
There are two ways to refer to each register: its proper name x0
, x1
, ..., x31
and its ABI name zero
, ra
, ..., t6
. The ABI names refer to things like "return address" and "temporary 6". For more information about that, check out the calling convention page. The only special register is x0
/zero
, which always has the value 0 and ignores being written to.
In practice, you don't need to know about the proper names, you will be using the ABIs, and mostly the seven temporary registers t0
to t6
.
If you find yourself hopelessly confused which register you have assigned to do what, you do have the ability to use .eqv
directive to rename them. It is best to keep the original name as part of the new one:
.eqv t0_cats t0
.eqv t1_dogs t1
.eqv t2_mice t2
li t0_cats, 10
li t1_dogs, 1
li t2_mice, 100
We'll explain the .eqv
directive in greater detail below. RARS will still understand the default names. Note there is currently no way to remove these definitions again.
All code that is executed in a RISC-V processor comes in the form of instructions. Things like adding/multiplying two numbers, loading a value from RAM into a register, and issuing a system call are instructions. There is no fundamental concept of a function, loop, or if statement; all of those concepts are made using instructions.
For example, the following executes the addi
instruction 10 times.
li t0, 10 # t0 = 10
li t1, 0 # t1 = 0
loop:
addi t1, t1, 1 # add t1 += 1
bne t0, t1, loop # Branch to the loop label if t0 == t1
A full list of instructions supported by RARS is available here.
Nearly every instruction involves a register, but some also use a number or label. This is often referred to as an immediate. Immediates are stored as part of the instruction in the form of a number. When assembling the program, the label's address - the instruction's address is used as the immediate.
The use of labels like loop
in the small program above is not strictly necessary. You could write the instructions directly doing that, but it would be very time consuming to write.
Note: RARS doesn't actually support hardcoding offset immediates, but it would work in GCC.
RISC-V is a RISC instruction set, and therefore tries to minimize instructions. So there isn't an instruction dedicated to loading a value into a register, or to move a value from one register to another. However, many instructions that might be nice to have can be made using other instructions. We call these instructions (that aren't in the instruction set, but can be directly replace with real instructions) psuedo-instructions.
For example, li t0, 10
can be translated into addi t0, zero, 10
.
Assembler directives are a way of telling the assembler what to put in the executable besides instructions. It allows you to set up memory for when your program starts running. All directives start with a .
like .data
or .macro
.
I'll explain the most common ones, but a full list of the directives is available here.
It is considered bad form to use actual numbers in the code, because any change to these "magic numbers" requires going through the program line-by-line to find them. Instead, we can use the .eqv
directive to define human-readable constants at the beginning.
.eqv N_CATS 10
li t0, N_CATS
Any change can now be handled in one place.
Note: Other assemblers use .equ
instead of .eqv
.
Easily the most important directives are .text
and .data
; these tell the assembler where to store the instructions or data that follow. The .text
section stores instructions and the .data
section stores (global) variables we will be using and starts with them initialized.
Because code and data become bytes before a program starts, we need to tell the assembler in what section to put code. Both directives can also be used to tell the assembler exactly where to put data/code in the section. This can be done by appending an address after the directive like .text 0x400010
, but this is generally not necessary.
There are also subtle differences between the .data
and .text
sections that can be disabled in RARS by enabling self-modifying code. More about that can be found in a later explanation.
Generally, there are four types of data that can be set up after a .data
directive: integers, floating point numbers, strings, and allocated space for the program to fill later.
Integers can be initialized with .byte
, .half
, and .word
for 1 byte, 2 byte, and 4 byte integers, respectively. They can also store multiple comma-separated values.
For example:
.data
big:
.word 0x1000000
medium:
.half 0x1000
smalls:
.byte 0x10, 90, -10
Floating point numbers can be initialized with .float
. The number is stored as the closest IEEE 754 floating point number. NaN
and Inf
can be used to represent "Not a Number" and Infinity, respectively. Like with integers, it can store multiple values.
For example:
.float 90, NaN, -Inf, -1.337, 1e10
There are two flavors of strings: those automatically ending with a NULL (0) byte, and those that are not. .asciz
or .string
is used for the former, and .ascii
is used for the latter. Despite what the names would imply, the directives actually store the strings encoded as UTF-8 characters. However, because UTF-8 is backward-compatible with ASCII, the correct characters/strings are stored.
For example, the following all put the same string into memory:
ASCIZ:
.asciz "Hello"
ASCII:
.ascii "Hello\0"
BYTES:
.byte 0x48, 0x65, 0x6C, 0x6C, 0x6F, 0
.
Finally, the last way to set up memory is to allocate space only filled with zeroes. The main way to do this is using the .space
directive. .space 10
will allocate ten bytes of space. The other way is to use .extern
; it allows multiple files to define the same variable without duplicating it. This would generally be used in a file that is included using .include
so that multiple files including the same file don't duplicate it in memory.
.extern LABEL, 90 # allocates 90 bytes that can be referenced by LABEL