Assembly on MIPS and Linux syscalls

Edouard Tavinor
Mar 29, 2017

Hi everybody! I recently got myself a Creator Ci20 development board. It’s a small board (about 10cm square) with a MIPS processor and enough resources to run Linux well. I got it because I like interesting architectures, and MIPS was one of the first RISC instruction sets.

One problem is that I’ve found very few resources online for assembly programming under Linux on the board. Most websites seem to show code for a MIPS emulator, where the system calls are different. Fortunately there’s an IRC channel for the Ci20 (on freenode and called #ci20) where you can ask for help, and a helpful person put me on the right track :)

So here’s the simplest example:

.text
.global __start
__start:
li $4, 3
li $2, 4001
syscall

If you save this file as 001.s, it can be assembled, linked and run as follows:

as 001.s -o 001.o
ld 001.o -o 001
./001
echo $? #computer answers '3'

The basic structure is as follows:

.text tells the assembler that what follows is program code, which belongs in the text section of the output file (the section that holds the executable instructions).

.global __start makes the name “__start” visible to the linker. The label __start: on the next line is what actually marks a position in the program, so that you can refer to it by name rather than as a number. It’s a way of naming places to make the life of the programmer easier. “__start” is a special name, because the linker will look for it as the place where the program begins and issue a warning if it doesn’t find it (it’s like toString() in Java in that certain methods in the standard library will automatically use it).

li $4, 3 means “load immediate” and places the value 3 into register 4.

li $2, 4001 means “load immediate” and places the value 4001 into register 2.

syscall means “operating system, I need your help!”

Why 4001, why $4, why $2 etc.?

Well, now we’re getting to the operating system part. When the process reaches “syscall”, the operating system jumps in to help the process out. The process needs a way to tell the operating system what help it needs, and these cryptic numbers are the way. First Linux reads the number in register $2, which in this case is 4001. 4001 means “This process has signalled that it wants to finish. It will return a value stored in register $4”. So Linux reads register $4 (which holds 3) and uses this number as the return value. On Linux, every process that ends has a “return value” which can be used for lots of useful things. It’s a number from 0 to 255. By convention ‘0’ means “everything went well” and the other numbers signify that something went wrong. In a terminal, ‘echo $?’ prints the last return value.
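The return value can be tried out from the shell without any assembly at all. Here is a small sketch that uses sh -c 'exit N' as a stand-in for our program. The variable names are made up for the example, and the wrap-around in the middle is what I'd expect from the usual Linux shells rather than something to rely on:

```shell
# Run a child process that exits with status 3, then pick the status up.
# (The "|| status=$?" form also survives a shell running with "set -e".)
status=0
sh -c 'exit 3' || status=$?
echo "$status"

# Only the low 8 bits are reported back, so 259 arrives as 259 - 256.
wrapped=0
sh -c 'exit 259' || wrapped=$?
echo "$wrapped"

# By convention 0 means success, which is what "if" and "&&" test for.
if sh -c 'exit 0'; then verdict="went well"; else verdict="went wrong"; fi
echo "$verdict"
```

This is also why it makes sense to pick a small number like 3 for the examples: anything above 255 would not come back unchanged.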

Aren’t these numbers a bit arbitrary?

Well, yes, of course they were, when they were decided upon. Now Linux expects certain numbers and will do something different if you use different numbers. A Linux kernel compiled for MIPS has a table with all the different values for register $2 and what they mean, and what the values in the other registers mean when the syscall happens.

So where do I find these numbers?

As Linus would say, use the source, Luke. Here are the two files you need: https://github.com/torvalds/linux/blob/master/arch/mips/include/uapi/asm/unistd.h and https://github.com/torvalds/linux/blob/master/include/linux/syscalls.h.

Well, now that everything’s clear …

Let’s look at the example above. We put the number 4001 in register $2. You can find this (at time of writing) on line 24 of the first file.

#define __NR_Linux   4000
#define __NR_syscall (__NR_Linux + 0)
#define __NR_exit (__NR_Linux + 1)
#define __NR_fork (__NR_Linux + 2)

That’s it in the third line! __NR_exit. Now we look at the second file for something called sys_exit and on line 327 you find the following:

asmlinkage long sys_exit(int error_code);

This means that the exit syscall takes one argument (error_code), and that argument should be in $4. Two arguments would mean registers $4 and $5. Three arguments would mean $4, $5 and $6 etc. MIPS32 (as found on the Ci20) passes up to 4 syscall arguments in registers this way; a syscall that takes more than 4 gets the remaining ones on the stack. Other architectures can pass more arguments in registers.
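Going back to the first file for a moment: the syscall number is just the base value plus an offset, so __NR_exit works out as 4000 + 1. The sketch below does that sum with awk. It works on the four lines quoted above, pasted into a shell variable, not on the real header, and syscall_number is a name invented for this example:

```shell
# Excerpt copied from the listing above (not read from a real header file).
header='#define __NR_Linux   4000
#define __NR_syscall (__NR_Linux + 0)
#define __NR_exit (__NR_Linux + 1)
#define __NR_fork (__NR_Linux + 2)'

# Hypothetical helper: resolve one __NR_* name from the excerpt to a number.
syscall_number() {
  printf '%s\n' "$header" | awk -v name="$1" '
    $2 == "__NR_Linux" { base = $3 }                      # remember the base
    $2 == name && $3 == "(__NR_Linux" { print base + $5 } # "1)" counts as 1
  '
}

syscall_number __NR_exit
syscall_number __NR_fork
```

The first call gives the 4001 that we loaded into $2.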

Aren’t these numbers for registers a bit difficult to remember?

If you think that, you’re not the first person to do so. For this reason, the registers have two different names. $4 is also called a0, $5 is a1, $6 is a2 and $7 is a3. As well as this, $2 is v0 and $3 (if you use it) is v1.

This means that we could write our humble program as follows:

.text
.global __start
__start:
li $a0, 48
li $v0, 4001
syscall

Which looks a bit nicer :)

So let’s try another call, this time writing to the screen. The first file tells us we need syscall number 4004 (__NR_write). The second file says the following on lines 568–569:

asmlinkage long sys_write(unsigned int fd, const char __user *buf, size_t count);

The first argument, fd, is the file descriptor. Stdout is ‘1’ (‘0’ is stdin and ‘2’ is stderr). The second argument is the address of the first character of the string. The third argument is the length of the string (in bytes). So we can write the following program:

.data
mystring: .asciiz "Hello World!\n"
.text
.global __start
__start:
li $a0, 1 # fd 1 is stdout
la $a1, mystring # address of the first character
li $a2, 13 # length in bytes, including the newline
li $v0, 4004 # __NR_write
syscall
li $a0, 3 # return value
li $v0, 4001 # __NR_exit
syscall

All that remains is to assemble, link and run the program:

as 002.s -o 002.o
ld 002.o -o 002
./002 # prints "Hello World!"
echo $? # prints "3"
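The length in $a2 has to match the string exactly, and it is easy to miscount by hand. A quick way to check the number from the shell is to let wc count the bytes (count is just a name picked for this sketch):

```shell
# printf turns \n into a single newline byte, just as the assembler does
# for the string literal, and wc -c counts the bytes.
# tr removes the padding that wc adds on some systems.
count=$(printf 'Hello World!\n' | wc -c | tr -d ' ')
echo "$count"
```

The zero byte that .asciiz puts after the string is not part of this count, and it doesn't need to be written to the screen either.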

Can I use words for the syscall numbers?

Yes, you can. The easy way is to use the preprocessor from gcc. The code will then look like this:

#include <regdef.h>
#include <sys/syscall.h>
.data
mystring: .asciiz "Hello World!\n"
length: .word 13
.text
.global __start
__start:
li a0, 1
la a1, mystring
lw a2, length
li v0, SYS_write
syscall
li a0, 6
li v0, SYS_exit
syscall

To get gcc to preprocess a file, you have to name it <filename>.S and not <filename>.s (note the capital S).

Preprocessing, assembling, linking and running the file would then look like this:

gcc -S 003.S > 003.s
as 003.s -o 003.o
ld 003.o -o 003
./003 # prints "Hello World!"

There may be a way to do all this with just one gcc command, but I haven’t found it yet. When I try gcc 003.s the executable doesn’t call SYS_write.
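In the meantime, the three steps can at least be wrapped in a small shell function. This is only a sketch: build_mips is a made-up name, and all it does is chain the same commands as above, stopping at the first one that fails.

```shell
# Hypothetical helper: preprocess, assemble and link NAME.S into ./NAME.
# It runs exactly the three commands from the listing above.
build_mips() {
  name=$1
  gcc -S "$name.S" > "$name.s" &&
  as "$name.s" -o "$name.o" &&
  ld "$name.o" -o "$name"
}
```

With the function defined in the current shell, build_mips 003 followed by ./003 should do the same as typing the commands one by one.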
