We're going to learn assembly.
Why?
Bad reasons to learn assembly:
Some good reasons to learn assembly:
…
We're learning x86-64 assembly: the assembly code for x86-64 processors, so it will only run there.
There will be details in our code that are Linux-specific [later topic: “calling convention”], so it will only run there.
We're writing x86-64 assembly with the AT&T syntax, not the Intel syntax. The difference is superficial but annoying. The Bryant and O'Hallaron book and the GNU assembler both use AT&T syntax.
But many other tutorials and resources use Intel syntax. Briefly: if you see a lot of punctuation (`%` and `$` everywhere), that's AT&T syntax. [later topic: “Assembly Syntax”]
We will approach the idea of assembly language from both sides and meet in the middle.

- The `.s` files generated by `gcc -S`: they are some kind of very low-level description of our logic.
- The processor's instruction register: the content of the `.o` files goes in there (and gets decoded and then drives the processor). [This will be the next slide deck.]

The assembler is translating the assembly code into machine code. Machine code consists of instructions, which are what the processor's instruction register needs.
This translation is much easier than the compiler's job of translating C code (or another language) to assembly.
Consider this instruction that means “move (copy) the contents of register `rdi` to `rax`”:

    mov %rdi, %rax

It gets translated to these three bytes, represented in binary (base 2) or hexadecimal (base 16):

    01001000 10001001 11111000
    48 89 F8
[I'm 90% sure…] The `89` byte encodes “move (`mov`) a register to another register”. The `F8` byte encodes “`rdi` to `rax`”. The other byte is specifying the register sizes (64-bit each) and other details.
There's a little more to the assembler's job, but the basics are just: translate the instructions (written by the programmer or compiler) to bytes.
Those bytes can be sent to the processor's instruction register, and the processor does the thing.
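To make the byte-level reading concrete, here is a minimal C sketch that pulls apart the three bytes `48 89 F8`. It assumes the usual x86-64 layout of the REX prefix (`0100WRXB`) and of the ModRM byte (2-bit mode, 3-bit `reg`, 3-bit `rm`). The struct and function names are invented for this illustration, and it only handles this one register-to-register form without the `%r8` to `%r15` extension bits.

```c
#include <stdint.h>

/* Hypothetical names for this sketch; not part of any real library. */
struct mov_rr {
    int is_64bit;   /* REX.W bit: 1 means 64-bit operands */
    int src;        /* ModRM reg field: source register number */
    int dst;        /* ModRM rm field: destination register number */
};

/* Register numbers 0-7 as used in the instruction encoding. */
static const char *const REG64_NAMES[8] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"
};

/* Decode a 3-byte "REX prefix, opcode 0x89, ModRM" register-to-register mov.
   Returns 0 on success, -1 if the bytes are not of that form. */
int decode_mov_rr(const uint8_t bytes[3], struct mov_rr *out) {
    uint8_t rex = bytes[0], opcode = bytes[1], modrm = bytes[2];
    if ((rex & 0xF0) != 0x40) return -1;   /* REX prefixes look like 0100xxxx */
    if (rex & 0x05) return -1;             /* REX.R / REX.B (r8-r15): not handled */
    if (opcode != 0x89) return -1;         /* mov from a register to register/memory */
    if ((modrm >> 6) != 0x3) return -1;    /* mode 11: both operands are registers */
    out->is_64bit = (rex >> 3) & 1;
    out->src = (modrm >> 3) & 7;
    out->dst = modrm & 7;
    return 0;
}

/* Name of a 64-bit register given its 3-bit encoding number. */
const char *reg64_name(int n) {
    return REG64_NAMES[n & 7];
}
```

Feeding it `{0x48, 0x89, 0xF8}` gives a 64-bit move with source register 7 (`rdi`) and destination register 0 (`rax`), which matches the reading above.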
Now, all we have to do is write some assembly code to see it work…
I'm going to write a function `add10` in assembly and call it from C. We'll need a C header file so the C code knows about the argument and return types. In `add10.h`:

    #include <stdint.h>
    int64_t add10(int64_t n);
The goal: a function written in assembly that will return `n+10`.
And our C program:

    #include <stdint.h>
    #include <stdio.h>
    #include "add10.h"

    int main() {
        int64_t n = 1234;
        int64_t m = add10(n);
        printf("%ld + 10 = %ld\n", n, m);
    }
Now, if we can get a `.o` file containing an appropriate `add10` function, things should work.
I'm going to create `add10.S` with the assembly code. Files named `.S` are run through the C preprocessor by `gcc`, but `.s` files are not (like `.c` vs `.i`).
I'm going to adopt the convention that `.S` files contain hand-written assembly (as opposed to compiler-generated assembly).
There will be a few lines above this, but this is the `add10` function:

    add10:
        mov %rdi, %rax
        add $10, %rax
        ret
Here, `add10:` is a label: a marker for a location in memory. The memory location of the next instruction (the `mov`) can be referred to as `add10`.

    mov %rdi, %rax

This is one instruction: the `mov` operation. It has two operands, `%rdi` and `%rax`. The destination is always the last operand (in AT&T syntax).
The `mov` is the move instruction that copies the value from its source operand (`%rdi` here) to the destination (`%rax`).
Both `%rdi` and `%rax` are x86-64 registers. A register is a storage location inside the processor: large enough to hold one value (64 bits in this case), fast. There are only a few registers to work with.
This instruction copies the contents of `%rdi` to `%rax`.

    add $10, %rax

The `add` instruction adds one integer to another. As before, the last operand is the destination: `%rax`.
The first operand is the value we're adding to `%rax`. In this case, a literal integer 10. In C syntax, the operation we're doing is `rax += 10`.
Note: register names are prefixed with `%` and literal numeric values prefixed with `$`.
Then, return from the function.

    ret

We don't yet know why or how, but if this works (and it will), we somehow were given the (single) argument to the function in the `%rdi` register, and had to put the return value in `%rax` before returning.
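One way to read those instructions is to mirror them line by line in C. This is only a sketch of the semantics: the name `add10_model` and the local variables named after registers are inventions for illustration, and real registers are not C variables.

```c
#include <stdint.h>

/* Hypothetical C model of the assembly add10.
   The local variables stand in for the registers of the same name. */
int64_t add10_model(int64_t n) {
    int64_t rdi = n;   /* the argument arrives in %rdi */
    int64_t rax;

    rax = rdi;         /* mov %rdi, %rax */
    rax += 10;         /* add $10, %rax  */
    return rax;        /* ret: the caller finds the result in %rax */
}
```

Calling `add10_model(1234)` gives 1244, the same value the assembly version is meant to produce for the test program's input.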
There is a little more housekeeping to do at the start of the `.S` file. Complete contents:

    .section .note.GNU-stack, ""
    .global add10
    .text
    add10:
        mov %rdi, %rax
        add $10, %rax
        ret
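`.global` exports this label so the linker can see it, and `.text` marks the start of a code section [later topic: “assembly code sections”]. The `.section .note.GNU-stack, ""` line declares that this code does not need an executable stack, which keeps the linker from warning about it.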
We already know how to compile the `.c` code to an object file. We have seen how to invoke the GNU assembler to create an object file: here the code is coming from us, not the compiler (and we'll add `--warn`).

    gcc -Wall -c add10_test.c               # compile to add10_test.o
    as --warn add10.S -o add10.o            # assemble to add10.o
    gcc add10.o add10_test.o -o add10_test  # link
Now we have an actual executable that we can run and get the output we expect:

    ./add10_test
    1234 + 10 = 1244
We created a function in assembly, assembled it, and called it from C. 🎉
Summary:

- `.S` files for hand-written assembly and `.s` for compiler-generated assembly.
- `foo:` is a label for a memory location.
- `%` indicates a register.
- `$` indicates a literal value.
- The first argument arrives in `%rdi`; return value in `%rax`.

Things unsaid:

- We are treating `add10` as a function. In assembler, it's just a memory location. We need to `ret` when we're done, but the compiler can't check that either.

An alternate implementation of `add10`:

    add10:
        add $10, %rdi
        mov %rdi, %rax
        ret
This code does the calculation in `%rdi` and copies the result to `%rax` (instead of copying the argument and calculating in `%rax`). Both implementations are functionally equivalent.
Assembly functions can take arguments and return results. How do the arguments come in, and how does the return value go out?
The way we implement arguments/return is determined by the calling convention, specifically the System V AMD64 ABI that's used in Linux and macOS (but not Windows).
A calling convention specifies how functions receive their arguments, give their return value, deal with local variables, etc.
Our calling convention specifies things like (incomplete list):

- The first integer argument is passed in `%rdi`.
- The second integer argument is passed in `%rsi`.
- The third integer argument is passed in `%rdx`.
- An integer return value goes in `%rax`.
- `%rbp` must be unchanged when a function returns: the value there at the start of the function must be there at the end.
- `%rcx` may be modified during a function call: it may be different when the function returns.

Having a calling convention lets different compilers/tools interoperate with each other. For example, we were able to write a function in assembly and call it from C code because we followed the same calling convention as the C compiler.
In theory, we could choose any combination of registers for arguments and return values. In practice, we need to do it the same as everybody else.
Integer arguments to functions are passed in these registers (in this order): `%rdi`, `%rsi`, `%rdx`, `%rcx`, `%r8`, and `%r9`.
Floating point arguments are passed in the SSE registers: `%xmm0` to `%xmm7`.
If there are more arguments than that, they go on the stack.
[later topics: “the registers”, “floating point”, “the stack”]
Before a function returns, any integer return value must be put in `%rax`; floating point return in `%xmm0`.
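The integer-argument rule can be written down as a small lookup. This is a sketch only: `int_arg_location` is a hypothetical helper, it covers integer arguments alone, and it reports anything past the sixth simply as "stack" without modelling how the stack is laid out.

```c
#include <stddef.h>

/* Registers for integer arguments 1..6, in order, as listed above. */
static const char *const INT_ARG_REGS[6] = {
    "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
};

/* Hypothetical helper: where does the n-th (1-based) integer argument go?
   Returns NULL if n is less than 1. */
const char *int_arg_location(int n) {
    if (n < 1) {
        return NULL;
    }
    if (n <= 6) {
        return INT_ARG_REGS[n - 1];
    }
    return "stack";   /* seventh and later arguments */
}
```

For a C function such as `f(a, b, c)` with three integer parameters, this says `a`, `b` and `c` arrive in `%rdi`, `%rsi` and `%rdx` respectively.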
A more interesting aspect of the calling convention: preserved registers.
There is no equivalent of local variables for the registers. If we call a function, that function uses the same registers as our code. How can our code maintain any data across a function call?
Some registers must be preserved as part of a function call. That is, a function must guarantee that the register value will be unchanged after it's done.
Functions are allowed to use those registers, but the original values must be restored if they do.
These registers are preserved across function calls: `%rbx`, `%rbp`, `%rsp`, `%r12`, `%r13`, `%r14`, `%r15`.
Other than those, you have to assume that any function call will destroy values you have in registers.
You may see the terminology caller-saved and callee-saved for these two categories of registers. Those are annoyingly similar and hard to parse.
I will say “preserved” and “not preserved”, with this StackOverflow answer explaining why.
We have been using registers and have some idea what they are: a small number of very fast storage locations in the processor.
The x86-64 registers are a mess, mostly for historical reasons. A lot of them have names because of ways they were originally intended to be used, but they're just general purpose places to put 64-bit values.
These are not call-preserved:
Register | Name | Use |
---|---|---|
%rax | accumulator | return * |
%rcx | counter | arg4 * |
%rdx | data | arg3 * |
%rsi | source | arg2 * |
%rdi | destination | arg1 * |
%r8 | | arg5 * |
%r9 | | arg6 * |
%r10 | | anything |
%r11 | | anything |
These are call-preserved:
Register | Name | Use |
---|---|---|
%rbx | base | anything |
%r12 | | anything |
%r13 | | anything |
%r14 | | anything |
%r15 | | anything |
If you use these in a function, you must store/restore their values.
There are also several registers that have specific purposes and hold values interpreted in specific ways. For now, leave these alone.
Register | Name |
---|---|
%rsp | stack pointer |
%rbp | base pointer |
%rip | instruction pointer |

`%rip` can only be manipulated by jump/branch instructions. [later topic: “branching”]
There are also separate registers used when working with floating point values: `%xmm0` to `%xmm15`. None of them are call-preserved.
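Pulling the register lists together, here is a minimal sketch of a "do I have to restore this?" check. The helper name `is_call_preserved` is hypothetical. The list includes `%rbp` and `%rsp`: they are special-purpose registers, but the calling convention also requires a function to leave them unchanged on return.

```c
#include <string.h>

/* 64-bit registers a function must leave unchanged when it returns. */
static const char *const PRESERVED[] = {
    "rbx", "rbp", "rsp", "r12", "r13", "r14", "r15"
};

/* Hypothetical helper: returns 1 if the named register (written without
   the % prefix) is call-preserved, 0 otherwise. */
int is_call_preserved(const char *reg) {
    size_t count = sizeof PRESERVED / sizeof PRESERVED[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(reg, PRESERVED[i]) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Anything that gets a 0 here (for example `rax`, `rcx`, or `r10`) should be assumed destroyed by any function call.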
There are also names for smaller fragments of each register. For example, `%rax` is a 64-bit register but…

- `%eax` refers to the lower 32 bits of `%rax`,
- `%ax` refers to the lower 16 bits of `%rax`,
- `%al` refers to the lower 8 bits of `%rax`,
- `%ah` refers to the next 8 bits of `%rax` (bits 8 to 15).

Or more visually, there's a single 64-bit register in the processor that you can refer to in these pieces:
Here, all of `%rax`, `%eax`, `%ax`, `%al` hold the integer 1.
Even though `%rdx` and `%dl` look like different register names, writing to one changes the value in the other (but `%rdi` is a completely different register, even though it has a `d` in the name).
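The overlapping names can be pictured as bit masks over one 64-bit value. The sketch below models that in C. The `view_*` and `write_al` names are made up for illustration, and `write_al` only models an 8-bit write (other write sizes are not covered here).

```c
#include <stdint.h>

/* Each helper returns the part of a 64-bit value that the matching
   register name refers to. */
uint32_t view_eax(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); }
uint16_t view_ax(uint64_t rax)  { return (uint16_t)(rax & 0xFFFFu); }
uint8_t  view_al(uint64_t rax)  { return (uint8_t)(rax & 0xFFu); }
uint8_t  view_ah(uint64_t rax)  { return (uint8_t)((rax >> 8) & 0xFFu); }

/* Model of writing %al: only the low 8 bits of the 64-bit value change,
   so the value seen through %rax changes too. */
uint64_t write_al(uint64_t rax, uint8_t value) {
    return (rax & ~(uint64_t)0xFF) | value;
}
```

With `rax` holding `0x1122334455667788`, the views are `0x55667788` (`%eax`), `0x7788` (`%ax`), `0x88` (`%al`) and `0x77` (`%ah`). With `rax` holding 1, the `%eax`, `%ax` and `%al` views are all 1 while `%ah` is 0.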
Mostly because I want to write some more assembly, let's work through an example of preserving a register inside a function.
I'm going to write a function in assembly that uses `%rbx` as a temporary variable: `%rbx` is call-preserved, so I have to restore it to its original value before I return.
I need to store the original value of `%rbx` somewhere. I'm going to store it on the stack.
Short user's manual for the stack before we really talk about it: `push` to put a value onto the stack, and `pop` to get it back later. You must `pop` values in the opposite order to `push`ing them.
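The "opposite order" rule is easier to see with a toy model. The sketch below is an assumption-laden stand-in: `toy_stack`, `toy_push`, `toy_pop` and `save_clobber_restore` are hypothetical names, and a fixed-size C array is not how the real processor stack is laid out.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS 16

/* Toy last-in, first-out stack of 64-bit values. */
struct toy_stack {
    int64_t slots[STACK_SLOTS];
    int count;
};

void toy_push(struct toy_stack *s, int64_t value) {
    assert(s->count < STACK_SLOTS);
    s->slots[s->count++] = value;
}

int64_t toy_pop(struct toy_stack *s) {
    assert(s->count > 0);
    return s->slots[--s->count];
}

/* Save two "registers", overwrite them, then restore them by popping
   in the opposite order to the pushes. */
void save_clobber_restore(int64_t *rbx, int64_t *r12) {
    struct toy_stack s = { .count = 0 };
    toy_push(&s, *rbx);
    toy_push(&s, *r12);

    *rbx = -1;            /* stand-ins for the function's own work */
    *r12 = -2;

    *r12 = toy_pop(&s);   /* pushed last, so popped first */
    *rbx = toy_pop(&s);
}
```

If the two pops were swapped, each variable would end up holding the other's original value, which is exactly the mistake the rule guards against.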
And I'm going to write assembly that can run as the main program, not called from C. I'm going to use `helpers.c` that contains some useful functions.

    gcc -c helpers.c -o helpers.o
    as --warn preserve.S -o preserve.o
    ld helpers.o preserve.o -o preserve
    ./preserve
The “main” will call my (two-argument) function, print its return value, and exit (i.e. stop the program).

    .section .note.GNU-stack,""
    .global _start
    .text
    mult_and_sub:
        # TODO
    _start:
        mov $5, %rdi
        mov $7, %rsi
        call mult_and_sub
        mov %rax, %rdi
        call print_int64    # provided in helpers.c
        mov $0, %rdi
        call syscall_exit   # provided in helpers.c
Since we're using the call-preserved `%rbx`, we need to store it temporarily on the stack, so our function is going to look like:

    mult_and_sub:
        push %rbx    # put the outside code's %rbx on the stack
        ⋮
        pop %rbx     # restore original %rbx from the stack
        ret
Now the actual function. I want it equivalent to this C:

    int64_t mult_and_sub(int64_t a, int64_t b) {
        return (a*b) + (a&b) + (a-b);
    }
The whole function to get there:

    mult_and_sub:
        # return (a*b) + (a&b) + (a-b)
        push %rbx
        # a*b
        mov %rdi, %rax     # start the sum in %rax
        imul %rsi, %rax    # ... with a*b
        # ... + a&b
        mov %rdi, %rbx     # use %rbx to calculate a&b
        and %rsi, %rbx
        add %rbx, %rax     # add it to %rax
        # ... + a-b
        sub %rsi, %rdi     # don't need %rdi again, so can modify
        add %rdi, %rax     # add a-b to %rax
        pop %rbx
        ret
That logic rewritten in C would be:

    int64_t mult_and_sub(int64_t a, int64_t b) {
        int64_t rdi, rsi, rax, rbx;
        rdi = a;      // because of the calling convention
        rsi = b;      // because of the calling convention
        rax = rdi;
        rax *= rsi;
        rbx = rdi;
        rbx &= rsi;
        rax += rbx;
        rdi -= rsi;
        rax += rdi;
        return rax;   // because of the calling convention
    }
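As a sanity check on the arithmetic, the sketch below breaks the expression into its three terms for the arguments that `_start` passes (5 and 7). The name `mult_and_sub_reference` and the intermediate variables are illustrative only; they do not come from `helpers.c` or the lecture files.

```c
#include <stdint.h>

/* Same expression as mult_and_sub, split into named terms.
   The comments show the values for a = 5, b = 7. */
int64_t mult_and_sub_reference(int64_t a, int64_t b) {
    int64_t product = a * b;       /* 5 * 7 = 35 */
    int64_t common_bits = a & b;   /* binary 101 & 111 = 101, i.e. 5 */
    int64_t difference = a - b;    /* 5 - 7 = -2 */
    return product + common_bits + difference;   /* 35 + 5 + (-2) = 38 */
}
```

So a correct assembly `mult_and_sub` should return 38 for those inputs.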
The register names kind of all look the same: I tend to include a comment like this to remind myself which register is doing what.

    # Return (a*b) + (a&b) + (a-b)
    # %rdi = a
    # %rsi = b
    # %rbx = tmp
    # %rax = result
If I were writing this code myself (not as a lecture example), I would still comment almost as heavily as in the file. Assembly is hard to read, and comments shouldn't be considered optional.
The point of the register preserving (with `push` and `pop` here) is that any calling code must be able to rely on `%rbx` (and other preserved registers) being unchanged after calling any function.

    mov $1234, %rbx
    mov $5, %rdi
    mov $7, %rsi
    call mult_and_sub
    mov %rax, %rdi
    call print_int64    # prints result of mult_and_sub
    mov %rbx, %rdi
    call print_int64    # must print 1234
Assembly feels very manual: you're in charge of every single step, and every single detail. It's your job to keep track of the number of arguments, types, preserving the right registers, etc.
Also note, these instructions (`add`, `sub`, `imul`) do integer arithmetic. We don't have any tools to work with floating point values (yet).