The commands we have been typing have done various parts of the broader compilation
process:
gcc -c
compiles C code to object code.as
assembles assembly code to object code.ld
links object files to create an executable.The long list of command line flags only apply to the compiler, not the other steps: they are about how C code is turned into assembly and aren't relevant to assembling or linking.
In labs 2 and 3, we will see the -g
switch to enable debugging information. That can be done to either the assembler or compiler.
gcc -Wall -Wpedantic -std=c17 -march=haswell -O1 -g -c aaa.c as --warn -g bbb.S -o bbb.o ld aaa.o bbb.o -o program
We also sometimes used gcc
as a linker:
gcc aaa.o bbb.o -o program
The difference between gcc
(as a linker) and ld
: gcc
knows how to set up the world so C code works (e.g. standard library linked in and initialized when the program starts), then it calls ld
for you. Also, gcc
automatically includes the C runtime, so your program starts at main
instead of _start
.
But gcc
also obviously knows how to assemble code: it's doing it with the assembly it generates from your C code all the time. It can also be given an assembly file and it knows how to handle it (i.e. it runs as
for you).
If we rename _start
to main
in our assembly code, we can use gcc
(or clang
) for everything:
gcc -Wall -Wpedantic -std=c17 -march=haswell -O1 -g -c aaa.c gcc -Wall -g -c bbb.S # note: -c to assemble to a .o gcc aaa.o bbb.o -o program
You can also give gcc
all of the input files at once, and have it make the executable in one step:
gcc -Wall -Wpedantic -std=c17 -march=haswell -O1 \ -g aaa.c bbb.S -o program # note: no -c
…if you want the same options applied to all files, and promise you understand which command line switches apply to which steps.
e.g. I might want -g
debugging info on some files but not others: a single command can't do that.
For lab 1, use the distinct assembling, compiling, linking commands as appropriate: convince us (and yourself) that you know the difference between the steps.
After that: as long as your code works as specified, it's okay.
Back to assembly…
We have one pair of tools that let us do anything besides executing the next instruction: call
and ret
.
Otherwise, every instruction has been followed by the next one in the assembly code.
Think about the control flow tools you have in C (and other imperative programming languages): if/else
, for
, while
, etc.
We need to be able to logic like that before we get much farther.
Assembly doesn't have concepts like blocks
of code (what's in {…}
in C).
Everything other than continue to the next instruction
in assembly is done with one of the jump instructions. *
We can jump to a specific memory location. The instruction there will be the next one executed.
In other words, jumping sets the instruction pointer to the given memory address. Then the next instruction (loaded into the instruction register and executed) will be the one from that location in memory.
The jmp
instruction can be used to jump to a specific location (indicated by a label referring to that memory location).
jmp over_there
This assembly creates a loop_forever
function that starts a counter (in the %rcx
register) at zero, and adds one to it in an infinite loop:
loop_forever: mov $0, %rcx loop_top: add $1, %rcx jmp loop_top
Remember: a label just gives a name to a memory location. Here, we imagine loop_forever
marks the start of a function, and loop_top
indicates the top of our infinite loop. To the assembler, they're just labels of memory addresses.
loop_forever: mov $0, %rcx loop_top: add $1, %rcx jmp loop_top
If we execute this, the mov
happens first. Then an add
.
Then the jmp
sends execution back to the add
. So, another add
, then jmp
, add
, jmp
, add
, …
The jmp
instruction is fine, but an infinite loop isn't what we usually want. We need conditions.
In an if/else
, there's a condition that determines if we want to jump to the if
or else
body.
In any loop, there's a condition that determines if we want to jump back up to the top of the loop (to execute it again) or not (and continue on to the code below).
Conditions can be implemented with conditional branch instructions. They effectively work in two steps: first, do a comparision. Then jump only if the comparision had a certain result (else, continue to the next instruction).
The challenge: write a function max
that will return the larger of two signed integers. The appropriate .h
:
int64_t max(int64_t a, int64_t b);
In C, that would be something like this:
int64_t c_max(int64_t a, int64_t b) { if ( a >= b ) { return a; } else { return b; } }
We need to be able to implement the >=
comparison, and the if/else
.
int64_t max(int64_t a, int64_t b);
As always, we will find our first argument in the %rdi
register and second in %rsi
.
As always, the result needs to be put in %rax
before we ret
.
To implement max
, we need to copy either %rsi
or %rdi
into %rax
. That logic can be implemented like this (once we figure out the conditional jump):
max: ??? # if %rdi < %rsi jump to "a_is_less" mov %rdi, %rax # return a ret a_is_less: mov %rsi, %rax # return b ret
First step: the comparison. The cmp
instruction will compare two integers and set some flags
to record the results [later topic: “status flags”].
We can start our max
function as:
cmp %rsi, %rdi
Remember: %rdi
= a
and %rsi
= b
, so this is compare b with a
.
Now that we have compared a
and b
, we can look at the results and do a conditional jump: only jump if we found out that a<b
.
That can be done with:
cmp %rsi, %rdi jl a_is_less
The jl
is jump if less
. Implicit else: don't jump, and just go to the next instruction.
All together, a full return max signed integer
function:
max: cmp %rsi, %rdi jl a_is_less mov %rdi, %rax ret a_is_less: mov %rsi, %rax ret
Note: the cmp
instruction's arguments are (in my mind at least), backwards. In this code %rdi
= a
and %rsi
= b
then we…
max: cmp %rsi, %rdi jl a_is_less ... a_is_less: ...
Compare b
and a
and jump if less
means a<b
.
After doing a
(or some other operation that sets flags), there are several conditional jump instructions that will let us implement various comparisions.cmp y, x
Each jump is if condition, jump to label, else continue to the next instruction
.
After doing
some jumps:cmp y, x
Instr. | Jump if… | Condition |
---|---|---|
ja | above | x > y (unsigned) |
jae | above or equal | x >= y (unsigned) |
jb | below | x < y (unsigned) |
jbe | below or equal | x <= y (unsigned) |
jg | greater | x > y (signed) |
jge | greater or equal | x >= y (signed) |
jl | less | x < y (signed) |
jle | less or equal | x <= y (signed) |
je | equal | x == y |
jne | not equal | x != y |
Another useful instruction: the
instruction can be used to check how test x, x
x
compares to zero…
Note: two operands, but we're setting them both to the same thing, like
.test %rdi, %rdi
After doing
some jumps:test x, x
Instr. | Jump if… | Condition |
---|---|---|
jg | positive | x > 0 (signed) |
jge | positive or zero | x >= 0 (signed) |
jl | negative | x < 0 (signed) |
jle | negative or zero | x <= 0 (signed) |
je | zero | x == 0 |
jne | non-zero | x != 0 |
With conditional branches, we can implement the control structures we know from C and other languages (if we could implement a complex-enough condition).
For example, we have seen (part of) an if/else
:
cmp ??? # evaluate the condition j?? else_code # jump conditionally to the "else" ... # do the "if" jmp after_if_else else_code: ... # do the "else" after_if_else: ... # code that follows the if/else
Suppose we want to implement while
-loop-like logic in assembly:
void loop_example() { uint64_t c = 0; while ( c < 3 ) { c += 1; printf("%ld\n", c); } return; }
We expect this to print the loop counter three times: the loop exits when c >= 3
.
We have the tools we need for that in assembly: use a register to count (I choose %rcx
). Compare the counter to 3. Conditionally exit the loop.
This code implements the same logic: counter to 0; compare counter to 3; exit the loop if ≥; increment.
loop_example: mov $0, %rcx loop_top: cmp $3, %rcx # check the loop condition jge loop_end add $1, %rcx # the loop "body"... # print... mov %rcx, %rdi push %rcx # not call-preserved: store it on the stack call print_int64 pop %rcx # restore %rcx from the stack jmp loop_top # back up to test the "while" condition loop_end: ret
Hopefully you're convinced any of the control structures you know can be duplicated in assembly.
Textbooks often have sections how to translate C structure X to assembly
. My advice: ignore those.
Just use the tools you have in assembly (mostly conditional jumps) to implement the logic you need. Don't try to translate C thoughts to assembly.
How do the conditional branches work? How do they know
what happened in the last instruction?
There must be some information passed from the cmp
to the jge
instruction here.
cmp $3, %rcx jge loop_end
The cmp
and test
instructions (also many others) set status flags or status bits or the EFLAGS register as a side-effect of their execution. Each flag is a single bit representing something about the result of an instruction.
e.g. the instruction
will set the zero flag (cmp %rax, %rbx
ZF
) to 1 if the two values were equal, and to 0 if not.
Then if you do the je
(jump if equal) instruction right after, it will check the ZF
bit and branch if it's 1. Or jne
(jump if not equal) will branch if it's 0.
Actually cmp
just subtracts its two operands, sets the status flags, and throws away the result of the subtraction: the zero
flag means equal
here because if subtracting the two gives zero, they were equal.
There is actually an instruction jz
(jump if zero) that is a synonym for je
(jump if equal).
Every instruction sets the status flags as appropriate to what they're doing.
For example, every time we have added, the carry flag CF
was set if there was a carry-out, i.e. an unsigned integer overflow. We can use jc
(jump if carry) or jnc
(jump if no carry) to detect this.
An example using one-byte registers (because the values are small enough to understand easily). The largest one-byte unsigned integer is 255, so this add
carries:
status_test_1: mov $200, %dl # %dl == first byte of %rdx add $57, %dl jc it_carried mov $0, %rax ret it_carried: mov $1, %rax ret
… and we expect this function to return 1.
Or, we can use the sign flag SF
to check if the last result was negative, where we imagine our integers as signed.
Here, the result is not negative, so we expect SF
to be zero, so the jump will not happen and we return 0.
status_test_2: mov $10, %r8 sub $8, %r8 js its_negative mov $0, %rax ret its_negative: mov $1, %rax ret
So, we can set status flags…
cmp
or test
; or… and then we can do a conditional branch based on whatever logic we're trying to implement.
See our x86-64 cheat sheet for details of the conditional branches and the flags they examine.
In case you haven't see a stack data structure…
It's a list/array kind of data structure, but there are two operations that can be done to a stack: push and pop.
topof the stack (
aboveany other values).
Or visually, push 5 values then pop 5 values:
The important part: popping
removes values from the stack in the reverse of the order they were added.
If you prefer code: stack-ds.cpp
, which implements (an extremely simple) stack in C++.
We have seen the push
and pop
instructions that do the basic stack operations.
This will take the value in %rax
and push it to the top of the stack, then remove the value from the top of the stack and put it in %rbx
.
push %rax pop %rbx
Note: the operand for push
is a source; the operand for pop
is a destination
Remember this diagram. There is a stack
that lives in our program's memory allocation.
The stack in assembly (like everything else) is much more manual than in higher-level languages.
It will do the same thing: give us a way to have memory that is dedicated to one call of a function (e.g. local variables).
The stack is managed by the stack pointer, which is the %rsp
register.
The %rsp
holds the memory address of the top of the stack. It must be call-preserved: if you're writing a function and change %rsp
, it must be returned to its original value before you ret
.
We haven't written any instructions that use memory yet. Here's an incomplete summary (more later):
%rsp
is a memory address; (%rsp)
is the value in memory at that location (the top of the stack).8(%rsp)
is the contents of memory 8 bytes after %rsp
.e.g. these instructions copy the 64-bit value at the top of the stack to %rax
, the second-from-top 64-bit value to %rbx
, and the third stack value to %rcx
:
mov (%rsp), %rax mov 8(%rsp), %rbx mov 16(%rsp), %rcx
This changes the 64-bit value on top of the stack to the value from %rdi
:
mov %rdi, (%rsp)
So the parens around a memory address in assembly act like the dereference operator (*
) in C/C++. These are roughly equivalent:
int64_t *s = magically_get_the_stack_pointer(); printf("%ld\n", s); printf("%ld\n", *s); printf("%ld\n", *(s+1)); // offset adjusted by sizeof(int64_t)
mov %rsp, %rdi call print_uint64 mov (%rsp), %rdi call print_uint64 mov 8(%rsp), %rdi call print_uint64
You can manipulate %rsp
directly (and we will), but there are instructions that make it easier: push
and pop
.
Here is a function that pushes the integer 4567 onto the stack, prints it, pops it, and returns.
Note: we have to use pushq
(instead of push
) to indicate we're operating on 64-bit values. [later topic: “operand sizes”]
use_the_stack_1: mov %rsp, %rdi call print_uint64 # print stack pointer pushq $4567 # push a 64-bit value onto the stack mov %rsp, %rdi call print_uint64 # print stack pointer mov (%rsp), %rdi call print_uint64 # print the value on top of the stack pop %r8 # it has to pop *to* somewhere: %r8 will do ret
140733574654760 140733574654752 4567
140733574654760 140733574654752 4567
Note: the stack pointer decreased when something was pushed. The top
of the stack is at lower memory locations than the bottom
.
All pushq
is doing: subtract 8 from %rsp
and store the value in memory at (%rsp)
. We could have done that ourselves with the same result…
The exact same function (except without the useless write to %r8
):
use_the_stack_2: mov %rsp, %rdi call print_uint64 # print stack pointer sub $8, %rsp movq $4567, (%rsp) # push a 64-bit value onto the stack mov %rsp, %rdi call print_uint64 # print stack pointer mov (%rsp), %rdi call print_uint64 # print the value on top of the stack add $8, %rsp # move %rsp without using the value ret
In both of those functions, this is what the stack looks like in the middle of the function (after the push
, before the pop
):
The stack is also how call
/ret
work. The address of the instruction to return to is on the stack.
call f
is essentially just this (%rip
is the instruction pointer):
push %rip jmp f
ret
is essentially:
pop %rip
We can use the stack as somewhere to preserve
any values (and have before to call-preserve register values). e.g. if we have a value we need later in %rdi
(which is not call-preserved), it might be lost by a function call.
mov $12345, %rdi call print_uint64 call print_uint64
12345 1
mov $12345, %rdi # imagine we worked hard to calculate this push %rdi # preserve %rdi on stack call print_uint64 mov (%rsp), %rdi # restore %rdi but leave it on the stack call print_uint64 pop %rdi # restore %rdi and pop to restore %rsp call print_uint64
12345 12345 12345
When writing assembly where you call a function, you could choose to use a preserved register so you don't have to worry about a function overwriting it.
But now you're the function that has to preserve that register for whoever called you.
e.g. we can use %r15
(which is call-preserved) to do some calculations and not worry about it changing.
mov $3331, %r15 # do some call-preserved calculations add $2, %r15 mov %r15, %rdi call print_uint64 mov %r15, %rdi call print_uint64
3333 3333
But now our whole function has to be something like this, so we restore the %r15
value of whoever called us before returning.
use_preserved_register: push %r15 # store initial %r15 value mov $3331, %r15 # do some call-preserved calculations add $2, %r15 mov %r15, %rdi call print_uint64 mov %r15, %rdi call print_uint64 pop %r15 # restore caller's %r15 ret
We can also use the stack to preserve a non-call-preserved register if we're about to call a function (and need that value again). %r8
is not call-preserved:
use_non_preserved_register: mov $3331, %r8 # do some non-call-preserved calculations add $2, %r8 mov %r8, %rdi push %r8 # push to store r8: function might destroy call print_uint64 pop %r8 # ensure r8 is back to our value mov %r8, %rdi # don't push because we don't need it again call print_uint64 ret
There's no reason you have to push
from and pop
to the same register. We have been using a pattern like this:
some_function: push %rdi # preserve the first argument call some_other_function pop %rdi # restore the first argument ... # use %rdi
There's no reason we couldn't restore to somewhere else if that is easier:
some_function: push %r15 # store caller's %r15 push %rdi # preserve the first argument call some_other_function pop %r15 # restore to a call-preserved regsiter ... # use %r15 pop %r15 # restore caller's %r15
The way values must be preserved
depends on which registers we're talking about.
We can use registers as local variables within a function (as long as we preserve them on the stack as needed), but there are a limited number of registers to work with. What if we need more?
I'm sure you have done some calculation that required more local variables than there are registers. Then what?
We can store as much stuff on the stack as we want, up to the OS's limit. Apparently that's 8 MB by default in Linux.
$ ulimit -s 8192
So it's a good place to store a modest amount of stuff, like our function's local variables.
Let's test that limit:
exceed_stack: mov $0, %rdi infinite_loop_start: push %rdi call print_uint64 mov (%rsp), %rdi # read previous %rdi value back add $1, %rdi jmp infinite_loop_start
Last output for me:
1047552 1047553 Segmentation fault (core dumped)
\[\begin{align*} &\phantom{=\ } 1047553\mbox{ 64-bit values} \\ &= 1047553 \times 8\mbox{ bytes/value} = 8380424\mbox{ bytes} \\ &= 8380424 / 1024\mbox{ bytes/kB} = 8184\mbox{ kB} \end{align*}\]
That's a little less than the limit because the calling code and print_uint64
used a little stack.
Let's try keeping local variables on the stack (even though in this example, they would fit in registers easily). We will duplicate this function, keeping a
and b
on the stack.
int64_t stack_locals(int64_t x, int64_t y) { int64_t a = x; int64_t b = y; a += b; b *= a; return b; }
… and using registers for only intermediate values during calculations. (Or maybe we're pretending that %rax
is the only register we can modify for some reason.)
We can start by pushing our two values onto the stack, getting memory space for our local variables:
stack_locals: push %rdi push %rsi # 8(%rsp) == a # (%rsp) == b
Then we can get on with the real work: a+=b; b*=a;
At most one of the operands for most instructions can be in memory (the other must be a register or constant), so we can approach it like this:
mov (%rsp), %rax # b to %rax add %rax, 8(%rsp) # a += b mov (%rsp), %rax # b to %rax imul 8(%rsp), %rax # b *= a mov %rax, (%rsp)
[The destination for imul
must be a register.]
Finally, we can clean everything up and return the result:
mov (%rsp), %rax # will return b add $16, %rsp # purge our stack content ret
We push
ed two 8-byte values, so adding 16 to %rsp
will effectively remove them from the stack.
It turns out, we have everything we need in assembly to implement recursive functions: local variables (on the stack), and some way to deal with the base case (conditional branches for if the base case…
logic).
Let's implement our (I assume) old friend factorial:
uint64_t factorial(uint64_t n) { if ( n == 0 ) { return 1; } else { return n * factorial(n-1); } }
Side quest…
The mul
instruction (unsigned integer multiplication) takes a single operand and multiplies %rax
by that value, putting the result in %rdx:%rax
(i.e. it's a 128-bit result with the high bits in %rdx
and low in %rax
).
If we ignore the possibility of the result being >64 bits, that means
is always going to do mul %rXX
rax*=rXX; rdx=0;
which is weird, but it's what we get.
Now we can implement the recursive algorithm easily enough in assembly… check if the argument is zero. If it is, return 1. Else, make the recursive call on n-1
.
Everything but the recursive case:
factorial: test %rdi, %rdi jne fact_recursive_case fact_base_case: mov $1, %rax ret fact_recursive_case: ...
To do the recursive call, we need to preserve our local variable
n
on the stack, calculate n-1
, call factorial
, multiply that result by n
.
fact_recursive_case: push %rdi # push our n value onto the stack sub $1, %rdi # n -= 1 call factorial # rax = factorial(n-1) pop %rdi # restore original n mul %rdi # rax *= n ret
Recursion might have felt like magic when you first saw it: there are these local variables that are somehow isolated, even though it's the same code for each recursive call. (i.e. there are many variables named
, even though you only wrote n
once.)int n
There was no magic. Just a stack.
We can use the stack to preserve our
values when calling other functions; we can use it to preserve values around recursive calls. Same thing.
You can do operations on 64-, 32-, 16-, or 8-bit operands. The smaller-operand versions might be faster, and the smaller values take less memory (important if you have a large array).
Generally, the operand size is implicit from the size of register you're working with.
add $1, %rsi # 64-bit addition add $1, %esi # 32-bit addition add $1, %si # 16-bit addition add $1, %sil # 8-bit addition
It's also possible to specify operand size with a bwlq
suffix as part of the instruction.
addb
: byte, 8-bits
addw
: word, 16-bits
addl
: long word, 32-bits
addq
: quad word, 64-bits
You can do that on any instruction that can work on different-sized operands. You should do it if there's any ambiguity, or you're trying to be more specific.
The sizes of all operands and the instruction have to match. Each of these is an error.
add %edx, %rdx # mixed 32- and 64-bit operands addq %edx, %edx # 64-bit instruction, 32-bit operands addl $1, %dx # 32-bit instruction, 16-bit operand addb $257, %dl # warning: 257 shortened to 1
Each of these is a legal 64-bit addition.
add %rcx, %rax addq %rcx, %rax add $1, %rax addq $1, %rax
It's necessary to specify the operand size on an instruction if the assembler can't infer it.
e.g. how many bytes around %rsp
are written by this? Are we storing an 8-, 16-, 32-, or 64-bit value?
mov $1, (%rsp)
The assembler will give a warning (not an error), prompting you to specify. e.g. if I am trying to store a 64-bit value:
movq $1, (%rsp)
There are reasons to use <64 bits for values: less memory, faster arithmetic.
But for this course, we'll mostly stick to 64-bit values/instructions/registers, just to simplify our lives.
Slightly related side-note: immediate values (i.e. constants in the assembly code like $1
) generally can only be ≤32-bit values. So even though this is a 64-bit add, the constant can't be larger than \(2^{32}-1\).
addq $1234, %rax
This will fail to assemble with operand type mismatch
. (The constant is \(2^{60}\).)
addq $1152921504606846976, %rax
The reason: the instruction register has to hold the instruction, and there's not enough room to encode 64-bit constants in an instruction (approximately: there are longer and more correct reasons).
The exception: movabs
which can move a 64-bit value into a register:
movabs $1152921504606846976, %rcx add %rcx, %rax