Assembly: Conditions and the Stack

Revisiting Commands

The commands we have been typing have done various parts of the broader compilation process:

gcc -c compiles C code to object code.
as assembles assembly code to object code.
ld links object files to create an executable.

Revisiting Commands

The long list of command line flags only apply to the compiler, not the other steps: they are about how C code is turned into assembly and aren't relevant to assembling or linking.

In labs 2 and 3, we will see the -g switch to enable debugging information. That can be done to either the assembler or compiler.

gcc -Wall -Wpedantic -std=c17 -march=haswell -O1 -g -c aaa.c
as --warn -g bbb.S -o bbb.o
ld aaa.o bbb.o -o program

Revisiting Commands

We also sometimes used gcc as a linker:

gcc aaa.o bbb.o -o program

The difference between gcc (as a linker) and ld: gcc knows how to set up the world so C code works (e.g. standard library linked in and initialized when the program starts), then it calls ld for you. Also, gcc automatically includes the C runtime, so your program starts at main instead of _start.

Revisiting Commands

But gcc also obviously knows how to assemble code: it's doing it with the assembly it generates from your C code all the time. It can also be given an assembly file and it knows how to handle it (i.e. it runs as for you).

If we rename _start to main in our assembly code, we can use gcc (or clang) for everything:

gcc -Wall -Wpedantic -std=c17 -march=haswell -O1 -g -c aaa.c
gcc -Wall -g -c bbb.S  # note: -c to assemble to a .o
gcc aaa.o bbb.o -o program

Revisiting Commands

You can also give gcc all of the input files at once, and have it make the executable in one step:

gcc -Wall -Wpedantic -std=c17 -march=haswell -O1 \
    -g aaa.c bbb.S -o program  # note: no -c

…if you want the same options applied to all files, and promise you understand which command line switches apply to which steps.

e.g. I might want -g debugging info on some files but not others: a single command can't do that.

Revisiting Commands

For lab 1, use the distinct assembling, compiling, linking commands as appropriate: convince us (and yourself) that you know the difference between the steps.

After that: as long as your code works as specified, it's okay.

Control Flow

Back to assembly…

We have one pair of tools that let us do anything besides executing the next instruction: call and ret.

Otherwise, every instruction has been followed by the next one in the assembly code.

Control Flow

Think about the control flow tools you have in C (and other imperative programming languages): if/else, for, while, etc.

We need to be able to logic like that before we get much farther.

Assembly doesn't have concepts like blocks of code (what's in {…} in C).

Control Flow

Everything other than continue to the next instruction in assembly is done with one of the jump instructions. *

We can jump to a specific memory location. The instruction there will be the next one executed.

* as long as we think of call and ret as a fancy jumps, which they are.

Control Flow

In other words, jumping sets the instruction pointer to the given memory address. Then the next instruction (loaded into the instruction register and executed) will be the one from that location in memory.

The jmp instruction can be used to jump to a specific location (indicated by a label referring to that memory location).

jmp over_there

Control Flow

This assembly creates a loop_forever function that starts a counter (in the %rcx register) at zero, and adds one to it in an infinite loop:

loop_forever:
    mov $0, %rcx
loop_top:
    add $1, %rcx
    jmp loop_top

Remember: a label just gives a name to a memory location. Here, we imagine loop_forever marks the start of a function, and loop_top indicates the top of our infinite loop. To the assembler, they're just labels of memory addresses.

Control Flow

loop_forever:
    mov $0, %rcx
loop_top:
    add $1, %rcx
    jmp loop_top

If we execute this, the mov happens first. Then an add.

Then the jmp sends execution back to the add. So, another add, then jmp, add, jmp, add, …

Control Flow

The jmp instruction is fine, but an infinite loop isn't what we usually want. We need conditions.

In an if/else, there's a condition that determines if we want to jump to the if or else body.

In any loop, there's a condition that determines if we want to jump back up to the top of the loop (to execute it again) or not (and continue on to the code below).

Control Flow

Conditions can be implemented with conditional branch instructions. They effectively work in two steps: first, do a comparision. Then jump only if the comparision had a certain result (else, continue to the next instruction).

Control Example: max

The challenge: write a function max that will return the larger of two signed integers. The appropriate .h:

int64_t max(int64_t a, int64_t b);

Control Example: max

In C, that would be something like this:

int64_t c_max(int64_t a, int64_t b) {
    if ( a >= b ) {
        return a;
    } else {
        return b;
    }
}

We need to be able to implement the >= comparison, and the if/else.

Control Example: max

int64_t max(int64_t a, int64_t b);

As always, we will find our first argument in the %rdi register and second in %rsi.

As always, the result needs to be put in %rax before we ret.

Control Example: max

To implement max, we need to copy either %rsi or %rdi into %rax. That logic can be implemented like this (once we figure out the conditional jump):

max:
    ???            # if %rdi < %rsi jump to "a_is_less"
    mov %rdi, %rax                           # return a
    ret
a_is_less:
    mov %rsi, %rax                           # return b
    ret

Control Example: max

First step: the comparison. The cmp instruction will compare two integers and set some flags to record the results [later topic: “status flags”].

We can start our max function as:

cmp %rsi, %rdi

Remember: %rdi = a and %rsi = b, so this is compare b with a.

Control Example: max

Now that we have compared a and b, we can look at the results and do a conditional jump: only jump if we found out that a<b.

That can be done with:

cmp %rsi, %rdi
jl a_is_less

The jl is jump if less. Implicit else: don't jump, and just go to the next instruction.

Control Example: max

All together, a full return max signed integer function:

max:
    cmp %rsi, %rdi
    jl a_is_less
    mov %rdi, %rax
    ret
a_is_less:
    mov %rsi, %rax
    ret

Control Example: max

Note: the cmp instruction's arguments are (in my mind at least), backwards. In this code %rdi = a and %rsi = b then we…

max:
    cmp %rsi, %rdi
    jl a_is_less
    ...
a_is_less:
    ...

Compare b and a and jump if less means a<b.

Conditional Jumps

After doing a cmp y, x (or some other operation that sets flags), there are several conditional jump instructions that will let us implement various comparisions.

Each jump is if condition, jump to label, else continue to the next instruction.

Conditional Jumps

After doing cmp y, x some jumps:

Instr.	Jump if…	Condition
`ja`	above	`x > y` (unsigned)
`jae`	above or equal	`x >= y` (unsigned)
`jb`	below	`x < y` (unsigned)
`jbe`	below or equal	`x <= y` (unsigned)
`jg`	greater	`x > y` (signed)
`jge`	greater or equal	`x >= y` (signed)
`jl`	less	`x < y` (signed)
`jle`	less or equal	`x <= y` (signed)
`je`	equal	`x == y`
`jne`	not equal	`x != y`

Conditional Jumps

Another useful instruction: the test x, x instruction can be used to check how x compares to zero…

Note: two operands, but we're setting them both to the same thing, like test %rdi, %rdi.

Conditional Jumps

After doing test x, x some jumps:

Instr.	Jump if…	Condition
`jg`	positive	`x > 0` (signed)
`jge`	positive or zero	`x >= 0` (signed)
`jl`	negative	`x < 0` (signed)
`jle`	negative or zero	`x <= 0` (signed)
`je`	zero	`x == 0`
`jne`	non-zero	`x != 0`

Control Structures

With conditional branches, we can implement the control structures we know from C and other languages (if we could implement a complex-enough condition).

For example, we have seen (part of) an if/else:

    cmp ???        # evaluate the condition
    j?? else_code  # jump conditionally to the "else"
    ...            # do the "if"
    jmp after_if_else
else_code:
    ...            # do the "else"
after_if_else:
    ...            # code that follows the if/else

Control Structures

Suppose we want to implement while-loop-like logic in assembly:

void loop_example() {
    uint64_t c = 0;
    while ( c < 3 ) {
        c += 1;
        printf("%ld\n", c);
    }
    return;
}

We expect this to print the loop counter three times: the loop exits when c >= 3.

Control Structures

We have the tools we need for that in assembly: use a register to count (I choose %rcx). Compare the counter to 3. Conditionally exit the loop.

Control Structures

This code implements the same logic: counter to 0; compare counter to 3; exit the loop if ≥; increment.

loop_example:
    mov $0, %rcx
loop_top:
    cmp $3, %rcx  # check the loop condition
    jge loop_end
    add $1, %rcx  # the loop "body"...
    # print...
    mov %rcx, %rdi
    push %rcx     # not call-preserved: store it on the stack
    call print_int64
    pop %rcx      # restore %rcx from the stack
    jmp loop_top  # back up to test the "while" condition
loop_end:
    ret

Control Structures

Hopefully you're convinced any of the control structures you know can be duplicated in assembly.

Textbooks often have sections how to translate C structure X to assembly. My advice: ignore those.

Control Structures

Just use the tools you have in assembly (mostly conditional jumps) to implement the logic you need. Don't try to translate C thoughts to assembly.

Status Flags

How do the conditional branches work? How do they know what happened in the last instruction?

There must be some information passed from the cmp to the jge instruction here.

    cmp $3, %rcx
    jge loop_end

Status Flags

The cmp and test instructions (also many others) set status flags or status bits or the EFLAGS register as a side-effect of their execution. Each flag is a single bit representing something about the result of an instruction.

Status Flags

e.g. the instruction cmp %rax, %rbx will set the zero flag (ZF) to 1 if the two values were equal, and to 0 if not.

Then if you do the je (jump if equal) instruction right after, it will check the ZF bit and branch if it's 1. Or jne (jump if not equal) will branch if it's 0.

Status Flags

Actually cmp just subtracts its two operands, sets the status flags, and throws away the result of the subtraction: the zero flag means equal here because if subtracting the two gives zero, they were equal.

There is actually an instruction jz (jump if zero) that is a synonym for je (jump if equal).

Status Flags

Every instruction sets the status flags as appropriate to what they're doing.

For example, every time we have added, the carry flag CF was set if there was a carry-out, i.e. an unsigned integer overflow. We can use jc (jump if carry) or jnc (jump if no carry) to detect this.

Status Flags

An example using one-byte registers (because the values are small enough to understand easily). The largest one-byte unsigned integer is 255, so this add carries:

status_test_1:
    mov $200, %dl  # %dl == first byte of %rdx
    add $57, %dl
    jc it_carried
    mov $0, %rax
    ret
it_carried:
    mov $1, %rax
    ret

… and we expect this function to return 1.

[Can also look in GDB: info registers eflags]

Status Flags

Or, we can use the sign flag SF to check if the last result was negative, where we imagine our integers as signed.

Here, the result is not negative, so we expect SF to be zero, so the jump will not happen and we return 0.

status_test_2:
    mov $10, %r8
    sub $8, %r8
    js its_negative
    mov $0, %rax
    ret
its_negative:
    mov $1, %rax
    ret

Status Flags

So, we can set status flags…

by doing an instruction specifically to set them, probably cmp or test; or
by doing some operation that we were going to do anyway, where we care about the status.

… and then we can do a conditional branch based on whatever logic we're trying to implement.

See our x86-64 cheat sheet for details of the conditional branches and the flags they examine.

Aside: Stacks

In case you haven't see a stack data structure…

It's a list/array kind of data structure, but there are two operations that can be done to a stack: push and pop.

push x: put x on the top of the stack (above any other values).
pop: take the top value off the stack and return it.

Aside: Stacks

Or visually, push 5 values then pop 5 values:

stack operations — wikimedia File:Lifo stack.svg

Aside: Stacks

The important part: popping removes values from the stack in the reverse of the order they were added.

If you prefer code: stack-ds.cpp, which implements (an extremely simple) stack in C++.

The Stack

We have seen the push and pop instructions that do the basic stack operations.

This will take the value in %rax and push it to the top of the stack, then remove the value from the top of the stack and put it in %rbx.

push %rax
pop %rbx

Note: the operand for push is a source; the operand for pop is a destination

The Stack

Remember this diagram. There is a stack that lives in our program's memory allocation.

The Stack

The stack in assembly (like everything else) is much more manual than in higher-level languages.

It will do the same thing: give us a way to have memory that is dedicated to one call of a function (e.g. local variables).

The Stack

The stack is managed by the stack pointer, which is the %rsp register.

The %rsp holds the memory address of the top of the stack. It must be call-preserved: if you're writing a function and change %rsp, it must be returned to its original value before you ret.

The Stack

We haven't written any instructions that use memory yet. Here's an incomplete summary (more later):

Wrap a memory address in parens to indicate the value at that memory location: %rsp is a memory address; (%rsp) is the value in memory at that location (the top of the stack).
To move forward/back from that location, put the number of bytes before the parens: 8(%rsp) is the contents of memory 8 bytes after %rsp.

The Stack

e.g. these instructions copy the 64-bit value at the top of the stack to %rax, the second-from-top 64-bit value to %rbx, and the third stack value to %rcx:

mov (%rsp), %rax
mov 8(%rsp), %rbx
mov 16(%rsp), %rcx

This changes the 64-bit value on top of the stack to the value from %rdi:

mov %rdi, (%rsp)

The Stack

So the parens around a memory address in assembly act like the dereference operator (*) in C/C++. These are roughly equivalent:

int64_t *s = magically_get_the_stack_pointer();
printf("%ld\n", s);
printf("%ld\n", *s);
printf("%ld\n", *(s+1)); // offset adjusted by sizeof(int64_t)

mov %rsp, %rdi
call print_uint64
mov (%rsp), %rdi
call print_uint64
mov 8(%rsp), %rdi
call print_uint64

The Stack

You can manipulate %rsp directly (and we will), but there are instructions that make it easier: push and pop.

Here is a function that pushes the integer 4567 onto the stack, prints it, pops it, and returns.

Note: we have to use pushq (instead of push) to indicate we're operating on 64-bit values. [later topic: “operand sizes”]

The Stack

use_the_stack_1:
    mov %rsp, %rdi
    call print_uint64 # print stack pointer
    
    pushq $4567       # push a 64-bit value onto the stack
    mov %rsp, %rdi
    call print_uint64 # print stack pointer

    mov (%rsp), %rdi
    call print_uint64 # print the value on top of the stack
    
    pop %r8      # it has to pop *to* somewhere: %r8 will do
    ret

140733574654760
140733574654752
4567

The Stack

140733574654760
140733574654752
4567

Note: the stack pointer decreased when something was pushed. The top of the stack is at lower memory locations than the bottom.

All pushq is doing: subtract 8 from %rsp and store the value in memory at (%rsp). We could have done that ourselves with the same result…

The Stack

The exact same function (except without the useless write to %r8):

use_the_stack_2:
    mov %rsp, %rdi
    call print_uint64  # print stack pointer
    
    sub $8, %rsp
    movq $4567, (%rsp) # push a 64-bit value onto the stack
    mov %rsp, %rdi
    call print_uint64  # print stack pointer

    mov (%rsp), %rdi
    call print_uint64  # print the value on top of the stack
    
    add $8, %rsp       # move %rsp without using the value
    ret

The Stack

In both of those functions, this is what the stack looks like in the middle of the function (after the push, before the pop):

The Stack

The stack is also how call/ret work. The address of the instruction to return to is on the stack.

call f is essentially just this (%rip is the instruction pointer):

push %rip
jmp f

ret is essentially:

pop %rip

[These aren't real assembly because you can't modify %rip directly like this: only the jump instructions can.]

Preserving on the Stack

We can use the stack as somewhere to preserve any values (and have before to call-preserve register values). e.g. if we have a value we need later in %rdi (which is not call-preserved), it might be lost by a function call.

mov $12345, %rdi
call print_uint64
call print_uint64

12345
1

Preserving on the Stack

mov $12345, %rdi  # imagine we worked hard to calculate this
push %rdi         # preserve %rdi on stack
call print_uint64
mov (%rsp), %rdi  # restore %rdi but leave it on the stack
call print_uint64
pop %rdi          # restore %rdi and pop to restore %rsp
call print_uint64

12345
12345
12345

Preserving on the Stack

When writing assembly where you call a function, you could choose to use a preserved register so you don't have to worry about a function overwriting it.

But now you're the function that has to preserve that register for whoever called you.

Preserving on the Stack

e.g. we can use %r15 (which is call-preserved) to do some calculations and not worry about it changing.

mov $3331, %r15   # do some call-preserved calculations
add $2, %r15
mov %r15, %rdi
call print_uint64
mov %r15, %rdi
call print_uint64

3333
3333

Preserving on the Stack

But now our whole function has to be something like this, so we restore the %r15 value of whoever called us before returning.

use_preserved_register:
    push %r15         # store initial %r15 value
    mov $3331, %r15   # do some call-preserved calculations
    add $2, %r15
    mov %r15, %rdi
    call print_uint64
    mov %r15, %rdi
    call print_uint64
    pop %r15          # restore caller's %r15
    ret

Preserving on the Stack

We can also use the stack to preserve a non-call-preserved register if we're about to call a function (and need that value again). %r8 is not call-preserved:

use_non_preserved_register:
    mov $3331, %r8 # do some non-call-preserved calculations
    add $2, %r8
    mov %r8, %rdi
    push %r8       # push to store r8: function might destroy
    call print_uint64
    pop %r8        # ensure r8 is back to our value
    mov %r8, %rdi  # don't push because we don't need it again
    call print_uint64
    ret

Preserving on the Stack

There's no reason you have to push from and pop to the same register. We have been using a pattern like this:

some_function:
    push %rdi # preserve the first argument
    call some_other_function
    pop %rdi  # restore the first argument
    ...       # use %rdi

There's no reason we couldn't restore to somewhere else if that is easier:

some_function:
    push %r15 # store caller's %r15
    push %rdi # preserve the first argument
    call some_other_function
    pop %r15  # restore to a call-preserved regsiter
    ...       # use %r15
    pop %r15  # restore caller's %r15

Preserving on the Stack

The way values must be preserved depends on which registers we're talking about.

Local Stack Variables

We can use registers as local variables within a function (as long as we preserve them on the stack as needed), but there are a limited number of registers to work with. What if we need more?

I'm sure you have done some calculation that required more local variables than there are registers. Then what?

Local Stack Variables

We can store as much stuff on the stack as we want, up to the OS's limit. Apparently that's 8 MB by default in Linux.

$ ulimit -s
8192

So it's a good place to store a modest amount of stuff, like our function's local variables.

Local Stack Variables

Let's test that limit:

exceed_stack:
    mov $0, %rdi
infinite_loop_start:
    push %rdi
    call print_uint64
    mov (%rsp), %rdi  # read previous %rdi value back
    add $1, %rdi
    jmp infinite_loop_start

Local Stack Variables

Last output for me:

1047552
1047553
Segmentation fault (core dumped)

\[\begin{align*} &\phantom{=\ } 1047553\mbox{ 64-bit values} \\ &= 1047553 \times 8\mbox{ bytes/value} = 8380424\mbox{ bytes} \\ &= 8380424 / 1024\mbox{ bytes/kB} = 8184\mbox{ kB} \end{align*}\]

That's a little less than the limit because the calling code and print_uint64 used a little stack.

Local Stack Variables

Let's try keeping local variables on the stack (even though in this example, they would fit in registers easily). We will duplicate this function, keeping a and b on the stack.

int64_t stack_locals(int64_t x, int64_t y) {
	int64_t a = x;
	int64_t b = y;
	a += b;
	b *= a;
	return b;
}

… and using registers for only intermediate values during calculations. (Or maybe we're pretending that %rax is the only register we can modify for some reason.)

Local Stack Variables

We can start by pushing our two values onto the stack, getting memory space for our local variables:

stack_locals:
    push %rdi
    push %rsi
    # 8(%rsp) == a
    # (%rsp) == b

Local Stack Variables

Then we can get on with the real work: a+=b; b*=a;

At most one of the operands for most instructions can be in memory (the other must be a register or constant), so we can approach it like this:

mov (%rsp), %rax   # b to %rax
add %rax, 8(%rsp)  # a += b

mov (%rsp), %rax   # b to %rax
imul 8(%rsp), %rax # b *= a
mov %rax, (%rsp)

[The destination for imul must be a register.]

Local Stack Variables

Finally, we can clean everything up and return the result:

mov (%rsp), %rax   # will return b
add $16, %rsp      # purge our stack content
ret

We pushed two 8-byte values, so adding 16 to %rsp will effectively remove them from the stack.

Recursion

It turns out, we have everything we need in assembly to implement recursive functions: local variables (on the stack), and some way to deal with the base case (conditional branches for if the base case… logic).

Let's implement our (I assume) old friend factorial:

uint64_t factorial(uint64_t n) {
    if ( n == 0 ) {
        return 1;
    } else {
        return n * factorial(n-1);
    }
}

Recursion

Side quest…

The mul instruction (unsigned integer multiplication) takes a single operand and multiplies %rax by that value, putting the result in %rdx:%rax (i.e. it's a 128-bit result with the high bits in %rdx and low in %rax).

If we ignore the possibility of the result being >64 bits, that means mul %rXX is always going to do rax*=rXX; rdx=0; which is weird, but it's what we get.

Recursion

Now we can implement the recursive algorithm easily enough in assembly… check if the argument is zero. If it is, return 1. Else, make the recursive call on n-1.

Everything but the recursive case:

factorial:
    test %rdi, %rdi
    jne fact_recursive_case
fact_base_case:
    mov $1, %rax
    ret
fact_recursive_case:
    ...

[The fact_base_case label isn't being used for anything, but that's okay. Think of it like a weird comment.]

Recursion

To do the recursive call, we need to preserve our local variable n on the stack, calculate n-1, call factorial, multiply that result by n.

fact_recursive_case:
    push %rdi      # push our n value onto the stack
    sub $1, %rdi   # n -= 1
    call factorial # rax = factorial(n-1)
    pop %rdi       # restore original n
    mul %rdi       # rax *= n
    ret

Recursion

Recursion might have felt like magic when you first saw it: there are these local variables that are somehow isolated, even though it's the same code for each recursive call. (i.e. there are many variables named n, even though you only wrote int n once.)

There was no magic. Just a stack.

We can use the stack to preserve our values when calling other functions; we can use it to preserve values around recursive calls. Same thing.

Operand Size

You can do operations on 64-, 32-, 16-, or 8-bit operands. The smaller-operand versions might be faster, and the smaller values take less memory (important if you have a large array).

Generally, the operand size is implicit from the size of register you're working with.

    add $1, %rsi    # 64-bit addition
    add $1, %esi    # 32-bit addition
    add $1, %si     # 16-bit addition
    add $1, %sil    # 8-bit addition

These are technically four different instructions, but I think of them all as add.

Operand Size

It's also possible to specify operand size with a bwlq suffix as part of the instruction.

addb: byte, 8-bits
addw: word, 16-bits
addl: long word, 32-bits
addq: quad word, 64-bits

You can do that on any instruction that can work on different-sized operands. You should do it if there's any ambiguity, or you're trying to be more specific.

Operand Size

The sizes of all operands and the instruction have to match. Each of these is an error.

    add %edx, %rdx  # mixed 32- and 64-bit operands
    addq %edx, %edx # 64-bit instruction, 32-bit operands
    addl $1, %dx    # 32-bit instruction, 16-bit operand
    addb $257, %dl  # warning: 257 shortened to 1

Each of these is a legal 64-bit addition.

    add %rcx, %rax
    addq %rcx, %rax
    add $1, %rax
    addq $1, %rax

Operand Size

It's necessary to specify the operand size on an instruction if the assembler can't infer it.

e.g. how many bytes around %rsp are written by this? Are we storing an 8-, 16-, 32-, or 64-bit value?

    mov $1, (%rsp)

The assembler will give a warning (not an error), prompting you to specify. e.g. if I am trying to store a 64-bit value:

    movq $1, (%rsp)

Operand Size

There are reasons to use <64 bits for values: less memory, faster arithmetic.

But for this course, we'll mostly stick to 64-bit values/instructions/registers, just to simplify our lives.

Operand Size

Slightly related side-note: immediate values (i.e. constants in the assembly code like $1) generally can only be ≤32-bit values. So even though this is a 64-bit add, the constant can't be larger than $2^{32}-1$.

    addq $1234, %rax

This will fail to assemble with operand type mismatch. (The constant is $2^{60}$.)

    addq $1152921504606846976, %rax

Operand Size

The reason: the instruction register has to hold the instruction, and there's not enough room to encode 64-bit constants in an instruction (approximately: there are longer and more correct reasons).

The exception: movabs which can move a 64-bit value into a register:

    movabs $1152921504606846976, %rcx
    add %rcx, %rax