When writing assembly, we have mostly used registers to store our data. That's great: registers are fast and easy to work with.
But there's a very limited number of them.
We have had only one way to work with memory so far: the stack. We can push (or subtract from %rsp) to get ourselves some memory to work with, and then pop (or add to %rsp) when we're done with it.
That's fine, but the stack is also limited. And the stack is only one of the categories of memory a program can work with: there is also heap memory and static memory.
Let's get at those other pools of memory.
But first…
What is memory exactly?
At the circuit level, main memory (DRAM) stores bits with a bunch of tiny capacitors: charged for a 1 and discharged for a 0. That's not much help for understanding how to work with it.
How I actually think of memory: an array of bytes.
uint8_t memory[] = ???;
Memory can be addressed. An address is basically the index into the array of bytes: each byte has a unique address.
The basic operations I can do on memory: read it (examine a value: x = memory[1234]) and write it (update a value: memory[1234] = x).
There's a byte at address \(n\) in memory. If I wanted to, I could think of bytes \(n\) to \(n+7\) as a 64-bit signed integer value.
But, as always, the hardware doesn't care what I think the values are: it will store whatever bytes I tell it to, and do whatever instructions I give it.
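To make the "array of bytes" picture concrete, here is a small C sketch. The 64-byte memory array and the store_i64/load_i64/load_u64 helpers are hypothetical, just a toy model (a real address space is not a C array): storing a value writes 8 bytes, and reading those same bytes back as signed or unsigned gives different numbers, because the bytes themselves carry no type.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model: a hypothetical 64-byte "memory", indexed by address. */
uint8_t memory[64];

/* Write the 8 bytes of value into memory[n] .. memory[n+7]. */
void store_i64(size_t n, int64_t value) {
    memcpy(&memory[n], &value, sizeof value);
}

/* Read memory[n] .. memory[n+7] and interpret the bytes as a signed 64-bit integer. */
int64_t load_i64(size_t n) {
    int64_t value;
    memcpy(&value, &memory[n], sizeof value);
    return value;
}

/* Read the same 8 bytes, but interpret them as an unsigned 64-bit integer. */
uint64_t load_u64(size_t n) {
    uint64_t value;
    memcpy(&value, &memory[n], sizeof value);
    return value;
}
```

After store_i64(8, -1), load_i64(8) gives -1 but load_u64(8) gives 18446744073709551615: the same eight 0xFF bytes, interpreted two ways.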
A pointer is just a number representing an address in memory.
In C, if I tell you p is a pointer, that means it's an integer that refers to an address in memory. An int64_t* (pointer to an int64_t) is a memory address p where the 8 bytes at addresses p through p+7 hold bits we should interpret as an int64_t.
The %rsp register is intended to always hold a pointer: the memory address of the top of the stack.
Other registers could hold a pointer, if that's how we think of the value we put in there.
In most programming languages, a reference is essentially a pointer, but you aren't allowed to see the actual memory address. In my mind, a C++ reference is conceptually like:
class Int64Reference {
private:
    int64_t* p;
public:
    // things to work with p but not let you see it directly
};
Hiding actual pointers away from the programmer prevents pointer arithmetic: you can't move a few bytes over and see what's there. You just have to follow the reference to a nicely managed data structure.
Pointers let you get into trouble: add/subtract and have a look at other memory, or follow the pointer and treat it as a different type of data. References protect programmers from that danger.
They are otherwise conceptually similar: both let you refer to somewhere in memory with a small value (the memory address).
e.g. if you have a huge object in memory, you can pass a pointer or reference to a function. That's cheaper than copying the whole object and still lets the function see it. But (in most languages), it would also let the function modify the object.
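A small C sketch of that trade-off. The big_object type and both functions are made up for illustration: passing &obj copies only the address (8 bytes on x86-64) rather than 8000 bytes of values, and the same kind of pointer lets a function change the caller's object. The const in first_value is C's way of promising not to modify through that pointer.

```c
#include <stdint.h>

/* Hypothetical "huge" object: 1000 int64_t values, 8000 bytes. */
typedef struct {
    int64_t values[1000];
} big_object;

/* Receives only an address; const means this function promises not to modify *obj. */
int64_t first_value(const big_object *obj) {
    return obj->values[0];
}

/* Also receives only an address, and uses it to modify the caller's object. */
void reset_first_value(big_object *obj) {
    obj->values[0] = 0;
}
```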
Note that pointer arithmetic in C is a little unexpected: you can move forward/back in memory by adding/subtracting from a pointer, but the offsets are multiplied by the size of the values.
e.g. we might have a pointer to an 8-byte (64-bit) value like this:
int64_t n = 123456;
int64_t* p = &n;
We know that sizeof(int64_t)==8. We can check what memory addresses p and p+1 refer to:
printf("%p\n", (void*)p);
printf("%p\n", (void*)(p+1));
0x7fff4e87cea0
0x7fff4e87cea8
Adding i to the pointer adds i*sizeof(type) to the memory address.
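The scaling can also be checked without printing addresses: casting both pointers to char* (whose elements are 1 byte) turns a pointer difference into a byte count. This is a minimal sketch; the function names and the 10-element arrays are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes between element 0 and element i of an int64_t array (valid for 0 <= i <= 10).
   p + i moves i elements, which is i * sizeof(int64_t) bytes. */
ptrdiff_t byte_offset_int64(size_t i) {
    static int64_t values[10];   /* only used for its addresses */
    int64_t *p = values;
    /* char* differences count bytes, not elements. */
    return (char *)(p + i) - (char *)p;
}

/* The same measurement for a 4-byte type (valid for 0 <= i <= 10). */
ptrdiff_t byte_offset_int32(size_t i) {
    static int32_t values[10];
    int32_t *p = values;
    return (char *)(p + i) - (char *)p;
}
```

So byte_offset_int64(3) is 24 while byte_offset_int32(3) is 12: the same "+3" in C, different distances in memory.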
Assembly isn't going to do us the same favour. In assembly, a pointer is just an integer that we imagine representing a memory address.
We have seen the .text segment in our assembly files: it contains code. Actual executable stuff:
.text
my_function:
    mov $0, %rax
The linker (ld or gcc) will collect the .text segments from all of the assembled code into our program.
The purpose of the .text directive is to tell the assembler/linker that this section of our file is code.
There are more segments that we can have in our assembly code. These pieces are also given to the linker, and put together (with other object files that have some/all of the same segments) to make a single executable.
The data segment is used to store values that are stored in the executable file and initialized when the program starts, i.e. initialized static memory.
Labels are used for the same thing in .data as in .text: to give a name to a memory address. Then you describe some memory contents that will be there when the program starts.
e.g. this assembly:
.data
value:
    .quad 1234
is roughly equivalent to this C:
static int64_t value = 1234;
The .quad directive tells the assembler I want a quad-word (64 bits) of memory holding this value (but nothing about signed/unsigned: that's for us to remember).
Or .fill can be used to create an array-like memory space. You give a repeat count, a size (in bytes), and an initial value.
.data
array:
    .fill 100, 8, 0
i.e. 100 8-byte spaces, each initialized to zero. C equivalent:
static int64_t array[100] = {0};
Or the .bss segment can be used to specify uninitialized static memory. e.g. this creates an array of 100 8-byte values without giving them initial values:
.bss
array:
    .fill 100, 8
Similar to:
static int64_t array[100];
The magic words to specify memory chunks mirror the instruction suffixes:
Bits | Name | Instr Suffix | Assembler Directive |
---|---|---|---|
64 | Quad Word | q | .quad |
32 | Long Word | l | .long |
16 | Word | w | .word |
8 | Byte | b | .byte |
Both .bss and .data give us a way to reserve some static memory.
.bss vs .data…
Whatever is described in .data must be stored in the executable. It will be read into memory and be there when your code starts.
Whatever is in .bss takes almost no space in the file: only its size is recorded. The operating system fills it with zeros when the program is loaded (which is also why C's static variables without an initializer start as 0), so any other starting values are your responsibility to set in code.
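The C equivalents from above can be put side by side in a sketch. Which section each variable lands in is the compiler's choice (GCC normally uses .data and .bss as the comments say), but the zero starting value of array is guaranteed by C itself. The accessor functions are hypothetical helpers.

```c
#include <stdint.h>

/* Initialized static memory: typically placed in .data. */
static int64_t value = 1234;

/* Static memory with no initializer: typically placed in .bss, and
   guaranteed by C to start as all zeros. */
static int64_t array[100];

int64_t read_value(void) { return value; }
int64_t read_array(int i) { return array[i]; }        /* assumes 0 <= i < 100 */
void write_array(int i, int64_t x) { array[i] = x; }  /* assumes 0 <= i < 100 */
```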
An array is just a sequence of values of a specific type (and therefore size) that are adjacent in memory.
In C, when you create an array like this:
int64_t array1[10];
… it is a stack array, so we expect it to cause something like this in assembly:
sub $80, %rsp
… with a corresponding add $80, %rsp before any ret.
In C, array variables are effectively just pointers to the start of the array.
The value in %rsp after the appropriate sub will be known as the pointer array1 in that code.
Similarly, we could ask for an array of 10 values on the heap in C like this:
int64_t* array2 = (int64_t*)malloc(10 * sizeof(int64_t));
Here, the pointer array2 is on the stack and the 80 bytes for the array are on the heap.
When our function returns, the stack variable array2 will disappear.
After that, it would be impossible to free those 80 bytes and we would have a memory leak. We must have a corresponding free(array2) before we return.
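A minimal C sketch of the heap version, with the matching free before returning. The name heap_array_sum and the "return -1 if malloc fails" convention are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n int64_t values on the heap, fill them with 0..n-1, and sum them.
   Returns -1 if malloc fails (a made-up convention for this sketch). */
int64_t heap_array_sum(size_t n) {
    int64_t *array2 = malloc(n * sizeof(int64_t));
    if (array2 == NULL) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        array2[i] = (int64_t)i;
    }
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += array2[i];
    }
    free(array2);   /* release the heap memory before the pointer variable disappears */
    return sum;
}
```

The pointer array2 lives on the stack and vanishes on return; only the free call keeps the 8*n heap bytes from leaking.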
We don't have any (direct) equivalent of malloc/free in assembly, but we could call the C function malloc and get back a pointer to our allocation.
And of course, call free from assembly as well.
Or we can ask for some static memory in C:
static int64_t array3[10];
We would expect that to correspond to some assembly like one of these (if we do/don't initialize it):
.data
array3:
    .fill 10, 8, 0
or
.bss
array3:
    .fill 10, 8
But how would we use that memory if we had it?
Let's talk more about how you specify the operand (≈argument) for assembly instructions…
Compare the first operands in these two instructions:
add %rcx, %rax
add $1, %rax
ret
The first (%rcx) refers to the value in a register: it is a register operand.
The second ($1) refers to the number 1, as it is written in the code: it is an immediate operand.
These are addressing modes: ways to specify the source/destination of instruction operands. So far, we have seen register and immediate addressing/operands.
[We accessed memory around %rsp, but let's ignore that and start from scratch…]
To refer to a memory location with a label, we can just mention it by name. In fact, we have been doing this too, when calling functions:
some_function:
    ...
main:
    call some_function
Note: here, the value call needs is the memory address, not the contents of the memory at that location.
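We have done a little arithmetic on the stack pointer to look down the stack at values away from the top.
We have always seen offsets that are multiples of 8 because we were using 64-bit values, but we could store values of other sizes on the stack if we wanted (in 64-bit mode, push and pop themselves only handle 64-bit or 16-bit operands, but we can always adjust %rsp and use mov). e.g. if we pushed three 64-bit values, the first one pushed would be at 16(%rsp), the second at 8(%rsp), and the last at (%rsp).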
Let's try with some static memory. Suppose a data segment like this:
.data
n:
    .quad 123
arr:
    .quad 6
    .quad 7
    .quad 8
    .quad 9
i.e. something I'll interpret as a 64-bit integer (n) and an array of four 64-bit integers (arr).
To refer to the contents of a memory address, wrap the address in parentheses.
.text
use_some_memory:
    push %rbx
    mov (n), %rdi
    call print_uint64
That's a movq operation: the assembler knows to read 64 bits from memory because of the %rdi destination.
We will often need to work with the memory addresses to our data: pointers.
The lea instruction (Load Effective Address) can be used to get a pointer to something in memory: lea is analogous to the & operator in C.
lea n, %rbx          # %rbx == pointer to n
mov (%rbx), %rdi     # (%rbx) == value stored in n
call print_uint64    # prints n
This mov also copies 8 bytes from memory because of the 64-bit %rdi destination.
What lea is doing, compared to analogous C code:
Assembly | C |
---|---|
mov (n), %rdi | rdi = n; |
lea (n), %rdi | rdi = &n; |
mov (%rsi), %rdi | rdi = *rsi; |
lea (%rsi), %rdi | ??? |
We can use lea to start working with arrays. We can get the address of the start of an array into a register.
lea arr, %rbx        # %rbx == address of array element 0
mov (%rbx), %rdi
call print_uint64    # prints array element 0
mov %rbx, %rdi
call print_uint64    # prints address of array element 0
Will output something like:
6
4206600
We have an array of 64-bit integers (8 bytes for each element). So, we can get from element 0 to element 1 by moving 8 bytes over.
lea arr, %rbx        # %rbx == address of array element 0
add $8, %rbx         # %rbx == address of array element 1
mov (%rbx), %rdi
call print_uint64    # prints array element 1
mov %rbx, %rdi
call print_uint64    # prints address of array element 1
7
4206608
What if we want to access the \(n\)-th element? Let's imagine we're using %rcx as a counter and want to access position %rcx in the array.
lea arr, %rbx
mov $2, %rcx         # set our "counter" to 2
shl $3, %rcx         # %rcx *= 8
add %rcx, %rbx       # %rbx += 16
mov (%rbx), %rdi
call print_uint64    # prints array element 2
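The same calculation written in C, treating the address as a plain integer the way the assembly does. This is a sketch: element_by_hand is a hypothetical name, i must be between 0 and 3, and the round trip through uintptr_t is implementation-defined in C, though mainstream x86-64 compilers treat it as plain address arithmetic.

```c
#include <stdint.h>

/* The same four values as the arr label in the .data example. */
static int64_t arr[] = {6, 7, 8, 9};

/* Mimic the assembly: start from the address of arr and add i*8 by hand. */
int64_t element_by_hand(uint64_t i) {
    uintptr_t address = (uintptr_t)arr;   /* lea arr, %rbx */
    address += (uintptr_t)(i << 3);       /* shl $3, %rcx ; add %rcx, %rbx */
    return *(int64_t *)address;           /* mov (%rbx), %rdi */
}
```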
But accessing array elements this way is really common. We don't want to manually calculate the address every time, and the processor will help.
The (address) operand can actually have more parts.
We have been giving the address, but we often want to talk about memory contents relative to that address.
We can give a second value, the index which indicates how far to move from the address.
mov (%rbx, %rcx), %rdi
Here, %rdi would get the value from memory location %rbx + %rcx.
That's useful if we're imagining our counter tracking byte offsets (e.g. our loop increment is something like i+=8), but usually I'd rather count elements than track byte offsets myself.
A third component is the scale: how much to multiply the index by. It can only be 1, 2, 4, or 8, but those are often useful. In our array…
mov (%rbx, %rcx, 8), %rdi
will get memory location %rbx + %rcx*8, which is what we want.
The full code snippet:
mov $2, %rcx                # set our "counter" to 2
lea arr, %rbx
mov (%rbx, %rcx, 8), %rdi
call print_uint64           # prints array element 2
We don't need to modify %rbx or %rcx like we did in previous examples: we just get the memory access we want right away.
There's one more piece we can give in a memory access, the displacement (or offset).
It comes before the (…) and gives a constant value to add to the address (it must be a literal constant, not a register value):
mov 16(%rbx), %rdi
…refers to memory address %rbx + 16.
This is what we used before around the stack pointer, specifying memory locations like 16(%rsp).
So all together, this…
mov 12(%rbx, %rcx, 4), %rdi
references 8 bytes (64 bits, because of the %rdi destination) starting at memory location %rbx + %rcx*4 + 12.
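To check the arithmetic, here is a C sketch of how the address for displacement(address, index, scale) is computed. effective_address is a hypothetical helper that only does the arithmetic and never reads memory. The displacement is an int32_t because x86-64 encodes it as a signed value of at most 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute base + index*scale + displacement without touching memory.
   Returns false for a scale the instruction encoding can't express. */
bool effective_address(int32_t displacement, uint64_t base, uint64_t index,
                       unsigned scale, uint64_t *result) {
    if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
        return false;
    }
    /* Unsigned arithmetic wraps around like the hardware's 64-bit address math,
       so a negative displacement moves the address backwards. */
    *result = base + index * scale + (uint64_t)(int64_t)displacement;
    return true;
}
```

e.g. 12(%rbx, %rcx, 4) with %rbx == 1000 and %rcx == 3 gives 1000 + 3*4 + 12 = 1024.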
The collection of addressing modes might seem crazy, but consider non-crazy C code like this:
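#include <stdint.h>
#include <stdlib.h>
#define N 100   // assumed: N isn't given in the original snippet

typedef struct {
    int32_t a;
    int32_t b;
} pair;

int main() {
    pair* pairs = (pair*)malloc(N * sizeof(pair));
    for (int i = 0; i < N; i++) {
        pairs[i].a = 10;
        pairs[i].b = 11;
    }
    free(pairs);
}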
Let's imagine we use %rbx to hold the pointer pairs, and %rcx to hold the counter i.
The int32_t fields are 4 bytes each, so the struct is 8 bytes in total. The i==7 iteration of that loop would effectively be:
mov $7, %rcx                  # i = 7 (%rbx already holds the pointer pairs)
movl $10, (%rbx, %rcx, 8)     # pairs[i].a = 10
movl $11, 4(%rbx, %rcx, 8)    # pairs[i].b = 11
The last instruction sets 32 bits (movl) at mem[pairs + i*8 + 4], which is exactly pairs[i].b.
Using mem[x] to refer to memory at location x…
Mode | Example | Meaning |
---|---|---|
Immediate | $4 | 4 |
Register | %rax | %rax |
Indirect | (%rbx) | mem[%rbx] |
Indirect | label | mem[label] |
Indirect | (label) | mem[label] |
Indexed | (%rbx, %rdx) | mem[%rbx + %rdx] |
Scaled | (%rbx, %rdx, 8) | mem[%rbx + %rdx*8] |
Offset | 4(%rbx) | mem[%rbx + 4] |
Scaled+Offset | 4(%rbx, %rdx, 8) | mem[%rbx + %rdx*8 + 4] |
The parts have to be registers and literal values as in these examples:
displacement(address, index, scale)    e.g. 4(%rbx, %rdx, 8)
The address and index must be registers, the displacement a literal integer, and the scale one of 1, 2, 4, 8.
[And there are cases for labels that we'll discuss more momentarily, like (n) and n(%rip).]
When we moved around the stack before, we used operands like 16(%rsp) to look 16 bytes from the top of the stack. That was good enough at the time.
pushq $12
pushq $13
pushq $14
mov 16(%rsp), %rdi    # print the 12
call print_uint64
Any of the other addressing modes can also be used around the stack pointer.
mov $2, %rcx
mov (%rsp, %rcx, 8), %rdi
call print_uint64     # also prints 12
And that whole story works if we link with ld (and its default behaviour).
If we try to link that code with gcc (and its default behaviour), we get a not-very-helpful error message complaining about lines like this:
mov (arr), %rdi
The problem with that code is that it refers to a literal memory address in the static (.data) segment.
GCC would like to build an executable that can be put anywhere in memory, so the exact address of arr might change.
In other words, it's trying to build position independent code (PIC) or a position independent executable (PIE): code that can be loaded anywhere in the computer's memory.
The ld default is non-PIE, so it worked.
Position independence is useful for shared libraries (where several are loaded together with a single program), and for address space layout randomization, which is a way to mitigate security problems around memory accesses.
But we don't really care why. We just want our code to link correctly.
mov (arr), %rdi
So, we can't just refer to the literal memory address of one of our labels like that: the address might change.
The alternative is to ask the linker to fill in an address relative to the instruction pointer: RIP-relative addressing. The linker can promise that the static memory address will be a certain offset before/after the current instruction (and figure out what it is).
The way we ask for that is by expressing the label as a displacement from the instruction pointer:
mov arr(%rip), %rdi
With that, we can work with static memory (.data and .bss segments) in a way that works everywhere.
lea arr(%rip), %rbx
mov (%rbx), %rdi
call print_uint64     # prints 6
mov 8(%rbx), %rdi
call print_uint64     # prints 7
mov n(%rip), %rdi
call print_uint64     # prints 123
RIP-relative addressing is not needed for call x instructions: a call to a label is already encoded relative to the instruction pointer, and the assembler/linker figure it all out. Otherwise, we will probably want to use x(%rip) instead.
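Why this survives being loaded anywhere can be sketched with a little arithmetic in C. While an instruction runs, %rip holds the address of the next instruction, and label(%rip) means "that address plus a stored displacement". The function names and addresses below are made up, and real displacements are filled in by the assembler/linker; the point is that the displacement depends only on the distance between the instruction and the label.

```c
#include <stdint.h>

/* Displacement stored in a label(%rip) instruction: the distance from the end
   of that instruction (where %rip points while it runs) to the label. */
int32_t rip_displacement(uint64_t label_addr, uint64_t next_instr_addr) {
    return (int32_t)((int64_t)label_addr - (int64_t)next_instr_addr);
}

/* What the CPU computes at run time for label(%rip). */
uint64_t rip_target(uint64_t next_instr_addr, int32_t displacement) {
    return next_instr_addr + (uint64_t)(int64_t)displacement;
}
```

e.g. with a label at 0x404020 and the next instruction at 0x401007, the displacement is 0x3019, and that same 0x3019 still finds the label if the whole program is loaded somewhere else, because the code and data move together.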
Let's revise that table…
Mode | Example | Meaning |
---|---|---|
Immediate | $4 | 4 |
Register | %rax | %rax |
Indirect | (%rbx) | mem[%rbx] |
Indirect | label | mem[label] |
Indirect | (label) | mem[label] |
Indexed | (%rbx, %rdx) | mem[%rbx + %rdx] |
Scaled | (%rbx, %rdx, 8) | mem[%rbx + %rdx*8] |
Offset | 4(%rbx) | mem[%rbx + 4] |
Scaled+Offset | 4(%rbx, %rdx, 8) | mem[%rbx + %rdx*8 + 4] |
RIP Relative | label(%rip) | mem[label] |
And most of those are redundant: there's really only one memory access case that can have missing parts.
Mode | Example | Meaning |
---|---|---|
Immediate | $4 | 4 |
Register | %rax | %rax contents |
Scaled+Offset | 4(%rbx, %rdx, 8) | mem[%rbx + %rdx*8 + 4] |
Jumping/Calling | label | jump to label |
RIP Relative | label(%rip) | mem[label] |
Revising the lea table:
Assembly | C |
---|---|
mov n(%rip), %rdi | rdi = n; |
lea n(%rip), %rdi | rdi = &n; |
mov (%rsi), %rdi | rdi = *rsi; |
lea (%rsi), %rdi | ??? |
What if we want more on the stack? Like a small array?
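The challenge: take an array size \(n\) as an argument (assume \(n \ge 1\): the loops below test their condition at the bottom, so their bodies always run at least once). Create an array of that many 64-bit integers on the stack. Fill it with \(0\) to \(n-1\), then add them up.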
I'm going to use %r15 to hold the \(n\) argument so we have it. It's call-preserved, so we have to preserve the caller's value. We're also going to use %rbx, so preserve that too.
Then, we can just subtract from %rsp to get ourselves enough stack space for \(8n\) bytes that we'll think of as the array.
stack_array:
    push %r15
    push %rbx
    mov %rdi, %r15        # %r15 = n
    shl $3, %rdi
    sub %rdi, %rsp        # %rsp -= 8*n
I'm going to use %rbx to keep a pointer to the start of the array.
    mov %rsp, %rbx        # %rbx = &array
In this function we could use %rsp directly for this, since it will have that value throughout the function. In a function where the stack is used for other stuff, it might be necessary to have a separate pointer, so I'll do it here.
I will use %rcx as my loop counter, and the scaled addressing mode to get to element %rcx of the array.
    mov $0, %rcx                  # %rcx = i
fill_loop:
    mov %rcx, (%rbx, %rcx, 8)     # arr[i] = i
    inc %rcx
    cmp %rcx, %r15                # while n > i
    ja fill_loop
Then we can loop through the array, adding to an accumulator as we go.
    mov $0, %rcx                  # %rcx = i
    mov $0, %rax                  # %rax = accumulator
sum_loop:
    add (%rbx, %rcx, 8), %rax     # acc += arr[i]
    inc %rcx
    cmp %rcx, %r15                # while n > i
    ja sum_loop
Finally, clean up: put the stack pointer back to effectively pop the array, then restore the registers we had pushed.
    shl $3, %r15
    add %r15, %rsp                # %rsp += 8*n
    pop %rbx
    pop %r15
    ret
A comparison of the length of that assembly vs. equivalent C:
stack_array:
    push %r15
    push %rbx
    mov %rdi, %r15                # %r15 = n
    shl $3, %rdi
    sub %rdi, %rsp                # %rsp -= 8*n
    mov %rsp, %rbx                # %rbx = &array
    mov $0, %rcx                  # %rcx = i
fill_loop:
    mov %rcx, (%rbx, %rcx, 8)     # arr[i] = i
    inc %rcx
    cmp %rcx, %r15                # while n > i
    ja fill_loop
    mov $0, %rcx                  # %rcx = i
    mov $0, %rax                  # %rax = accumulator
sum_loop:
    add (%rbx, %rcx, 8), %rax     # acc += arr[i]
    inc %rcx
    cmp %rcx, %r15                # while n > i
    ja sum_loop
    shl $3, %r15
    add %r15, %rsp                # %rsp += 8*n
    pop %rbx
    pop %r15
    ret

int64_t stack_array(uint64_t n) {
    int64_t arr[n];
    uint64_t i = 0;
    do { // fill the array
        arr[i] = i;
        i++;
    } while (n > i);
    i = 0;
    int64_t acc = 0;
    do { // sum the array contents
        acc += arr[i];
        i++;
    } while (n > i);
    return acc;
}
As mentioned before, there are two distinct syntaxes for x86 assembly. We're using the AT&T syntax that the GNU Assembler and the Bryant and O'Hallaron text use.
The Intel syntax is more common in general. Notably, it's used by the Intel reference documentation and the NASM assembler.
I think it's worth being able to translate the basic ideas AT&T ↔ Intel, just so you can deal with different documentation.
Biggest difference: in Intel syntax, the destination operand is first, not last like AT&T.
Also there's less punctuation: register names do not have the leading %. These are two equivalent instructions in AT&T and Intel syntax:
mov %rdi, %rax
mov rax, rdi
Similarly, numbers do not have the leading $.
add $7, %rax
add rax, 7
The Intel syntax might make more sense if you read this instruction as rax += 7, but less if you read it as "add 7 to rax".
Addressing memory is more readable in Intel syntax.
mov 4(%rbp, %rcx, 8), %r11
mov r11, [rbp + rcx*8 + 4]
Important note: these are not two different languages, just different ways to express the same thing. Both have the same instructions, addressing modes, etc. Both syntaxes (with appropriate assemblers) produce exactly the same machine code.
There are more differences, but those are enough that you can probably read Intel-syntax assembly code or documentation.
Finally, we might be able to read an assembly reference: x86 and amd64 Instruction Reference. Some notations they use:
r64: a 64-bit general-purpose register (e.g. %rax).
r/m64: a 64-bit general-purpose register (e.g. %rax) or a reference to 64 bits of memory (e.g. (%rax)).
imm32: a 32-bit constant (e.g. $1234).
m: a reference to a memory address (e.g. 4(%rbx, %rcx, 8)).