Assembly: Using Memory

When writing assembly, we have mostly used registers to store our data. That's great: registers are fast and easy to work with.

But there's a very limited number of them.

We have had only one way to work with memory so far: the stack. We can push (or subtract from %rsp) to get ourselves some memory to work with, and then pop (or add to %rsp) when we're done with it.

That's fine, but the stack is also limited.

  • It can only hold a moderate amount (≈8 MB).
  • A function must give up its stack space when it returns: we can't build a data structure on the stack and return a pointer to it.

But the stack is only one of the categories of memory a program can work with:

[Figure: static, heap, and stack memory]

Let's get at those other pools of memory.

But first…


What is memory exactly?

At the circuit level, memory stores bits in a bunch of tiny capacitors, with charged meaning 1 and discharged meaning 0. That's not much help for understanding how to work with it.


How I actually think of memory: an array of bytes.

uint8_t memory[] = ???;

Memory can be addressed. An address is basically the index into the array of bytes: each byte has a unique address.

The basic operations I can do on memory: read it (examine a value: x = memory[1234]) and write it (update a value: memory[1234] = x).


There's a byte at address \(n\) in memory. If I wanted to, I could think of bytes \(n\) to \(n+7\) as a 64-bit signed integer value.

But, as always, the hardware doesn't care what I think the values are: it will store whatever bytes I tell it to, and do whatever instructions I give it.


A pointer is just a number representing an address in memory.

In C, if I tell you p is a pointer, that means it's an integer that refers to an address in memory. An int64_t* (pointer to an int64_t) is a memory address where locations p to p+7 hold bits we should interpret as an int64_t.


The %rsp register is intended to always hold a pointer: the memory address of the top of the stack.

Other registers could hold a pointer, if that's how we think of the value we put in there.


In most programming languages, a reference is essentially a pointer, but you aren't allowed to see the actual memory address. In my mind, a C++ reference is conceptually like:

class Int64Reference {
    int64_t* p;
    // things to work with p but not let you see it directly
};


Hiding actual pointers away from the programmer prevents pointer arithmetic: you can't move a few bytes over and see what's there. You just have to follow the reference to a nicely managed data structure.

Pointers let you get into trouble: add/​subtract and have a look at other memory, or follow the pointer and treat it as a different type of data. References protect programmers from that danger.


They are otherwise conceptually similar: both let you refer to somewhere in memory with a small value (the memory address).

e.g. if you have a huge object in memory, you can pass a pointer or reference to a function. That's cheaper than copying the whole object and still lets the function see it. But (in most languages), it would also let the function modify the object.


Note that pointer arithmetic in C is a little unexpected: you can move forward/​back in memory by adding/​subtracting from a pointer, but the offsets are multiplied by the size of the values.


e.g. we might have a pointer to an 8-byte (64-bit) value like this:

int64_t n = 123456;
int64_t* p = &n;

We know that sizeof(int64_t)==8. We can check what memory addresses p and p+1 refer to:

printf("%p\n", p);
printf("%p\n", p+1);


Adding i to the pointer adds i*sizeof(type) to the memory address.

Assembly isn't going to do us the same favour. In assembly, a pointer is just an integer that we imagine representing a memory address.

Assembly Code Segments

We have seen the .text segment in our assembly files: it contains code. Actual executable stuff:

    .text
    mov $0, %rax

The linker (ld or gcc) will collect the .text segments from all of the assembled code into our program.

Assembly Code Segments

The purpose of .text is to tell the assembler/​linker that this section of our file is code.

There are more segments that we can have in our assembly code. These pieces are also given to the linker, and put together (with other object files that have some/​all of the same segments) to make a single executable.

Assembly Code Segments

The .data segment holds values that are stored in the executable file and are already in memory when the program starts, i.e. initialized static memory.

Labels are used for the same thing in .data as in .text: to give a name to a memory address. Then you describe some memory contents that will be there when the program starts.

Assembly Code Segments

e.g. this assembly is roughly equivalent to this C:

    .data
value:
    .quad 1234

static int64_t value = 1234;

The .quad directive tells the assembler I want a quad-word (64 bits) of memory holding this value (but nothing about signed/​unsigned: that's for us to remember).

Assembly Code Segments

Or .fill can be used to create an array-like memory space. You can give a repeat, size, and initial value.

    .data
array:
    .fill 100, 8, 0

i.e. 100 8-byte spaces, each initialized to zero. C equivalent:

static int64_t array[100] = {0};

Assembly Code Segments

Or the .bss segment can be used to specify uninitialized static memory. e.g. this creates an array of 100 8-byte values that will not be initialized.

    .bss
array:
    .fill 100, 8

Similar to:

static int64_t array[100];

Assembly Code Segments

The magic words to specify memory chunks mirror the instruction suffixes:

Bits   Name        Instr Suffix   Assembler Literal
64     Quad Word   q              .quad
32     Long Word   l              .long

Assembly Code Segments

Both .bss and .data give us a way to reserve some static memory.

[Figure: static, heap, and stack memory]

Assembly Code Segments

.bss vs .data

Whatever is described in .data must be stored in the executable. It will be read into memory and be there when your code starts.

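Whatever is in .bss takes almost no space in the file: only its size is recorded. It holds no values of your choosing (on Linux it starts out zero-filled), so you are responsible for setting anything else in code.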

Arrays and Memory

An array is just a sequence of values of a specific type (and therefore size) that are adjacent in memory.

In C, when you create an array like this:

int64_t array1[10];

… it is a stack array so we expect that to cause something like this in assembly:

sub $80, %rsp

… with a corresponding add $80, %rsp before any ret.

Arrays and Memory

int64_t array1[10];

In C, array variables are effectively just pointers to the start of the array.

The value in %rsp after the appropriate sub is what that code will know as the pointer array1.

Arrays and Memory

Similarly, we could ask for an array of 10 values on the heap in C like this:

int64_t* array2 = (int64_t*)malloc(10 * sizeof(int64_t));

Here, the pointer array2 is on the stack and the 80 bytes for the array are on the heap.

Arrays and Memory

When our function returns, the stack variable array2 will disappear.

After that, it would be impossible to free that 80 bytes and we would have a memory leak. We must have a corresponding free(array2) before we return.

Arrays and Memory

We don't have any (direct) equivalent of malloc/free in assembly, but we could call the C function and get back a pointer to our allocation.

And of course, call free from assembly as well.

Arrays and Memory

Or we can ask for some static memory in C:

static int64_t array3[10];

We would expect that to correspond to some assembly like one of these (if we do/don't initialize it):

    .data
array3:
    .fill 10, 8, 0

    .bss
array3:
    .fill 10, 8

Arrays and Memory

But how would we use that memory if we had it?

Let's talk more about how you specify the operand (≈argument) for assembly instructions…

Addressing Modes

Compare the first operands in these two instructions:

    add %rcx, %rax
    add $1, %rax

The first (%rcx) refers to the value in a register: it is a register operand.

The second ($1) refers to the number 1, as it is in the code: it is an immediate operand.

Addressing Modes

These are addressing modes: ways to specify the source/​destination of instruction operands. So far, we have seen register and immediate addressing/​operands.

[We accessed memory around %rsp, but let's ignore that and start from scratch…]

Addressing Modes

To refer to a memory location with a label, we can just mention it by name. In fact, we have been doing this too, when calling functions:

   call some_function

Note: here, the value call needs is the memory address, not the contents of the memory at that location.

Addressing Modes

We have done a little arithmetic on the stack pointer to look down the stack at values away from the top.

We have always seen offsets that are multiples of 8 because we were using 64-bit values, but we could push and pop values of any size if we wanted. e.g. if we pushed three 64-bit values:

[Figure: stack pointer offsets]

Addressing Modes

Let's try with some static memory. Suppose a data segment like this:

    .data
n:
    .quad 123
arr:
    .quad 6
    .quad 7
    .quad 8
    .quad 9

i.e. something I'll interpret as a 64-bit integer (n) and an array of four 64-bit integers (arr).

Addressing Modes

To refer to the contents of a memory address, wrap the address in parentheses.

    push %rbx
    mov (n), %rdi
    call print_uint64

That's a movq operation: the assembler knows to read 64 bits from memory because of the %rdi destination.

Addressing Modes

We will often need to work with the memory addresses to our data: pointers.

The lea instruction (Load Effective Address) can be used to get a pointer to something in memory: lea is analogous to the & operator in C.

lea n, %rbx        # %rbx == pointer to n
mov (%rbx), %rdi   # (%rbx) == value stored in n
call print_uint64  # prints n

This mov also copies 8 bytes from memory because of the 64-bit %rdi destination.

Addressing Modes

What lea is doing, compared to analogous C code:

mov (n), %rdi       rdi = n;
lea (n), %rdi       rdi = &n;
mov (%rsi), %rdi    rdi = *rsi;
lea (%rsi), %rdi    ???

Addressing Modes

We can use lea to start working with arrays. We can get the address of the start of an array into a register.

lea arr, %rbx      # %rbx == address of array element 0
mov (%rbx), %rdi
call print_uint64  # prints array element 0

mov %rbx, %rdi
call print_uint64  # prints address of array element 0

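Will output the value 6 (array element 0) followed by the address of arr. The exact address depends on where the linker placed the data segment.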


Addressing Modes

We have an array of 64-bit integers (8 bytes for each element). So, we can get from element 0 to element 1 by moving 8 bytes over.

lea arr, %rbx      # %rbx == address of array element 0
add $8, %rbx       # %rbx == address of array element 1
mov (%rbx), %rdi
call print_uint64  # prints array element 1

mov %rbx, %rdi
call print_uint64  # prints address of array element 1

Addressing Modes

What if we want to access the \(n\)-th element? Let's imagine we're using %rcx as a counter and want to access position %rcx in the array.

lea arr, %rbx
mov $2, %rcx       # set our "counter" to 2

shl $3, %rcx       # %rcx *= 8
add %rcx, %rbx     # %rbx += 16
mov (%rbx), %rdi
call print_uint64  # prints array element 2

Addressing Modes

But accessing array elements this way is really common. We don't want to manually calculate the address every time, and the processor will help.

The (address) operand can actually have more parts.

We have been giving the address, but we often want to talk about memory contents relative to that address.

Addressing Modes

We can give a second value, the index which indicates how far to move from the address.

mov (%rbx, %rcx), %rdi

Here, %rdi would get the value from memory location %rbx + %rcx.

That's useful if we're imagining our counter tracking byte offsets (e.g. our loop increment is something like i+=8), but I don't want to track memory addresses.

Addressing Modes

A third component is the scale: how much to multiply the index by. It can only be 1, 2, 4, or 8, but those are often useful. In our array…

mov (%rbx, %rcx, 8), %rdi

will get memory location %rbx + %rcx*8, which is what we want.

Addressing Modes

The full code snippet:

mov $2, %rcx              # set our "counter" to 2
lea arr, %rbx

mov (%rbx, %rcx, 8), %rdi
call print_uint64         # prints array element 2

We don't need to modify %rbx or %rcx like we did in previous examples: we just get the memory access we want right away.

Addressing Modes

There's one more piece we can give in a memory access, the displacement (or offset).

It comes before the (…) and gives a constant value to add to the address (it must be a literal constant, not a register value):

mov 16(%rbx), %rdi

…refers to memory address %rbx + 16.

This is what we used before around the stack pointer, specifying memory locations like 16(%rsp).

Addressing Modes

So all together, this…

mov 12(%rbx, %rcx, 4), %rdi

references 8 bytes (64 bits, because of the %rdi destination) starting at memory location %rbx + %rcx*4 + 12.

Addressing Modes

The collection of addressing modes might seem crazy, but consider non-crazy C code like this:

typedef struct {
    int32_t a;
    int32_t b;
} pair;

int main() {
    pair* pairs = (pair*)malloc(N * sizeof(pair));
    for (int i = 0; i < N; i++) {
        pairs[i].a = 10;
        pairs[i].b = 11;
    }
}

Addressing Modes

Let's imagine we use %rbx to hold the pointer to pairs, and %rcx to hold the counter i.

The int32_t are 4 bytes each, so the struct is 8 bytes in total. The i==7 iteration of that loop would effectively be:

    mov $7, %rcx
    lea pairs, %rbx
    movl $10, (%rbx, %rcx, 8)
    movl $11, 4(%rbx, %rcx, 8)

The last instruction sets 32 bits (movl) at mem[pairs + i*8 + 4], which is exactly pairs[i].b.

Addressing Modes

Using mem[x] to refer to memory at location x

Indexed         (%rbx, %rdx)       mem[%rbx + %rdx]
Scaled          (%rbx, %rdx, 8)    mem[%rbx + %rdx*8]
Offset          4(%rbx)            mem[%rbx + 4]
Scaled+Offset   4(%rbx, %rdx, 8)   mem[%rbx + %rdx*8 + 4]


Addressing Modes

The given values have to be registers and literal values as in these examples.

displacement(address, index, scale)
e.g. 4(%rbx, %rdx, 8)

The address and index must each be a register, the displacement a literal integer, and the scale one of 1, 2, 4, 8.

[And cases for labels that we'll discuss more momentarily like (n) and n(%rip).]

Addressing Modes

When we moved around the stack before, we used operands like 16(%rsp) to look 16 bytes from the top of the stack. That was good enough at the time.

pushq $12
pushq $13
pushq $14
mov 16(%rsp), %rdi   # print the 12
call print_uint64

Addressing Modes

Any of the other addressing modes can also be used around the stack pointer.

mov $2, %rcx
mov (%rsp, %rcx, 8), %rdi
call print_uint64    # also prints 12

Relative Addressing

And that whole story works if we link with ld (and its default behaviour).

If we try to link that code with gcc (and its default behaviour), we get a not-very-helpful error message complaining about lines like this:

mov (arr), %rdi

Relative Addressing

mov (arr), %rdi

The problem with that code is that it refers to a literal memory address in the static (.data) segment.

GCC would like to build an executable that can be put anywhere in memory, so the exact address of arr might change.

Relative Addressing

In other words, it's trying to build position independent code (PIC) or a position independent executable (PIE): code that can be loaded anywhere in the computer's memory.

The ld default is non-PIE, so it worked.

Relative Addressing

Position independence is useful for shared libraries (where several will be loaded together with a single program), and for address space layout randomization, which is a way to mitigate security problems around memory accesses.

But we don't really care why. We just want our code to link correctly.

Relative Addressing

mov (arr), %rdi

So, we can't just refer to the literal memory address of one of our labels like that: the address might change.

Relative Addressing

The alternative is to ask the linker to fill in an address relative to the instruction pointer: RIP-relative addressing. The linker can promise that the static memory address will be a certain offset before/​after the current instruction (and figure out what it is).

The way we ask for that is by expressing the label as a displacement from the instruction pointer:

mov arr(%rip), %rdi

Relative Addressing

With that, we can work with static memory (.data and .bss segments) in a way that works everywhere.

lea arr(%rip), %rbx
mov (%rbx), %rdi
call print_uint64    # prints 6

mov 8(%rbx), %rdi
call print_uint64    # prints 7

mov n(%rip), %rdi
call print_uint64    # prints 123

Relative Addressing

RIP-relative addressing is not needed for call x (or jump) instructions: there the assembler already encodes the target relative to the instruction pointer and figures it all out. Otherwise, we will probably want to use x(%rip) instead.

Let's revise that table…

Relative Addressing

Indexed         (%rbx, %rdx)       mem[%rbx + %rdx]
Scaled          (%rbx, %rdx, 8)    mem[%rbx + %rdx*8]
Offset          4(%rbx)            mem[%rbx + 4]
Scaled+Offset   4(%rbx, %rdx, 8)   mem[%rbx + %rdx*8 + 4]
RIP Relative    label(%rip)        mem[label]

Relative Addressing

And most of those are redundant: there's really only one memory access case that can have missing parts.

Register          %rax               %rax contents
Scaled+Offset     4(%rbx, %rdx, 8)   mem[%rbx + %rdx*8 + 4]
Jumping/Calling   label              jump to label
RIP Relative      label(%rip)        mem[label]

Relative Addressing

Revising the lea table:

mov n(%rip), %rdi   rdi = n;
lea n(%rip), %rdi   rdi = &n;
mov (%rsi), %rdi    rdi = *rsi;
lea (%rsi), %rdi    ???

Local Stack Array

What if we want more on the stack? Like a small array?

The challenge: take an array size \(n\) as an argument. Create an array of that many 64-bit integers on the stack. Fill it with \(0\) to \(n-1\), then add them up.

Local Stack Array

I'm going to use %r15 to hold the \(n\) argument so we have it. It's call-preserved, so we have to preserve the caller's value. We're also going to use %rbx, so preserve that too.

Then, we can just subtract from %rsp to get ourselves enough stack space for \(8n\) bytes that we'll think of as the array.

    push %r15
    push %rbx
    mov %rdi, %r15            # %r15 = n
    shl $3, %rdi
    sub %rdi, %rsp            # %rsp -= 8*n

Local Stack Array

I'm going to use %rbx to keep a pointer to the start of the array.

mov %rsp, %rbx            # %rbx = &array

In this function we could use %rsp directly for this, since it will have that value throughout the function. In a function where the stack is used for other stuff, it might be necessary to have a separate pointer, so I'll do it here.

Local Stack Array

I will use %rcx as my loop counter, and the scaled addressing mode to get to element %rcx of the array.

    mov $0, %rcx              # %rcx = i
fill_loop:
    mov %rcx, (%rbx, %rcx, 8) # arr[i] = i
    inc %rcx
    cmp %rcx, %r15            # while n > i
    ja fill_loop

Local Stack Array

Then we can loop through the array, adding to an accumulator as we go.

    mov $0, %rcx              # %rcx = i
    mov $0, %rax              # %rax = accumulator
sum_loop:
    add (%rbx, %rcx, 8), %rax # acc += arr[i]
    inc %rcx
    cmp %rcx, %r15            # while n > i
    ja sum_loop

Local Stack Array

Finally, clean up: put the stack pointer back to effectively pop the array, then restore the registers we had pushed.

shl $3, %r15
add %r15, %rsp            # %rsp += 8*n
pop %rbx
pop %r15

Local Stack Array

A probably-unreadable comparison of the length of that assembly vs equivalent C:

stack_array:
    push %r15
    push %rbx
    mov %rdi, %r15            # %r15 = n
    shl $3, %rdi
    sub %rdi, %rsp            # %rsp -= 8*n
    mov %rsp, %rbx            # %rbx = &array
    mov $0, %rcx              # %rcx = i
fill_loop:
    mov %rcx, (%rbx, %rcx, 8) # arr[i] = i
    inc %rcx
    cmp %rcx, %r15            # while n > i
    ja fill_loop
    mov $0, %rcx              # %rcx = i
    mov $0, %rax              # %rax = accumulator
sum_loop:
    add (%rbx, %rcx, 8), %rax # acc += arr[i]
    inc %rcx
    cmp %rcx, %r15            # while n > i
    ja sum_loop
    shl $3, %r15
    add %r15, %rsp            # %rsp += 8*n
    pop %rbx
    pop %r15
    ret

int64_t stack_array(uint64_t n) {
	int64_t arr[n];
	uint64_t i = 0;
	do {  // fill the array
		arr[i] = i;
		i++;
	} while (n > i);
	i = 0;
	int64_t acc = 0;
	do {  // sum the array contents
		acc += arr[i];
		i++;
	} while (n > i);
	return acc;
}

Assembly Syntax

As mentioned before, there are two distinct syntaxes for x86 assembly. We're using the AT&T syntax that the GNU Assembler and the Bryant and O'Hallaron text use.

The Intel syntax is more common in general. Notably, it's used by the Intel reference documentation and the NASM assembler.

Assembly Syntax

I think it's worth being able to translate the basic ideas AT&T ↔ Intel, just so you can deal with different documentation.

Assembly Syntax

Biggest difference: in Intel syntax, the destination operand is first, not last like AT&T.

Also there's less punctuation: register names do not have the leading %. These are two equivalent instructions in AT&T and Intel syntax:

mov %rdi, %rax
mov rax, rdi

Assembly Syntax

Similarly, numbers do not have the leading $.

add $7, %rax
add rax, 7

The Intel syntax might make more sense if you understand this instruction as rax += 7, but less if you read it as add 7 to rax.

Assembly Syntax

Addressing memory is more readable in Intel syntax.

mov 4(%rbp, %rcx, 8), %r11
mov r11, [rbp + rcx*8 + 4]

Assembly Syntax

Important note: these are not two different languages, just different ways to express the same thing. Both have the same instructions, addressing modes, etc. Both syntaxes (with appropriate assemblers) produce exactly the same machine code.

Assembly Syntax

There are more differences, but those are enough that you can probably read Intel-syntax assembly code or documentation.

Assembly Syntax

Finally we might be able to read an assembly reference: x86 and amd64 Instruction Reference. Some notations they use:

  • r64: a 64-bit general-purpose register (e.g. %rax).
  • r/m64: a 64-bit general-purpose register (e.g. %rax) or a reference to 64 bits of memory (e.g. (%rax)).
  • imm32: a 32-bit constant (e.g. $1234).
  • m: a reference to a memory address (e.g. 4(%rbx, %rcx, 8)).