Assembly Introduction

We're going to learn assembly.

Why?

Assembly Introduction

Bad reasons to learn assembly:

You think writing assembly is a productive way to get things done, or think you can usually write faster code than a compiler can produce.

Assembly Introduction

Some good reasons to learn assembly:

Understand machine architecture(s): you have a processor that does stuff, but what stuff can it do?
Understand how the machine architecture affects the code you (should) write in a programming language. There are many things you can do right/wrong, and knowing them is often about the machine.

Assembly Introduction

…

Understand what your compiler is/isn't doing, or can/can't do on your behalf. Many of the things people do to make their code faster aren't necessary. Many ways people structure their code are fighting the compiler.
Be able to spot the rare case where implementing some critical logic in assembly can be worth it.

Assembly Introduction

We're learning x86-64 assembly: the assembly code for x86-64 processors, so it will only run there.

There will be details in our code that are Linux-specific [later topic: “calling convention”], so it will only run there.

Assembly Introduction

We're writing x86-64 assembly with the AT&T syntax, not the Intel syntax. The difference is superficial but annoying. The Bryant and O'Hallaron book and the GNU assembler both use AT&T syntax.

But many other tutorials and resources use Intel syntax. Briefly: if you see a lot of punctuation (% and $ everywhere), that's AT&T syntax. [later topic: “Assembly Syntax”]

Assembly Introduction

We will approach the idea of assembly language from both sides and meet in the middle.

We saw the .s files generated by gcc -S: they are some kind of very low-level description of our logic.
The processor has an instruction register. Something from the .o files goes in there (and gets decoded and then drives the processor). [This will be the next slide deck.]

Assembly Introduction

The assembler is translating the assembly code into machine code. Machine code consists of instructions, which are what the processor's instruction register needs.

This translation is much easier than the compiler's job of translating C code (or another language) to assembly.

Assembly Introduction

Consider this instruction that means move (copy) the contents of register rdi to rax.

mov %rdi, %rax

It gets translated to these three bytes, represented in binary (base 2) or hexadecimal (base 16):

01001000 10001001 11111000

48 89 F8

Assembly Introduction

mov %rdi, %rax

48 89 F8

The 89 byte is the opcode for mov from a register to a register (or memory) location. The F8 byte (the “ModR/M” byte) says both operands are registers and names them: %rdi as the source and %rax as the destination. The 48 byte is a prefix (REX.W) that sets the operand size to 64 bits.

Assembly Introduction

There's a little more to the assembler's job, but the basics are just: translate the instructions (written by the programmer or compiler) to bytes.

Those bytes can be sent to the processor's instruction register, and the processor does the thing.

Assembly Introduction

Now, all we have to do is write some assembly code to see it work…

Our First Assembly

I'm going to write a function add10 in assembly and call it from C. We'll need a C header file so the C code knows about the argument and return types. In add10.h:

#include <stdint.h>
int64_t add10(int64_t n);

The goal: a function written in assembly that will return n+10.

Our First Assembly

And our C program, add10_test.c:

#include <inttypes.h>  // PRId64: portable printf format for int64_t
#include <stdint.h>
#include <stdio.h>
#include "add10.h"

int main(void) {
    int64_t n = 1234;
    int64_t m = add10(n);
    printf("%" PRId64 " + 10 = %" PRId64 "\n", n, m);
    return 0;
}

Now, if we can get a .o file containing an appropriate add10 function, things should work.

Our First Assembly

I'm going to create add10.S with the assembly code. When gcc is given a .S file, it runs the file through the C preprocessor first; .s files aren't preprocessed (like .c vs .i).

I'm going to adopt the convention that .S files contain hand-written assembly (as opposed to compiler-generated assembly).

Our First Assembly

There will be a few lines above this, but this is the add10 function:

add10:
    mov %rdi, %rax
    add $10, %rax
    ret

Here, add10 is a label: a marker for a location in memory. The memory location of the next instruction (the mov) can be referred to as add10.

Our First Assembly

mov %rdi, %rax

This is one instruction: the mov operation. It has two operands, %rdi and %rax. The destination is always the last operand (in AT&T syntax).

The mov is the move instruction that copies the value from its source operand (%rdi here) to the destination (%rax).

Our First Assembly

mov %rdi, %rax

Both %rdi and %rax are x86-64 registers. A register is a storage location inside the processor that holds one value (64 bits here) and is very fast to access. There are only a few registers to work with.

This instruction copies the contents of %rdi to %rax.

Our First Assembly

add $10, %rax

The add instruction adds one integer to another. As before, the last operand is the destination: %rax.

The first operand is the value we're adding to %rax. In this case, a literal integer 10. In C syntax, the operation we're doing is rax += 10.

Note: register names are prefixed with % and literal numeric values prefixed with $.

Our First Assembly

Then, return from the function.

ret

We don't yet know why or how, but if this works (and it will), we somehow were given the (single) argument to the function in the %rdi register, and had to put the return value in %rax before returning.

Our First Assembly

There is a little more housekeeping to do at the start of the .S file. Complete contents:

    .section .note.GNU-stack, ""
    .global add10
    .text

add10:
    mov %rdi, %rax
    add $10, %rax
    ret

The .global directive exports the add10 label so the linker can see it, and .text marks the start of a code section [later topic: “assembly code sections”].

[The .note.GNU-stack line says we won't be putting executable code on the stack. It's a security problem if code tries to run there.]

Our First Assembly

We already know how to compile the .c code to an object file. We have seen how to invoke the GNU assembler to create an object file: here the code is coming from us, not the compiler (and we'll add --warn).

gcc -Wall -c add10_test.c       # compile to add10_test.o
as --warn add10.S -o add10.o    # assemble to add10.o
gcc add10.o add10_test.o -o add10_test  # link

Our First Assembly

Now we have an actual executable that we can run and get the output we expect:

./add10_test

1234 + 10 = 1244

We created a function in assembly, assembled it, and called it from C. 🎉

Our First Assembly

Summary:

I'll use .S files for hand-written assembly and .s for compiler-generated assembly.
foo: is a label for a memory location.
The destination is the last operand.
% indicates a register.
$ indicates a literal value.
First function argument in %rdi; return value in %rax.

Our First Assembly

Things unsaid:

Assembly code doesn't know our types: we imagine there's one argument that's a 64-bit integer and the return type is a 64-bit integer. The assembler has no idea, so it can't check we got the types/arguments right.
In C, we think of add10 as a function. In assembly, it's just a memory location. We need to ret when we're done, but the assembler can't check that either.

Our First Assembly

An alternate implementation of add10:

add10:
    add $10, %rdi
    mov %rdi, %rax
    ret

This code does the calculation in %rdi and copies the result to %rax (instead of copying the argument and calculating in %rax). Both implementations are functionally equivalent.

Calling Convention

Assembly functions can take arguments and return results. How do the arguments come in, and the return value go out?

Calling Convention

The way we implement arguments/return is determined by the calling convention, specifically the System V AMD64 ABI that's used in Linux and MacOS (but not Windows).

A calling convention specifies how functions receive their arguments, give their return value, deal with local variables, etc.

Calling Convention

Our calling convention specifies things like (incomplete list):

Argument 1 in %rdi.
Argument 2 in %rsi.
Argument 3 in %rdx.
Return value in %rax.
%rbp must be unchanged when a function returns: the value there at the start of the function must be there at the end.
%rcx may be modified during a function call: it may be different when the function returns.

Calling Convention

Having a calling convention lets different compilers/tools interoperate with each other. e.g. we were able to write a function in assembly and call it from C code because we followed the same calling convention as the C compiler.

In theory, we could choose any combination of registers for arguments and return values. In practice, we need to do it the same as everybody else.

Calling Convention

Integer arguments to functions are passed in these registers (in this order): %rdi, %rsi, %rdx, %rcx, %r8, and %r9.

Floating point arguments are passed in the SSE registers: %xmm0 to %xmm7.

If there are more arguments than that, they go on the stack.

[later topics: “the registers”, “floating point”, “the stack”]

Calling Convention

Before a function returns, any integer return value must be put in %rax; floating point return in %xmm0.

Calling Convention

A more interesting aspect of the calling convention: preserved registers.

There is no equivalent of local variables for the registers. If we call a function, that function uses the same registers as our code. How can our code maintain any data across a function call?

Calling Convention

Some registers must be preserved as part of a function call. That is, a function must guarantee that the register value will be unchanged after it's done.

Functions are allowed to use those registers, but the original values must be restored if they do.

Calling Convention

These registers are preserved across function calls: %rbx, %rsp, %rbp, %r12, %r13, %r14, %r15.

Other than those, you have to assume that any function call will destroy values you have in registers.

Calling Convention

You may see the terminology caller-saved and callee-saved for these two categories of registers. Those are annoyingly similar and hard to parse.

I will say “preserved” and “not preserved” instead; a StackOverflow answer explains why.

The Registers

We have been using registers and have some idea what they are: a small number of very fast storage locations in the processor.

The x86-64 registers are a mess, mostly for historical reasons. A lot of them have names because of ways they were originally intended to be used, but they're just general purpose places to put 64-bit values.

The Registers

These are not call-preserved:

Register	Name	Use
`%rax`	accumulator	return *
`%rcx`	counter	arg4 *
`%rdx`	data	arg3 *
`%rsi`	source	arg2 *
`%rdi`	destination	arg1 *
`%r8`		arg5 *
`%r9`		arg6 *
`%r10`		anything
`%r11`		anything

The Registers

These are call-preserved:

Register	Name	Use
`%rbx`	base	anything
`%r12`		anything
`%r13`		anything
`%r14`		anything
`%r15`		anything

If you use these in a function, you must store/restore their values.

The Registers

There are also several registers that have specific purposes and hold values interpreted in specific ways. For now, leave these alone.

Register	Name
`%rsp`	stack pointer
`%rbp`	base pointer
`%rip`	instruction pointer

%rip can't be set with an ordinary mov: it changes only through control-flow instructions like jumps/branches, call, and ret. [later topic: “branching”]

The Registers

There are also separate registers used when working with floating point values: %xmm0 to %xmm15. None of them are call-preserved.

The Registers

There are also names for smaller fragments of each register. For example, %rax is a 64-bit register, but…

%eax refers to the lower 32 bits of %rax,
%ax refers to the lower 16 bits of %rax,
%al refers to the lower 8 bits of %rax,
%ah refers to the next 8 bits of %rax (bits 8 to 15).

The Registers

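Or more visually, there's a single 64-bit register in the processor that you can refer to in these pieces. Suppose %rax holds the integer 1 (shown as hex bytes, most significant first):

    %rax:  00 00 00 00 00 00 00 01
    %eax:              00 00 00 01
    %ax:                     00 01
    %ah:                     00
    %al:                        01

Here, all of %rax, %eax, %ax, %al hold the integer 1 (and %ah holds 0).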

The Registers

Even though %rdx and %dl look like different register names, writing to one changes the value in the other (but %rdi is a completely different register, even though it has a d in the name).

Another Example

Mostly because I want to write some more assembly, let's work through an example of preserving a register inside a function.

I'm going to write a function in assembly that uses %rbx as a temporary variable: %rbx is call-preserved, so I have to restore it to its original value before I return.

Another Example

I need to store the original value of %rbx somewhere. I'm going to store it on the stack.

Short user's manual for the stack before we really talk about it: push to put a value onto the stack, and pop to get it back later. You must pop values in the opposite order to pushing them.

Another Example

And I'm going to write assembly that can run as the main program, not called from C. I'm going to use helpers.c that contains some useful functions.

gcc -c helpers.c -o helpers.o
as --warn preserve.S -o preserve.o
ld helpers.o preserve.o -o preserve
./preserve

[You can call any C function from assembly, but the provided helpers.c code avoids using the C standard library, so it's simple to work with, link, etc.]

Another Example

The main will call my (two-argument) function, print its return value, and exit (i.e. stop the program).

    .section .note.GNU-stack,""
    .global _start
    .text
mult_and_sub:
    # TODO
_start:
    mov $5, %rdi
    mov $7, %rsi
    call mult_and_sub
    
    mov %rax, %rdi
    call print_int64    # provided in helpers.c
    
    mov $0, %rdi
    call syscall_exit   # provided in helpers.c

Another Example

Since we're using the call-preserved %rbx, we need to store it temporarily on the stack, so our function is going to look like:

mult_and_sub:
    push %rbx  # put the outside code's %rbx on the stack
    ⋮
    pop %rbx   # restore original %rbx from the stack
    ret

Another Example

Now the actual function. I want it equivalent to this C:

int64_t mult_and_sub(int64_t a, int64_t b) {
    return (a*b) + (a&b) + (a-b);
}

[The & is bitwise-AND: logical AND of each of the 64 bits.]

Another Example

The whole function to get there:

mult_and_sub:  # return (a*b) + (a&b) + (a-b)
    push %rbx
    # a*b
    mov %rdi, %rax   # start the sum in %rax
    imul %rsi, %rax  # ... with a*b
    # ... + a&b
    mov %rdi, %rbx   # use %rbx to calculate a&b
    and %rsi, %rbx
    add %rbx, %rax   # add it to %rax
    # ... + a-b
    sub %rsi, %rdi   # don't need %rdi again, so can modify
    add %rdi, %rax   # add a-b to %rax
    pop %rbx
    ret

Another Example

That logic rewritten in C would be:

int64_t mult_and_sub(int64_t a, int64_t b) {
    int64_t rdi, rsi, rax, rbx;
    rdi = a;    // because of the calling convention
    rsi = b;    // because of the calling convention
    rax = rdi;
    rax *= rsi;
    rbx = rdi;
    rbx &= rsi;
    rax += rbx;
    rdi -= rsi;
    rax += rdi;
    return rax; // because of the calling convention
}

Another Example

The register names kind of all look the same: I tend to include a comment like this to remind myself which register is doing what.

# Return (a*b) + (a&b) + (a-b)
# %rdi = a
# %rsi = b
# %rbx = tmp
# %rax = result

If I were writing this code myself (not as a lecture example), I would still comment it almost as heavily as in the file. Assembly is hard to read, and comments shouldn't be considered optional.

Another Example

The point of the register preserving (with push and pop here) is that any calling code must be able to rely on %rbx (and other preserved registers) being unchanged after calling any function.

    mov $1234, %rbx
    mov $5, %rdi
    mov $7, %rsi
    call mult_and_sub
    
    mov %rax, %rdi
    call print_int64  # prints result of mult_and_sub
    mov %rbx, %rdi
    call print_int64  # must print 1234

Another Example

Assembly feels very manual: you're in charge of every single step, and every single detail. It's your job to keep track of the number of arguments, their types, which registers to preserve, etc.

Also note that these instructions (add, sub, imul) do integer arithmetic. We don't have any tools to work with floating point values (yet).