What do we mean by a
computer? There are a bunch of parts that we know are there, but need some explanation.
What is in there, and why is it the way it is?
At the lowest level, a computer is just a bunch of electronics. There are wires, and electric currents in the wires.
Somehow, computer engineers make those currents do useful things, and we get a computer. What's really going on in a computer? That's their problem.
But it's our problem too.
There are a lot of layers of abstraction between your C code and the circuit, but all abstractions are leaky and sometimes we need to know what's going on in there, at least a little.
My goal for the rest of this slide deck is not to teach you how to build digital circuits. If you want to learn that, I'm not the instructor you need.
My goal is: convince you it's possible to build a computer with circuitry. This will guide us toward some of the reasons computers are the way they are.
It's also a way to show what some terms really mean and why we use them.
Also: if you're looking for accurate information about digital circuits, look elsewhere. Everything here is simplified the maximum amount possible, and then a little more.
i.e. this slide deck contains useful oversimplifications.
First leaky abstraction: physics is cruel. We can design a circuit to send five volts down a wire, but it won't really be exactly 5.000000 V. It might be 4.9 V one moment and 5.1 V the next.
So if we try to imagine some circuit where we represent numbers 0–50 as voltages 0.0 V, 0.1 V, 0.2 V, …, 5.0 V, we'll very quickly start having errors because the voltages aren't as precise as we'd like.
The thing that's actually practical to build is a binary circuit where low/high voltages represent 0/1.
So because physics and engineering are what they are, we're going to have circuits that manipulate only two distinct values that we'll call 0 and 1.
Each is a bit. 0 is a bit. 1 is a bit. 0110 is four bits.
A single bit is so small it's hard to talk about: usually we'll use groups of many bits. First step: a collection of eight bits is a byte. 01101100 is a byte.
bit is abbreviated b; byte is abbreviated B, so 1 B = 8 b. (Not everybody does this consistently, but I will.)
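To connect that to code we can actually run: here's a tiny C program (mine, not from any reference) that stores the byte 01101100 in an unsigned char and prints its eight bits.
#include <stdio.h>
int main(void) {
    unsigned char b = 0x6c;               /* the byte 01101100, written in hex */
    for (int i = 7; i >= 0; i--) {        /* print the bits, most-significant first */
        putchar((b >> i & 1) ? '1' : '0');
    }
    putchar('\n');                        /* prints 01101100 */
    return 0;
}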
Transistors are where digital circuits start: a circuit that essentially switches a current on/off.
The idea: if there's a (small) current on B, connect C to E. Otherwise, C and E aren't connected. [That's a
bipolar junction transistor: there are others.]
They can be big or small. Four transistors:
Or extremely small. 1.4 billion transistors:
Once we have transistors, it's possible to combine them to build useful components like an AND gate, represented like this (inputs A and B, output C):
The AND gate's job: output a 1 only when both inputs are 1.
An AND gate is built with transistors like this:
We can give a complete description of the AND gate's behaviour:
The OR gate outputs 1 when at least one of the inputs is 1:
The XOR gate (exclusive-or) outputs 1 when exactly one of the inputs is 1:
The NOT gate flips its input:
Combinatorial logic: a logic function where the output depends only on the current inputs. i.e. there is no
memory, just inputs being mapped to outputs.
Useful fact: using these gates (AND, OR, XOR, NOT), it's possible to build a circuit to implement any combinatorial logic.
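If it helps to see the gates as something executable, here's a little C sketch (not a circuit, just the same logic) treating each gate as a function on the bit values 0 and 1. Building XOR out of the other three hints at why a small set of gates is enough.
int AND(int a, int b) { return a & b; }
int OR (int a, int b) { return a | b; }
int XOR(int a, int b) { return a ^ b; }
int NOT(int a)        { return !a; }

/* XOR isn't strictly necessary: it can be built from the other gates. */
int XOR_built(int a, int b) {
    return AND(OR(a, b), NOT(AND(a, b)));
}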
It's possible to connect these simple logic gates to do useful stuff. We will come back to some things that will be useful later, but mostly that's another course.
But I want to skim through an example to prove that it's possible: the Simple CPU project.
The first sections on that page describe how to build…
a multiplexer, which can select between multiple inputs: if select==0, output A; if select==1, output B.
a ripple adder, which can add integers (assuming we have some way to represent integers with bits, which we haven't talked about yet).
These are both combinatorial logic problems, so they're definitely possible to construct.
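As a sanity check, here's roughly what those two pieces compute, written as C (a sketch of the logic, not the Simple CPU's actual circuit):
/* 2-to-1 multiplexer: if select==0 output a; if select==1 output b. */
int mux(int select, int a, int b) {
    return select ? b : a;
}

/* One bit of a ripple adder (a "full adder"): add bits a and b plus a
   carry-in, producing a sum bit and a carry-out.  An 8-bit ripple adder
   chains eight of these through the carries. */
void full_adder(int a, int b, int carry_in, int *sum, int *carry_out) {
    *sum       = a ^ b ^ carry_in;
    *carry_out = (a & b) | (carry_in & (a ^ b));
}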
Then it uses those to build a circuit that gets two 8-bit inputs A and B and calculates all of A+B, A-B, A&B (bitwise AND), and several others. i.e. some circuit that does this (the
\8 indicates 8 wires carrying 8 bits):
Hook a multiplexer up to the output of that: the multiplexer (with five select inputs) selects which calculation is the actual
output of the circuit. That's our ALU, arithmetic logic unit, usually drawn like this:
With the circuitry described on that page, if we set the S (select) bits to 00000, the ALU outputs A+B.
If the S bits are 10100, it outputs A+1.
If the S bits are 01000, it outputs A-B-1.
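In C, you could imagine the ALU's behaviour like this (just a model of the select-bit idea; the real thing is combinatorial logic, and I've only filled in the three operations mentioned above):
#include <stdint.h>

uint8_t alu(uint8_t a, uint8_t b, unsigned select) {
    switch (select) {
    case 0x00: return a + b;       /* S = 00000 */
    case 0x14: return a + 1;       /* S = 10100 */
    case 0x08: return a - b - 1;   /* S = 01000 */
    default:   return 0;           /* the other operations aren't modelled here */
    }
}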
It starts to look processor-like, but I still wouldn't call it a
processor. Where is our data coming from? Where is it going to?
If we're going to store bits, we need a flip-flop: a circuit that can store a single bit.
The idea: we can set the output (Q) to the input (D) but only at the moment the clock signal (the triangle) changes from 1 to 0 (the falling edge).
|clock|D|Q|
|falling edge|0|0|
|falling edge|1|1|
|(anything else)||no change|
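As code, the falling-edge behaviour might look something like this C sketch (my model, not how the circuit is actually built):
typedef struct {
    int q;           /* the stored bit: the output Q */
    int last_clock;  /* previous clock value, so we can detect the edge */
} flipflop;

void flipflop_update(flipflop *ff, int clock, int d) {
    if (ff->last_clock == 1 && clock == 0) {  /* falling edge: 1 -> 0 */
        ff->q = d;                            /* Q takes the input D */
    }                                         /* any other time: no change */
    ff->last_clock = clock;
}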
Physics interrupts us again: these circuits are physical things and it takes time for electrons to bounce through them. Our circuits need time from when the inputs are applied until the outputs are ready.
A clock provides a regular pulse (alternating 0, 1, 0, 1, 0, 1, …) that is used to synchronize the circuit. A clock cycle needs to be long enough for the circuit to
finish whatever it's doing and have correct outputs.
The clock signal can be used to make sure we store the correct results in our flip-flops.
We could combine several flip-flops to store several bits and call it a register. Let's have 8-bit registers to match the ALU inputs.
And we could connect the output of our ALU to the input of a register, and store the result of a calculation.
The idea: get values we want into the A and B registers and set S to select the operation we want (e.g. addition). Get the
next value for B on its input. Wait until the falling clock edge: then A gets the result of the computation and B gets its new value. Change S to something else to do the next calculation. Repeat.
We don't want to overwrite the register in every clock cycle: only when there's a result we need to store.
We need to figure out the rest of the inputs. But then maybe we could think of this as a processor.
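Putting the last few slides together as a C sketch (the register names and the write_a flag are my inventions, and only the add operation is filled in):
#include <stdint.h>

struct cpu_state {
    uint8_t reg_a, reg_b;   /* the two 8-bit registers */
};

/* What happens at one falling clock edge: A takes the ALU result, but only
   if write_a says we want to keep it; B takes its next input value. */
void clock_tick(struct cpu_state *s, unsigned select, uint8_t next_b, int write_a) {
    uint8_t result = (select == 0) ? (uint8_t)(s->reg_a + s->reg_b)   /* S = 00000: add */
                                   : s->reg_a;                        /* other operations omitted */
    if (write_a) {
        s->reg_a = result;
    }
    s->reg_b = next_b;
}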
There was a choice of the number of bits in each register (and corresponding size of ALU inputs/outputs). That choice determines the amount of data you most easily work with in the CPU. This is the word size.
The x86-64 architecture (everything after ∼2003: Athlon 64 and Pentium 4) has a 64-bit word size. So do most other modern CPUs (modern ARM, RISC-V).
Its predecessor, IA-32 (∼1985: i386), had 32-bit words. The earliest x86 CPUs (8086) had 16-bit words.
The simple CPU example and earlier CPUs had 8-bit words.
Generally, modern processors can do calculations on values smaller than their word size: smaller values take less memory (when they're in memory), and the operations may be faster.
add %rcx, %rax    # 64-bit add
add %ecx, %eax    # 32-bit add
add %cx, %ax      # 16-bit add
add %cl, %al      # 8-bit add
And you can do calculations on larger values by writing multiple instructions that do it.
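For example, adding two 128-bit values on a 64-bit machine can be done with two 64-bit additions and a carry, something like this C sketch (the u128 struct is made up for the example; a compiler would typically turn this into an add/adc pair):
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;

u128 add128(u128 x, u128 y) {
    u128 r;
    r.lo = x.lo + y.lo;
    uint64_t carry = (r.lo < x.lo);   /* unsigned wrap-around means there was a carry */
    r.hi = x.hi + y.hi + carry;
    return r;
}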
We need to figure out the S inputs to the ALU: the ones that decide which calculation it's going to do.
There will also be other control inputs on other circuits. They will also need to be set correctly to do whatever operation the programmer wants.
Each operation the processor is going to do is called an instruction. That's what the programmer (or compiler) will create.
But how does the programmer's instruction (
I want to add X and Y) make the processor actually do stuff?
First we need to get the instruction into the processor. It will be stored in another register: the instruction register or IR. In the example simple CPU, instructions are 16 bits, so the IR is a 16-bit register.
Another circuit will take the instruction and turn it into control signals that can make the rest of the circuit do what we want.
e.g. five of those control signals will connect to the ALU's S input. If the instruction indicates
calculate A+1, the ALU's S input needs to be 10100, so the decode logic needs to produce those values.
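A sketch of that idea in C (the instruction layout and opcode values here are invented for illustration, not the Simple CPU's real encoding):
#include <stdint.h>

/* Map a 16-bit instruction to the ALU's 5 select bits. */
unsigned decode_alu_select(uint16_t instruction) {
    unsigned opcode = instruction >> 12;   /* pretend the top 4 bits are the opcode */
    switch (opcode) {
    case 0x1: return 0x00;   /* "add A and B"    -> S = 00000 */
    case 0x2: return 0x14;   /* "calculate A+1"  -> S = 10100 */
    default:  return 0x00;   /* everything else left out of this sketch */
    }
}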
The instructions that are destined for the IR have to come from somewhere. So does the data we put in the A and B registers. Also we need to store more data than the registers can hold.
The computer's memory will be kind of like registers: some kind of circuit that can store bits until we need them.
But registers are too expensive to have too many of them. We want lots of memory (relative to the number of registers).
The memory in modern computers is DRAM. The details are the engineers' problem, but each bit is stored in a capacitor.
Capacitor: think of a tiny rechargeable battery. Discharged = 0, charged = 1.
Like a rechargeable battery, a capacitor will discharge over time. They must be periodically
refreshed or they will lose their data. The circuit takes care of that.
Our memory circuit will be something like this:
To write, set WE (write-enable) to 1 and the data at location ADDRESS will be changed to the bits from DATA_IN.
To read, set WE to 0 and the ADDRESS input to which location from memory you want. The memory contents will be on DATA_OUT.
We had the choice of how many bits are in the address: 8 will limit us to \(2^8 = 256\) memory locations.
We can also choose the number of bits in the data in/out connections. This example chose 16, so a whole instruction can be read at once.
Note that the data connections are 16 bits. Every memory read/write in this architecture will be for the byte at memory location ADDRESS and also ADDRESS+1.
That's probably not what you want in a real computer.
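In C, the memory's interface behaves something like this sketch (256 byte locations, 16 bits in/out per access; which byte ends up where is my own choice, not something the Simple CPU page specifies):
#include <stdint.h>

static uint8_t memory[256];   /* 256 locations: 8-bit addresses */

/* One memory cycle: WE=1 writes DATA_IN to ADDRESS and ADDRESS+1;
   either way, DATA_OUT reflects those two bytes. */
uint16_t memory_cycle(uint8_t address, uint16_t data_in, int write_enable) {
    uint8_t next = (uint8_t)(address + 1);
    if (write_enable) {
        memory[address] = data_in >> 8;     /* high byte at ADDRESS */
        memory[next]    = data_in & 0xff;   /* low byte at ADDRESS+1 */
    }
    return (uint16_t)((memory[address] << 8) | memory[next]);
}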
Along with the data we're working with, the program we're running will also be stored in memory: a Von Neumann architecture.
Our memory is doing a lot: storing both instructions and data. We at least need to read an instruction before executing it, and executing it might involve reading/writing data too.
It sounds like reading/writing memory might be what limits performance. This is the von Neumann bottleneck. It's even worse than it sounds: modern memory is many times slower than the processor clock cycle. More later [later topic: “memory hierarchy”].
We will need circuitry that can decide what to do with the value being read from memory: it might be destined for the IR or the A or B register: some kind of big multiplexer with appropriate control inputs?
The memory's DATA_OUT will have to connect to those (perhaps multiplexed with other input values), and each must be triggered to store the value when appropriate.
Similarly, we might be writing the result of a calculation back to memory: then something needs to trigger the memory WE (write enable) signal, get the data to DATA_IN and memory location to ADDRESS.
For our purposes, we'll simply say those can be built, and the control signals coming from the decoder will take care of triggering all the right behaviour.
Let's talk more about what has to happen to get an instruction running (in our
simple CPU example or any other processor).
To get an instruction into the instruction register, it must be read from memory. But which memory location?
We need another register for that: the instruction pointer or IP (or program counter, PC) will hold the memory address of the next instruction we want to execute.
Every time the processor starts an instruction, the IP will be incremented so it points at the next instruction. (In this example,
IP+=2 because an instruction is two bytes.)
So, to complete an instruction, the processor must: fetch the instruction from memory into the IR, decode it into control signals, and execute it.
We will also need some circuitry to manage this process (and others in the processor): the control logic.
This fetch-decode-execute process is called the instruction cycle.
As with everything else, modern processors make this much more complex, but the general idea is still in there.
The example simple CPU does each of these steps in a separate cycle, plus another to increment the IP. So, every instruction takes four cycles: fetch, decode, execute, increment.
This is also much more complex in a real CPU.
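Still, as a C sketch, the cycle for something like our simple CPU is just a loop (read_memory and execute stand in for the circuitry above; they're placeholder names, not functions from the project):
#include <stdint.h>

uint16_t read_memory(uint8_t address);   /* placeholder: the memory circuit */
void execute(uint16_t instruction);      /* placeholder: decode + control logic */

void run(void) {
    uint8_t ip = 0;                              /* instruction pointer */
    for (;;) {
        uint16_t instruction = read_memory(ip);  /* fetch: ends up in the IR */
        execute(instruction);                    /* decode, then execute */
        ip += 2;                                 /* increment: instructions are two bytes */
    }
}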
The Simple CPU project that I have been describing is an extremely simple processor design, and I left out a lot of details.
I hope you're convinced it could be built. If we spent a few more lectures dealing with the details, I'm confident you could understand all of them, but it's not necessary for this course.
If we spent a bunch more time adding detail, we could perhaps see something like a 6502 that was in the Apple II, Atari 5200, Commodore 64, and many other early-1980s computers.
The block diagram of the 6502 isn't so different than what was described in this lecture.
The 6502 has ∼60 instructions the programmer (or compiler) can use.
Each one takes a fixed number of clock cycles ≥2 (roughly, 2 cycles + time needed to get the operands if they're in memory).
Then we'd be right up to date, as of the late 1970s.
Now imagine 40 or 50 years of complexity piled on top of those ideas. That's what's in your computer.
When I talk about a modern processor, I'm going to generally think of something like a Haswell CPU (∼2013, Intel Core 4xxx). It has all of the functionality we could possibly need for this course (as does any successor or AMD Ryzen series).
There's a block diagram in the WikiChip Haswell page. It's a lot more complex, but very roughly speaking, the same pieces are still in there.
This describes a single core. A
four core CPU has four of these.
A Haswell processor has ∼822 instructions, depending on how you count.
The number of cycles each one takes is much more complicated and can depend on many factors (some of which we'll discuss later).
Hopefully this topic gave you a better understanding of things we need to know about: