Almost everything we have talked about in a processor has been within a single core.
Each core is effectively an independent processor with its own execution units, registers, L1/L2 cache, etc.
Your computer's processor probably has four or more cores.
When your program runs, that's a process. A process has some memory allocated to it (including your code in the static part of its memory).
Your computer can be running multiple processes concurrently. The operating system deals with allocating resources to processes (memory, processor time, etc).
Each process has one or more threads…
A thread is a point where code is running within a process. A thread has its own registers and stack, but all threads within a process share the same memory.
Most notably, each thread has its own instruction pointer: they can all be executing at different places within the program.
The operating system takes care of scheduling threads and making sure they all get processor time. It's in charge of running processes/threads on cores, letting them take turns with other threads, etc.
… if they're kernel threads (or OS-level threads).
User threads (also called userspace threads or green threads) are handled within a single kernel thread, and are managed by some runtime environment provided by the language/library.
Something I can't say often enough: threads don't have to be hard. I want to use several threads in my code when doing a lot of computation.
I paid for those cores. I want to use them. Remember that each core is probably hyperthreaded: it can run two threads at once, getting somewhere between one and two full cores' worth of work done on them.
If you want to take advantage of multiple cores, you need either multiple processes or multiple threads.
In general threaded code is difficult to write correctly (e.g. in CMPT 300). It's hard to test and debug. Interactions between threads can be unpredictable because each could start/stop at any point because the OS decides to.
But, you don't have to do the hardest thing every time you use threads.
Thread safety is hard because a thread could be paused when it's in the middle of updating a data structure. Suppose you have multiple threads doing this on the same collection:

```cpp
unsigned len = collection.size();
collection[len] = new_data;
collection.set_size(len + 1);
```
Thread #1 might be paused after the first line; thread #2 might do this whole fragment; then #1 resumes, overwriting the value #2 just wrote.
So don't do that.
Threads are easy if you don't share any data between them. Threads are easy if you share data between them and don't modify it.
Multiple threads are hard if you are sharing data structures between them and modifying them. As soon as you start doing that, things are tricky.
Multiple threads are easy if you can guarantee that for each value/object/data structure/whatever either:
it is not shared between threads, or
it is shared but never modified.
If that's true, go for it: use threads all you want. (Single-threaded code is trivially case #1.)
If you need to modify something that's shared between threads, that's when you have to be careful.
One additional case where you can easily share something changeable: a thread-safe queue or channel built for the purpose, like Rust's std::sync::mpsc, Python's multiprocessing.Queue, or Java's BlockingQueue.
There are several tools in the C++ standard library that make it fairly straightforward to work with threads (but not a thread-safe channel, as far as I can see).
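You can build a basic one yourself out of standard-library pieces, though. This is a minimal sketch of my own (the `Channel` name and interface are made up, not a standard API), using a mutex and a condition variable:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// A minimal thread-safe channel: push from any thread; pop blocks until a
// value is available. Roughly the idea behind mpsc / BlockingQueue.
template <typename T>
class Channel {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            items_.push(std::move(value));
        }
        not_empty_.notify_one();  // wake one thread waiting in pop()
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait() releases the lock while sleeping and re-checks the
        // predicate when woken, so spurious wakeups are handled.
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::queue<T> items_;
    std::mutex mutex_;
    std::condition_variable not_empty_;
};
```

A producer thread can `push` results and a consumer thread can `pop` them without either one touching the other's data directly: all the sharing is hidden inside the channel.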
For example, std::thread. Its constructor takes a function and its arguments; that function is then executed in a separate thread. e.g. this simple function:

```cpp
void say_hello(int id) {
    cout << "Hello from thread " << id << '\n';
}
```
We can start three of those in threads like this:
```cpp
auto t1 = std::thread(say_hello, 1);
auto t2 = std::thread(say_hello, 2);
auto t3 = std::thread(say_hello, 3);
```
And then wait for them to finish:
```cpp
t1.join();
t2.join();
t3.join();
```
What I got from a run of that code:

```
Hello from thread Hello from thread 1
Hello from thread 3
2
```
Remember: threads can start/stop at any moment. We could think of these as multiple threads that modify
the output and thus break our rules.
Let's try something else with std::async (added in C++11). That will be easy to work with: if our function gets no input besides its arguments and does nothing besides calculate and return a result, it's easy to meet the requirement that nothing is both shared and changing.
In other words, a pure function. Here's one:
```cpp
int do_work(int a, int b) {
    return a + b;
}
```
Note: this function is much too small to sensibly call in a separate thread: creating and destroying the thread will take many times more work than the addition. But it's good enough for an example.
Now we can call it with async and get back a std::future: basically, a result that will be ready some time in the future.

```cpp
std::future<int> f1 = std::async(do_work, 5, 6);
std::future<int> f2 = std::async(do_work, 7, 8);
```
And maybe do some other stuff in the main thread, but eventually wait for their results and use them:
```cpp
int total = f1.get() + f2.get();
```
That's how easy threads are, as long as you're not sharing mutable data structures.
We can easily break our array sum into two halves and do each in a different thread:
```cpp
float array_sum_threaded(float* array, uint64_t length) {
    uint64_t half = length / 2;
    auto f1 = std::async(array_sum, array, half);
    auto f2 = std::async(array_sum, array + half, length - half);
    return f1.get() + f2.get();
}
```
Here, array is shared but not modified, so we're safe. The local variables in array_sum are not shared: each thread has its own stack, so its own local variables.
The speedup was about 1.7×, not the 2× I was hoping for.
Maybe speed was limited by memory bandwidth. Maybe starting/stopping the threads took longer than I expected. Maybe something else.
The message: using threads isn't as easy as not using them. But, you can use them in a way that's not that hard.
The hard part is actually deciding when the amount of work you have to do is big enough to be worth doing in another thread.
Use the cores you paid for.