Concurrency is hard. Everybody says so.
We have seen previously in this course: concurrent/parallel programming isn't that hard if you don't have data structures (state) that are both shared and mutable.
We saw concurrency being easier in Haskell: no state, so no shared state. But lazy evaluation had to be defeated so multiple things could be computed concurrently.
We have already seen in Rust: values have a unique owner, or immutable reference(s), or a unique mutable reference.
The language's type system already ensures no shared mutable state *.
Easy. Concurrency will be safe in Rust.
"Initially, the Rust team thought that ensuring memory safety and preventing concurrency problems were two separate challenges to be solved with different methods. Over time, the team discovered that the ownership and type systems are a powerful set of tools to help manage memory safety and concurrency problems!" (The Rust Book, Fearless Concurrency)
The basic tool for concurrency in Rust is threads. These are OS-level threads: created and scheduled by the operating system.
We can create threads with std::thread::spawn: it takes a function (FnOnce) that will be run in a new thread, and returns a JoinHandle.
For example: (Recall that |x| x+1 is a closure. || {…} is a zero-argument closure.)
use std::thread;

let handle = thread::spawn(|| {
    println!("I am in a new thread.");
});
println!("I am in the main thread.");
let res = handle.join();
println!("The new thread is done. {:?}", res);
The order of the first two lines of output is not guaranteed. One possible result:
I am in the main thread.
I am in a new thread.
The new thread is done. Ok(())
Once we have a JoinHandle, the thread is running. Calling JoinHandle.join() waits for the thread to finish.
It returns a Result<T, E> where T is whatever the thread function returns: Ok(T) usually; an Err if the thread panics.
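The two outcomes of .join() can be sketched as follows. This is only an illustration: describe_join is a hypothetical helper name, not something from the standard library, and it assumes the thread function returns an i32.

```rust
use std::thread;

// Hypothetical helper: runs `work` on a new thread and describes what `join` returned.
fn describe_join<F>(work: F) -> String
where
    F: FnOnce() -> i32 + Send + 'static,
{
    match thread::spawn(work).join() {
        // The thread function returned normally: we get its return value.
        Ok(value) => format!("finished with {}", value),
        // The thread panicked: `join` reports that as an Err.
        Err(_) => String::from("panicked"),
    }
}

fn main() {
    println!("{}", describe_join(|| 6 * 7));
    println!("{}", describe_join(|| -> i32 { panic!("something went wrong") }));
}
```

The panicking thread also prints its panic message to standard error, but the main thread keeps going and sees only the Err.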
If we don't .join() the thread, it will keep running. So, it's possible to leak threads if we drop the JoinHandle without joining.
Since the functions we spawn can return a value and we can get it with .join, it's easy to send some work into threads, let the threads work in parallel, and get the results back.
let h1 = thread::spawn(|| 1 + 1);
let h2 = thread::spawn(|| String::from("Hello") + " world");
println!("Both threads are now working.");
println!("h1 result: {:?}", h1.join().unwrap());
println!("h2 result: {:?}", h2.join().unwrap());
Both threads are now working.
h1 result: 2
h2 result: "Hello world"
If we start using the thread functions as closures, we have questions of ownership again.
let data = Vec::from([0, 1, 2, 3, 4, 5, 6, 7]);
let h = thread::spawn(|| {
    println!("The third element is {}", data[3]);
});
h.join().unwrap();
Compilation fails:
closure may outlive the current function, but it borrows `data`, which is owned by the current function
The complaint is that the closure borrows the vector (without move, a closure captures by reference when its body only needs a reference), but the compiler can't prove that the reference will always be valid: the new thread might outlive the function that owns data. Solution: ask for a closure that takes ownership of the values it uses.
We can say we want to take ownership of closed-over values with move |…| {…}.
We can use this to give ownership of data to the thread.
let data = Vec::from([0, 1, 2, 3, 4, 5, 6, 7]);
let h = thread::spawn(move || {
    println!("The third element is {}", data[3]);
});
h.join().unwrap();
The third element is 3
If we tried to use data outside the thread after this, compilation would fail because we would be accessing a moved value.
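Combining move closures with the values returned by .join() gives a simple pattern for splitting work across threads. The following is a minimal sketch under assumptions: parallel_sum is a hypothetical helper, and the input is assumed to be already divided into chunks.

```rust
use std::thread;

// Hypothetical helper: sums each chunk on its own thread, then adds the partial sums.
fn parallel_sum(chunks: Vec<Vec<i64>>) -> i64 {
    // Spawn one thread per chunk; `move` gives each thread ownership of its chunk.
    // Collecting the handles first means every thread is started before we join any.
    let handles: Vec<thread::JoinHandle<i64>> = chunks
        .into_iter()
        .map(|chunk| thread::spawn(move || chunk.iter().sum::<i64>()))
        .collect();

    // Join every handle and add up the partial results.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let chunks = vec![vec![1, 2, 3], vec![4, 5], vec![6, 7, 8, 9]];
    println!("total: {}", parallel_sum(chunks)); // prints "total: 45"
}
```

Each chunk has exactly one owner at any time: first the caller, then the thread that sums it. No data is shared, so no borrowing questions come up.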
If we aren't communicating by using shared data structures, how can threads (safely) communicate with each other?
With some kind of thread-safe message passing. In the Rust standard library, we get std::sync::mpsc: a Multiple-Producer, Single-Consumer message queue.
Constructing an mpsc queue (with mpsc::channel()) gives us a Sender<T> (which can be cloned so several threads can send) and a Receiver<T> (which is unique: it's neither Clone nor Copy).
Then the (possibly one, possibly many) Senders can safely send data to whichever thread has the Receiver.
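use std::sync::mpsc;
use std::thread;

let (sender, receiver) = mpsc::channel();
for i in 0..3 {
    // Each thread gets its own clone of the sender.
    let snd = sender.clone();
    thread::spawn(move || {
        snd.send(i * 10).unwrap();
    });
}
// Receive exactly three messages, in whatever order they arrive.
for _ in 0..3 {
    println!("{:?}", receiver.recv());
}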
Ok(0)
Ok(20)
Ok(10)
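When the number of messages isn't known in advance, one option is to drop the original Sender and iterate over the Receiver until the channel closes. A minimal sketch of that variation, where collect_messages is a hypothetical helper and the results are sorted only to make the output stable:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical helper: `workers` threads each send one message; returns them sorted.
fn collect_messages(workers: u32) -> Vec<u32> {
    let (sender, receiver) = mpsc::channel();

    for i in 0..workers {
        let snd = sender.clone();
        thread::spawn(move || {
            snd.send(i * 10).unwrap();
        });
    }

    // Drop the original sender so the channel closes once every clone is gone.
    drop(sender);

    // Iterating the receiver blocks for each message and stops when the channel closes.
    let mut received: Vec<u32> = receiver.iter().collect();
    received.sort(); // arrival order is not deterministic, so sort for a stable result
    received
}

fn main() {
    println!("{:?}", collect_messages(3)); // prints "[0, 10, 20]"
}
```

Without the drop(sender) line, the iteration would never end: the receiver would keep waiting for a message from the one sender that still exists.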
Message passing with mpsc or similar (we will use spmc in assignment 3) is often enough for all the communication needed between threads. We can certainly have memory-safe threads if we have no shared memory.
But what if we need to share some large immutable data structure? Or a shared mutable data structure?
The answer is going to have to wait.
There are also some traits that matter for multi-threaded code, and they are worth checking for on values you're hoping to pass between threads.
The Send trait marks a type whose values Rust can safely transfer ownership of to another thread. Most types are Send.
Since this code worked,
for i in 0..3 {
    let snd = sender.clone();
    thread::spawn(move || {
        snd.send(i * 10).unwrap();
    });
}
… we infer that i32 and Sender (the types of i and snd) must be Send.
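The same inference can be made explicit with a trait bound. In this sketch, round_trip is a hypothetical helper that moves a value into a new thread and back out again; it compiles only for types that are Send.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical helper: moves `value` into a new thread and back out again.
// The `Send` bound is what `thread::spawn` requires of captured values and results.
fn round_trip<T: Send + 'static>(value: T) -> T {
    thread::spawn(move || value).join().unwrap()
}

fn main() {
    let n = round_trip(42); // i32 is Send
    let words = round_trip(vec![String::from("a"), String::from("b")]);

    let (sender, receiver) = mpsc::channel::<i32>();
    let sender = round_trip(sender); // Sender<i32> is Send
    sender.send(n).unwrap();
    println!("{} {:?} {:?}", n, words, receiver.recv());

    // Rc uses a non-atomic reference count, so it is not Send.
    // Uncommenting the next line is a compile error:
    // let rc = round_trip(std::rc::Rc::new(1));
}
```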
The other trait we might need to know about is Sync, which indicates a type where it's safe to share references between multiple threads. Note: they must be immutable references because we already know multiple mutable references can't exist.
Safely implementing Sync is harder, but many types still implement it.
Some types allow interior mutability, where you can have an immutable reference but still mutate the contents. Those either don't implement Sync or lock/unlock as necessary.
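One way to see which category a type falls into is a compile-time check. In this sketch, assert_sync is a hypothetical helper (not part of the standard library) that does nothing at run time: it simply fails to compile if the type isn't Sync. The types chosen here are standard-library examples picked for illustration.

```rust
use std::sync::atomic::AtomicUsize;
use std::sync::Mutex;

// Hypothetical helper: compiles only if `T` implements `Sync`.
fn assert_sync<T: Sync>() {}

fn main() {
    assert_sync::<i32>();
    assert_sync::<String>();

    // Interior mutability that locks or uses atomic operations is still Sync.
    assert_sync::<Mutex<i32>>();
    assert_sync::<AtomicUsize>();

    // Cell and RefCell allow unsynchronized interior mutability, so they are not Sync.
    // Uncommenting either line is a compile error:
    // assert_sync::<std::cell::Cell<i32>>();
    // assert_sync::<std::cell::RefCell<i32>>();

    println!("all listed types are Sync");
}
```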
Concurrent (but not parallel) code can be created with asynchronous programming (but I won't say more here).
There are also many third-party crates that help with parallel code.