Concurrency in Rust

We have seen earlier in this course: concurrent/parallel programming isn't that hard if you don't have data structures (state) that are both shared and mutable.

We saw concurrency being easier in Haskell: no state, so no shared state. But lazy evaluation had to be defeated so multiple things could be computed concurrently.

Concurrency in Rust

We have already seen in Rust: values have a unique owner, or immutable reference(s), or a unique mutable reference.

The language's type system already ensures no shared mutable state *.

Concurrency in Rust

Easy. Concurrency will be safe in Rust.

Initially, the Rust team thought that ensuring memory safety and preventing concurrency problems were two separate challenges to be solved with different methods. Over time, the team discovered that the ownership and type systems are a powerful set of tools to help manage memory safety and concurrency problems! The Rust Book, Fearless Concurrency

Threads

The basic tool for concurrency in Rust is threads. These are OS-level threads: created and scheduled by the operating system.

We can create threads with std::thread::spawn: it takes a function (FnOnce) that will be run in a new thread, and returns a JoinHandle.

Threads

For example: (Recall that |x| x+1 is a closure. || {…} is a zero-argument closure.)

use std::thread;
let handle = thread::spawn(|| {
    println!("I am in a new thread.");
});
println!("I am in the main thread.");
let res = handle.join();
println!("The new thread is done. {:?}", res);

The order of the first two lines of output is undefined:

I am in the main thread.
I am in a new thread.
The new thread is done. Ok(())

Threads

Once we have a JoinHandle, the thread is running. Calling JoinHandle.join() waits for the thread to finish.

It returns an Result<T, E> where T is whatever the thread function returns: Ok(T) usually; an Err if the thread panics.

If we don't .join() the thread, it will keep running. So, it's possible to leak threads if we drop the JoinHandle without joining.

Threads

Since the functions we spawn can return a value and we can get it with .join, it's easy to send some work into threads, let the threads work in parallel, and get the results back.

let h1 = thread::spawn(|| 1 + 1);
let h2 = thread::spawn(|| String::from("Hello") + " world");
println!("Both threads are now working.");
println!("h1 result: {:?}", h1.join().unwrap());
println!("h2 result: {:?}", h2.join().unwrap());

Both threads are now working.
h1 result: 2
h2 result: "Hello world"

Threads

If we start using the thread functions as closures, we have questions of ownership again.

let data = Vec::from([1, 2, 3, 4, 5, 6, 7]);
let h = thread::spawn(|| {
    println!("The third element is {}", data[3]);
});
h.join().unwrap();

Compilation fails:

closure may outlive the current function, but it borrows
`data`, which is owned by the current function

Threads

The complaint is that the closure borrows the vector (gets a reference to it, because Vec is not Copy), but the compiler can't prove that the reference will always be valid. Solution: ask for a closure that takes ownership of the values it uses.

We can say we want to take ownership of closed values with move |…| {…}.

Threads

Using move here gives ownership of data to the thread.

let data = Vec::from([0, 1, 2, 3, 4, 5, 6, 7]);
let h = thread::spawn(move || {
    println!("The third element is {}", data[3]);
});
h.join().unwrap();

The third element is 3

If we try to use data here outside the thread, it would fail because we try to access a moved value.

Threads

Aside: if we are using move to take ownership of closed values or taking ownership of arguments, we cannot call that function more than once. First call: take ownership of the values. Second call: voilate ownership rules.

But no problem since the argument to thread::spawn implements the FnOnce trait, everybody understands that it's a call only once situation. The compiler will enforce that restriction.

Messages

If we aren't communicating by using shared data structures, how can threads (safely) communicate with each other?

With some kind of thread-safe message passing. In the Rust standard library, we get std::sync::mpsc: a Multiple-Producer, Single-Consumer message queue.

(An mpmc multi-producer, multi-consumer queue is currently experimental.)

Messages

Constructing an mpsc queue (with mpsc::channel()) gives us a Sender<T> (which can be cloned so several threads can send) and a Receiver<T> (which is unique: it's neither Clone nor Copy).

Then the (possibly one, possibly many) Senders can send whatever data safely to whichever thread has the Receiver.

Messages

use std::sync::mpsc;
let (sender, receiver) = mpsc::channel();
for i in 0..3 {
    let snd = sender.clone();
    thread::spawn(move || {
        snd.send(i * 10).unwrap();
    });
}
for i in 0..3 {
    println!("{:?}", receiver.recv());
}

Ok(0)
Ok(20)
Ok(10)

[Notes: the channel type T is inferred to be i32; Sender is cloned for each thread; i and snd are both captured by the closure; snd is moved but i is Copy so copied not moved.]

Messages

Message passing with mpsc or similar (we will use crossbeam's channel in assignment 3) is often enough for all the communication needed between threads. We can certainly have memory-safe threads if we have no shared memory.

More Concurrency Tools

But what if we need to share some large immutable data structure? Or a shared mutable data structure?

The answer is going to have to wait.

[But short answer: Arc and Mutex.]

More Concurrency Tools

There are also some traits important to multi-threaded code that it might be important to note on values you're hoping to share around threads.

The Send trait marks a value that Rust can transfer ownership to another thread. Most types are Send.

More Concurrency Tools

Since this code worked,

for i in 0..3 {
    let snd = sender.clone();
    thread::spawn(move || {
        snd.send(i * 10).unwrap();
    });
}

… we infer that i32 and Sender (the types of i and snd) must be Send.

More Concurrency Tools

The other trait we might need to know about is Sync which indicates a type where it's safe to share references between multiple threads. Note: they must be immutable references because we already know multiple mutable references can't exist.

More Concurrency Tools

Safely implementing Sync is harder, but still there for many types.

Some types allow interior mutability where you can have an immutable reference, but do mutable things to contents. Those either don't implement Sync or lock/unlock as necessary.

More Concurrency Tools

Concurrent (but not parallel) code can be created with asynchronous programming (but I won't say more here).

There are also many third-party crates that help with parallel code.

Rayon: parallel versions of iterator operations.
Crossbeam: more tools for concurrent programming.
Tokio: reliable async tools.
Threadpool.