Languages & Concurrency

Concurrent Programming

A concurrent program is one where several parts of the program are running together (possibly taking turns, or possibly on different cores).

A parallel program is a concurrent program where different processes or (kernel) threads allow different parts of the code to run at the same time on multiple cores.

e.g. generators in Python and lazy evaluation in Haskell make our functions concurrent but not parallel.

Concurrent Programming

In the modern world, almost every computing device you'll encounter has multiple cores. If your code isn't parallel, it isn't fast.

Let's hope our programming languages can help us write concurrent code, and then get that concurrent code running in parallel.

Concurrent Programming

In Haskell, we saw par, pseq, and parMap to help us get things running concurrently. The compiler and runtime environment let that code run in parallel using multiple threads.

The pure functional style made concurrency easy, but lazy evaluation took it away unless we did some work.

See the concurrent Fibonacci example.

Threads & Processes

Process: an instance of a running program.

A single process can have several points where it is executing concurrently: each is a thread.

Threads & Processes

Multiple processes are scheduled to run in parallel by the operating system. If we want them to work together, they need to communicate somehow: sockets, pipes, shared files, shared memory, etc.

The difficulty: inter-process communication is essentially as hard as network programming. Processes might stop communicating or fail completely. You need to handle that.

Threads & Processes

Multiple threads share their process' memory. If one thread modifies the data while another is reading it, bad things are going to happen.

Code (for a module or data structure) is thread safe if it can be used by multiple concurrent threads without anything going wrong. Most languages provide some thread-safe data structures, but you must be very careful to identify them.

e.g. Some Java collections are thread-safe; the Python queue module provides a thread-safe queue.

Threads & Processes

Thread safety is hard in general. See CMPT 300.

e.g. the Python and Ruby interpreters (standard implementations) are not thread safe, so have a global interpreter lock to let only one Python/​Ruby thread run at a time (per process).

Threads & Processes

But if you can avoid sharing any data structures, thread safety is easy.

Pure functions come up again: the function can be called many times in parallel safely because they never share anything.

Threads & Processes

Or sharing immutable data structures should be safe.

Maybe the immutability is enforced by the language, or maybe it's a class designed to be immutable (or immutable in certain circumstances), or maybe you just design your code to not to mutate the data while multiple threads are using it.

e.g. in Rust, when data is shared between threads it is automatically immutable.

Threads & Processes

If you must share and mutate data structures, you need to do it safely by locking/​communicating with mutexes or semaphores.

Like manual memory management: it's error-prone hard to get right 100% of the time. My advice: avoid if possible.

Kernel/User Threads

There are two flavours of threads that a programming language might make available to you…

A kernel thread (or OS-level thread) is created and scheduled by the operating system. Each kernel thread has its own stack, program counter, and registers. Each process has one or more kernel threads that share its memory.

Kernel/User Threads

A user thread (or sometimes green thread) are managed by the language's runtime environment (VM or similar). The environment can let different concurrent parts of the code take turns executing, but not run in parallel.

Kernel/User Threads

User threads are much faster to create and manage: you can make thousands of user threads in most environments, but thousands of simultaneous kernel threads might be a problem.

But user threads don't take advantage of multiple cores. Some hybrid of the two might be most efficient, like hybrid (M:N) threading.

Thread Communication

We want threads to avoid sharing (mutable) memory. But, we still need threads to communicate with each other to get work done.

One possible answer will be to let them communicate by sending messages on channels. We'll see many examples in Go, so let's wait.

Language Concurrency

Newer languages have incorporated more safe concurrency features into their design.

Rust: explicit ownership of references makes formal who has access to objects. References can either be shared and immutable, or mutable and unique.

Language Concurrency

Go: Threads should coordinate by communicating on thread-safe channels. Don't communicate by sharing memory, share memory by communicating. *

Erlang: an Erlang processes is a user thread that cannot share memory with others. All communication by messages.

Concurrency Tools

There are higher-level tools a language can provide to make concurrency easier.

In Haskell, parMap let us work almost like with map but in parallel, which was nice.

Concurrency Tools

One possibility: thread pools, where a collection of threads (or processes in some implementations) is created, along with a work queue that feeds work into them.

This prevents any overhead of frequently creating/​destroying threads, and give the programmer an easy API to work with.

Concurrency Tools

For example, if we have a Scala class (or several different classes) that implements Runnable (like Java)…

class SomeWork(n: Int) extends Runnable { // ... }

Then we can have a pool of threads that queue whatever .run() methods we want executed: [full code]

val pool = Executors.newFixedThreadPool(8)
1 to 100 foreach { x =>
  pool.execute(new SomeWork(x))
}
pool.shutdown()

Concurrency Tools

In Python, the multiprocessing module contains process pools. They contain higher-level functions to do work in parallel:

pool = multiprocessing.Pool()
values = range(100)
result = pool.map(long_computation, values)
print(result)

Example: Parallel Computation with Pools.