A concurrent program is one where several parts of the program are running together (possibly taking turns, or possibly on different cores).
A parallel program is a concurrent program where different processes or (kernel) threads allow different parts of the code to run at the same time on multiple cores.
e.g. generators in Python and lazy evaluation in Haskell make our functions concurrent but not parallel.
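For instance, a minimal Python sketch: the generator's body and the caller's loop take turns executing (concurrency), but only one of them ever runs at a time (no parallelism).

def numbers():
    print('generating 1')
    yield 1
    print('generating 2')
    yield 2

# The generator body and the loop body interleave:
# 'generating 1', 'got 1', 'generating 2', 'got 2'.
for n in numbers():
    print('got', n)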
In the modern world, almost every computing device you'll encounter has multiple cores. If your code isn't parallel, it isn't fast.
Let's hope our programming languages can help us write concurrent code, and then get that concurrent code running in parallel.
In Haskell, we saw par, pseq, and parMap to help us get things running concurrently. The compiler and runtime environment let that code run in parallel using multiple threads.
The pure functional style made concurrency easy, but lazy evaluation took it away unless we did some work.
See the concurrent Fibonacci example.
Process: an instance of a running program.
A single process can have several points where it is executing concurrently: each is a thread.
Multiple processes are scheduled to run in parallel by the operating system. If we want them to work together, they need to communicate somehow: sockets, pipes, shared files, shared memory, etc.
The difficulty: inter-process communication is essentially as hard as network programming. Processes might stop communicating or fail completely. You need to handle that.
Multiple threads share their process' memory. If one thread modifies the data while another is reading it, bad things are going to happen.
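A sketch of the danger in Python (whether the lost updates actually appear depends on the interpreter version and timing, so treat this as illustrative):

import threading

counter = 0

def increment_many():
    global counter
    for _ in range(100000):
        counter += 1  # read-modify-write: not atomic

threads = [threading.Thread(target=increment_many) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# May print less than 400000: updates can be lost in the race.
print(counter)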
Code (for a module or data structure) is thread safe if it can be used by multiple concurrent threads without anything going wrong. Most languages provide some thread-safe data structures, but you must be very careful to identify them.
e.g. some Java collections are thread-safe; the Python queue module provides a thread-safe queue.
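A sketch of the queue module in use: multiple threads can put and get without any extra locking, because the Queue does its own internally.

import queue
import threading

q = queue.Queue()

def worker():
    while True:
        item = q.get()  # safe to call from many threads
        print('processing', item)
        q.task_done()

threading.Thread(target=worker, daemon=True).start()
for i in range(5):
    q.put(i)
q.join()  # wait until every queued item has been processed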
Thread safety is hard in general. See CMPT 300.
e.g. the Python and Ruby interpreters (standard implementations) are not thread safe, so they have a global interpreter lock that lets only one Python/Ruby thread run at a time (per process).
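A sketch of the consequence in CPython (exact timings vary by machine, but the two threads won't run their loops in parallel):

import threading
import time

def burn():
    n = 0
    for _ in range(10_000_000):
        n += 1

start = time.perf_counter()
t1 = threading.Thread(target=burn)
t2 = threading.Thread(target=burn)
t1.start(); t2.start()
t1.join(); t2.join()
# Under the GIL, this takes about as long as calling burn() twice
# sequentially: two cores are available but only one is used.
print(time.perf_counter() - start)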
But if you can avoid sharing any data structures, thread safety is easy.
Pure functions come up again: they can safely be called many times in parallel because they never share anything.
Or sharing immutable data structures should be safe.
Maybe the immutability is enforced by the language, or maybe it's a class designed to be immutable (or immutable in certain circumstances), or maybe you just design your code not to mutate the data while multiple threads are using it.
e.g. in Rust, the compiler ensures that data shared between threads is immutable (unless it's explicitly protected by a lock or similar).
If you must share and mutate data structures, you need to do it safely by locking/communicating with mutexes or semaphores.
Like manual memory management: it's error-prone and hard to get right 100% of the time. My advice: avoid it if possible.
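If you do have to share and mutate, a mutex is the usual tool. In Python, for example, a threading.Lock around the critical section fixes the lost-update race from the counter sketch above:

import threading

counter = 0
lock = threading.Lock()

def increment_many():
    global counter
    for _ in range(100000):
        with lock:  # only one thread at a time executes this block
            counter += 1

threads = [threading.Thread(target=increment_many) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # now reliably 400000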
There are two flavours of threads that a programming language might make available to you…
A kernel thread (or OS-level thread) is created and scheduled by the operating system. Each kernel thread has its own stack, program counter, and registers. Each process has one or more kernel threads that share its memory.
A user thread (sometimes called a green thread) is managed by the language's runtime environment (VM or similar). The environment can let different concurrent parts of the code take turns executing, but not run in parallel.
User threads are much faster to create and manage: you can make thousands of user threads in most environments, but thousands of simultaneous kernel threads might be a problem.
But user threads don't take advantage of multiple cores. Some hybrid of the two, like M:N threading, might be most efficient.
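As a sense of the scale difference, a sketch using Python's asyncio (discussed more below): its tasks are user-level, so tens of thousands are cheap, while the same number of kernel threads would likely exhaust system resources.

import asyncio

async def tiny_task(i):
    await asyncio.sleep(1)
    return i

async def main():
    # 10000 concurrent user-level tasks: no problem.
    tasks = [asyncio.create_task(tiny_task(i)) for i in range(10000)]
    results = await asyncio.gather(*tasks)
    print(len(results))

asyncio.run(main())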
We want threads to avoid sharing (mutable) memory. But, we still need threads to communicate with each other to get work done.
One possible answer will be to let them communicate by sending messages on channels.
Newer languages have incorporated more safe concurrency features into their design.
Rust: explicit ownership of references makes it formal who has access to which objects. References can be either shared and immutable, or mutable and unique.
Go: Threads should coordinate by communicating on thread-safe channels. Don't communicate by sharing memory, share memory by communicating.
Erlang: an Erlang process is a user thread that cannot share memory with others. All communication is by messages.
Many languages have started to support concurrency with asynchronous programming.
The idea: calling an async function doesn't return the function result, it starts the function and returns some kind of object representing the in-progress calculation. Your code can get on with other stuff and await the result when it's actually needed.
A quick example in Python:
import asyncio

async def test1():
    print('one')
    s = asyncio.sleep(1)
    print(s)
    print('two')
    await s
    print('three')

asyncio.run(test1())
Note that from non-async code, you can't just call an async function: you have to run it with an async tool (here, asyncio.run).
The output, with a pause between two and three:
one
<coroutine object sleep at 0x7f1fafb0f450>
two
three
Or a slightly more realistic example: we start an async task before we need its result, do some other stuff while it works, then await the result.
async def test2():
    task = calculate_things(10)
    print(f'task == {task}')
    # Do other stuff while calculate_things is working.
    # Then...
    res = await task
    print(f'res == {res}')
task == <coroutine object calculate_things at 0x7f1fafb0f450>
res == 20
Async gives us a relatively easy way to get things happening concurrently.
Is the async task run in a user thread, a kernel thread, or something else? It depends on the language. Async is more commonly used for the "get a slow I/O task started" case than the "use all my CPU cores to compute" case, so that's likely how the tools are optimized.
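A sketch of the I/O case in Python: several slow operations started together overlap their waiting, so the total time is about one second, not three. (The asyncio.sleep here stands in for a slow network request.)

import asyncio
import time

async def fake_io(name):
    await asyncio.sleep(1)  # stands in for a slow I/O operation
    return name

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(fake_io('a'), fake_io('b'), fake_io('c'))
    print(results, time.perf_counter() - start)  # about 1 second

asyncio.run(main())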
It might be worth looking at a tutorial in your favourite language: it seems like an increasingly popular technique.
There are higher-level tools a language can provide to make concurrency easier.
In Haskell, parMap let us work almost like map, but in parallel, which was nice.
One possibility: thread pools, where a collection of threads (or processes in some implementations) is created, along with a work queue that feeds work into them.
This avoids the overhead of frequently creating/destroying threads, and gives the programmer an easy API to work with.
For example, if we have a Scala class (or several different classes) that implements Runnable (as in Java)…
class SomeWork(n: Int) extends Runnable {
  // ...
}
Then we can have a pool of threads that executes whatever .run() methods we queue: [full code]
val pool = Executors.newFixedThreadPool(8)
1 to 100 foreach { x =>
  pool.execute(new SomeWork(x))
}
pool.shutdown()
In Python, the multiprocessing module provides process pools, with higher-level functions to do work in parallel:
import multiprocessing

# long_computation is some CPU-heavy function defined elsewhere.
pool = multiprocessing.Pool()
values = range(100)
result = pool.map(long_computation, values)
print(result)
Example: Parallel Computation with Pools.
Go makes it extremely easy to do a function call in a separate thread (goroutine) with the go keyword.
func Count(ident string, n int) {
    for i := 1; i <= n; i++ {
        fmt.Printf("%s%d ", ident, i)
        time.Sleep(50 * time.Microsecond)
    }
}

func testGoroutines() {
    go Count("A", 5)
    go Count("B", 4)
}
Go uses hybrid (M:N) threading, so creating a goroutine is fairly inexpensive. If you have work where creating a user-space thread is reasonable overhead, doing everything in parallel is very easy.
// Pseudocode: assuming HttpListen returns a channel of requests.
for request := range HttpListen(80) {
    go send_response(request)
}
To achieve the "threads should coordinate by communicating" goal, Go provides a thread-safe channel type that can be used to coordinate:
func addBytesChannel(a, b byte, result chan byte) {
    result <- a + b
}

func ChannelExample() {
    result := make(chan byte)
    go addBytesChannel(3, 4, result)
    res := <-result
    fmt.Println(res)
}
The lesson: concurrency can be hard, but it doesn't have to be.
Look for help from your language/libraries. Look for the easy cases where you can work concurrently without having to share resources.