Languages: Types & Memory

Static/Dynamic Typing

The question here: does the compiler know a value's type when it compiles the program?

For statically typed languages, the answer is yes: C, Java, C#, Dart, etc.

Where the types aren't known until runtime, the language is dynamically typed: Python, JavaScript, Lisp, Ruby, etc.
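
A quick Python sketch (not from the original examples) of what "dynamically typed" means in practice: a name's type is only known while the program runs, and can change.

x = 5
print(type(x))     # <class 'int'>
x = "five"
print(type(x))     # <class 'str'>: same name, different type at runtime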

Static/Dynamic Typing

There are some where it's not clear: Haskell, Scala. Consider this Haskell code:

calculate :: Num a => a -> a -> a
calculate x y = x*y + x

Are the * and + doing arithmetic on integers or floating point or complex numbers or …? It depends:

*Main> let a = 5   :: Int
*Main> let b = 7.0 :: Double
*Main> :t calculate a 5
calculate a 5 :: Int
*Main> :t calculate b 5
calculate b 5 :: Double

Static/Dynamic Typing

Static typing allows more to be checked at compile time.

e.g. Is the expression a/b okay? Yes if a and b are floating point; no if they are strings.

This allows the compiler to catch more programmer errors. In a statically typed language, you can have more confidence that the types are right.

Static/Dynamic Typing

Dynamic typing allows more flexibility and often less code.

The programmer doesn't have to explicitly declare/​allocate/​type variables, which saves keystrokes/​effort.

Static/Dynamic Typing

Polymorphism is easy in a dynamically typed language. e.g. this code will work on any three values where + can work: integers, floats, strings, etc.

def add_three(a, b, c):
    return a + b + c

In C, Java, Fortran, that would have to be different functions for each input/​output type.*

Static/Dynamic Typing

Remember: there are some related things that always have to be checked at runtime: array bounds, division by zero, etc.

Static typing can catch more things at compile-time, but not every type-related error.
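
A small hedged Python illustration: even when every type is exactly what we expect, these checks can only fail at runtime.

values = [1, 2, 3]
try:
    print(values[3] / 0)    # the index and divisor are perfectly good ints...
except (IndexError, ZeroDivisionError) as e:
    print("caught at runtime:", e)   # ...but the error only appears when run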

Static/Dynamic Binding

Of course, this is related to static/​dynamic typing, but here the question is: when do we know which operator/function/definition applies?

Statically bound: the compiler knows exactly which operation is going to happen.

Dynamically bound: the details have to be determined at run-time.

Static/Dynamic Binding

Consider this Java code:

String s;
double d;
⋮
System.out.print(s);
System.out.print(d);

There are separate implementations of print for different types, but the compiler knows which one to choose. i.e. the calls are bound to a particular implementation at compile time.

Static/Dynamic Binding

Consider similar Python:

a = 6
b = 7
⋮
c = a + b

The compiler doesn't really know which + that is: maybe integer addition, but maybe the types change and it's string concatenation.

The types must be checked and operator bound at runtime: dynamic binding.
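
Roughly (glossing over details like __radd__ and C-level dispatch), the interpreter has to do something like this every time the line runs. A sketch of the idea, not CPython's actual implementation:

def dynamic_add(a, b):
    add_impl = type(a).__add__   # check the runtime type, look up its +
    return add_impl(a, b)        # then call the chosen implementation

print(dynamic_add(6, 7))         # 13: bound to integer addition
print(dynamic_add("6", "7"))     # '67': bound to string concatenation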

Static/Dynamic Binding

Dynamic binding provides greater flexibility: the same code can work on integers, floats, and any other type where the functions/​operators used are defined.

But it comes with a speed penalty: types must be checked every time a line of code is executed.

Static/Dynamic Binding

Static binding avoids this run-time overhead since the compiler can make the right bindings before execution even starts.

The speed difference can be huge: the difference between (1) executing the processor's ADD instruction and (2) a type check, symbol table lookup, method call, then do the ADD.

Static/Dynamic Binding

Let's look back at the Mandelbrot benchmark.

The speed difference between Cython's time on dynamically-typed .py code and (partially-)statically-typed .pyx code is 37 times.

It wasn't compiling to machine code that made the code faster (Cython compiled both through C), but removing the overhead from dynamic binding (in this case at least).

Static/Dynamic Binding

The static/​dynamic distinction can be seen in Cython's output: it compiles Python + optional type declarations to C (and then machine code), and can produce an annotated view showing each line of Python with the corresponding C. Notes:

  • escapes is statically-typed: the + operator turns into a C + operator, and can be statically bound to an operation on double.
  • mandel is dynamically-typed: + turns into a PyNumber_Add call that (presumably) does dynamic type checks and is therefore slower.

Static/Dynamic Binding

Also note: in the Cython .pyx code, only the escapes function that takes most of the time got the static type declarations. The mandel and toChar functions didn't.

The running time is essentially identical to C, Go, Rust. Optimizing is often a waste of time, but optimizing the inner-most loop might be worth it. Profile before you optimize.

Static/Dynamic Binding

Dynamic binding is another place where JITs can help.

If you always call a function with arguments of a specific type, then the JIT can notice and compile a statically-bound version of the function and use it with logic like:

if types match previous calls:
    call statically-typed machine code
else:
    interpret dynamically-typed bytecode,
    or maybe compile a version for the new types.

Maybe you get a statically-typed implementation of your dynamically-typed code.

Static/Dynamic Binding

The Mandelbrot benchmark again: PyPy takes the dynamically-typed Python code and gets performance close to statically-typed Cython (and better than statically-typed Java).

Conclusion without proof: it must be compiling statically-bound versions of the code to get that performance.

Static/Dynamic Binding

Dynamic binding can also happen in a statically-typed language.

Some statically-typed languages have features that require a type-related decision at run-time.

Static/Dynamic Binding

Consider this Java:

class Dog extends Animal {…}
⋮
void someMethod(Animal a) {
    a.reward();
}

The .reward() call might be to Animal.reward() or Dog.reward(), depending on the subclass passed. The decision on that line of code has to happen at run-time.

In analogous C++ code, a.reward() always refers to Animal::reward() because that's the declared type: C++ binds statically unless the method is declared virtual (and the object is accessed through a pointer or reference).

Type Inference

We saw type inference in Haskell: it could determine the types of values by examining known types of return values and literals. e.g.

res = (length some_list) + 1

The known types imply that res is an Int:

  • length xs :: Int
  • 1 :: Num a => a
  • (+) :: Num a => a -> a -> a

Type Inference

In general, type inference or implicit typing is done in several languages. The compiler must know the types of some values in an expression (literals, function return types) and can infer the rest from them.

If the compiler can infer types, it can statically bind in its output.

Type Inference

It seems like programmers really want at least some type inference. C# has had it (the var keyword) since C# 3.0. It was added to C++ in C++11 (and enhanced in C++17):

double add_one(double x) {
    auto y = x + 1;  // y inferred to be type double
    return y;
}

And to Java in version 10, where variable types can be inferred from the initializer:

var x = 1.0;  // type inferred as double from literal
var values = new ArrayList<Double>();
    // type inferred as ArrayList<Double> from constructor

Type Inference

Analogous Rust code can do the same type inference (on variable y).

fn add_one(x: f64) -> f64 {
    let y = x + 1.0;
    return y
}

Rust can do more advanced type inference, feeling a little like Haskell (example from the Rust docs):

pub fn main() {
    let elem = 5u8;           // typed literal implies type
    let mut vec = Vec::new(); // vector of type ???
    vec.push(elem);           // oh, vector contains type u8
    println!("{:?}", vec);
}

Type Inference

All of these languages can do static binding at compile-time: they know the types even if you didn't tell them.

That way the programmer doesn't have to specify types everywhere, which is repetitive and boring.

Type Inference

Python 3 generally does not do type inference (or any static typing), but the programmer can give type hints about what kind of values will be used:

def double(x: float) -> float:
    return x+x

print(double(4))
print(double("abc"))

i.e. the argument and return type of double are both float: the first call should be fine, but the second has the wrong argument type. Strangely, it still runs:

8
abcabc

Type Inference

There is a separate tool Mypy that checks the types. The command mypy mypy-demo.py would produce:

mypy-demo.py:5: error: Argument 1 to "double" has incompatible
type "str"; expected "float"

The hints don't cause any type checking or static binding when the code is compiled and run. They are only used for this separate static check, and as hints for IDEs.

Type Inference

Mypy does type inference (and warns if there's a place in the code where it can't, and needs hints):

from typing import Dict
def to_dict(x: int, y: str) -> Dict[int, float]:
    x1 = x+1
    return {x1: y}

mypy-demo.py:10: error: Dict entry 0 has incompatible type
"int": "str"; expected "int": "float"

[i.e. you promised to return a dictionary that maps int to float, but actually have int to str.]

Duck Typing

Dynamic typing (and binding) is generally slower, but more flexible.

The usual style is to not check types, just use the properties/​methods/​operators that you expect to be there: treat the value like the kind of object you expect.

Duck typing: If it looks like a duck and quacks like a duck, then it must be a duck.

Duck Typing

e.g. you may write a function to count how many times a particular integer occurs in a list:

def count_occurrences(lst, val):
    count = 0
    for v in lst:
        if v == val:
            count += 1
    return count

But this will actually work with any other object that can be iterated (with for) and has values that can be checked for equality (with ==). That includes characters in a string, lines in a file, or some other custom-built collection.

Duck Typing

Notable duck-typed languages: Python, Ruby, Perl, C# (sometimes, with dynamic and member lookup).

Contrast: Java interfaces, Go interfaces, C# interfaces, Haskell type classes, C++ templates. These provide similar flexibility, but with explicit types.

Duck Typing

In Python, it's considered bad style to explicitly check a type: without the check, this code would have worked on any iterable collection.

def count_occurrences_ugly(lst, val):
    assert isinstance(lst, list)
    count = 0
    for v in lst:
        if v == val:
            count += 1
    return count

Duck Typing

It is very easy to be too restrictive with the Python hints:

from typing import List
def count_too_specific(lst: List[int], val: int) -> int:
    # ⋮

Better: express clearly what the code can do:

from typing import Iterable, TypeVar
T = TypeVar('T')
def count_well_hinted(lst: Iterable[T], val: T) -> int:
    # ⋮

Maybe this is manual duck typing? You express the types that are good enough. Or maybe it's like Haskell's type classes + type variables.

Duck Typing

Duck typing often goes with dynamic typing/​binding.

Which you like is largely a matter of taste.

Type Checking

To what extent does the language prevent you from making type errors? Type error: treating a value as an incompatible type.

e.g. calling obj.someMethod() when that method doesn't exist.

e.g. applying the / operator to a string.
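
Hedged Python sketches of both kinds of type error:

obj = "hello"
try:
    obj.someMethod()         # no such method on str
except AttributeError as e:
    print(e)                 # 'str' object has no attribute 'someMethod'

try:
    print("hello" / 2)       # / isn't defined for strings
except TypeError as e:
    print(e)                 # unsupported operand type(s) for /: 'str' and 'int'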

Type Checking

All languages have some type checking, but can we avoid/fool these checks?

e.g. Java checks most things at compile time, but this compiles (but warns and throws a runtime exception):

List values = new ArrayList();
Integer i = 6;
String s;
values.add((Object) i);
s = (String) values.get(0);

[That is considered very bad Java style. Should have been an ArrayList<Integer> which can be checked.]

Type Checking

This is seen as a feature in C, but can be dangerous:

unsigned int vals[2] = {0, 1081602048};
double *dp = (double *) vals;
printf("%f\n", *dp);

Output:

383.000000

Type Checking

The terms strongly typed and weakly typed are used to describe more or less type safety, but seem to be used differently by everybody. Rough definitions:

Strongly typed: every value has a single well-defined type, so you can do lots of checking.

Weakly typed: single values can be treated as different types, so type checking is harder.

Type Checking

C and Java are mostly strongly typed since values are explicitly statically typed, but pointer/​reference casting allows the programmer to do weakly-typed things.

Python is strongly typed since the type of each value is tracked by the language (but at runtime since it's dynamically typed).
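
For contrast with the coercion examples on the next slide, Python refuses to guess (a small sketch of standard CPython behaviour):

try:
    print("12" / 2)          # no implicit conversion of "12" to a number
except TypeError as e:
    print(e)                 # unsupported operand type(s) for /: 'str' and 'int'

print(int("12") / 2)         # 6.0: the conversion must be explicit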

Type Checking

Some languages (Perl, PHP, JavaScript) will implicitly convert values when necessary (type coercion), so they feel more weakly typed. All display 6:

print "12"/2
print("12"/2);
console.log("12"/2)

This can be very confusing for programmers.

Mutable/Immutable Data

A mutable object/​value is one that can be modified after its initial creation. Most OO languages have mutable objects: internal state can be modified by methods or by assigning public properties. e.g. in Java:

var list = new ArrayList<Integer>();   // empty list
list.add(6);                           // now length 1

Or in Python, most objects can have their attributes modified:

pt = Point2D(3, 5)   # 2D point (3,5)
pt.x = 4             # now represents (4,5)

Mutable/Immutable Data

Immutable: objects/​values that aren't mutable.

e.g. Integer objects in Java or C#; strings in Java, C#, Python; tuples in Python.

Mutable/Immutable Data

A variable containing an immutable object can still be changed by assigning a new instance:

string direction = "up";
direction = "down";

Important distinction: the variable refers to a new object, not the same object with different contents. Any other references (aliases) to the old object haven't changed.

Immutable objects are useful when you need to know an object won't change.
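
A small Python sketch of that distinction: re-binding a name to a new immutable object leaves other references alone, while mutating a shared object is visible through every alias.

direction = "up"
alias = direction
direction = "down"       # re-binds the name to a new string object
print(alias)             # still "up": the old object didn't change

nums = [1, 2, 3]
other = nums
nums.append(4)           # mutates the shared list in place
print(other)             # [1, 2, 3, 4]: every alias sees the change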

Mutable/Immutable Data

Examples where immutability might be important:

  • In a dictionary/​map/​hash table key: the data structure needs to be able to check the value when it's inserted and know it doesn't change while in the collection.
  • As an argument: a mutable object could be modified whenever it is given as an argument to a function. Immutability ensures it won't be.
  • Sharing between threads: safe for multiple threads to access it if it can't be changed.

Mutable/Immutable Data

But mutable objects are often much easier to work with when data structures do need to change. Some examples:

  • Collections: don't want to rebuild a list of a million elements just to add one more.
  • State: state of a game (or whatever) changes frequently and could be shared by many parts of the code.

Mutable/Immutable Data

Some languages have ways to enforce immutability.

  • private fields and careful class design.
  • C++ can mark and enforce mutability restrictions with const variables, functions, and arguments.
  • In C#, readonly fields can't be re-assigned after construction.
  • Haskell had neither mutability nor the ability to re-assign: that made maintaining any state difficult.

Mutable/Immutable Data

Even if your language doesn't have any mutability guarantees, it's still something that should be documented.

from typing import List

def append1(lst: List[int], elt: int) -> None:
    '''
    Append elt to the lst. Modifies lst in-place.
    '''
    lst.append(elt)

def append2(lst: List[int], elt: int) -> List[int]:
    '''
    Return a copy of lst with elt appended to it.
    '''
    return lst + [elt]

Mutable/Immutable Data

In Rust, the concept of mutability is front-and-centre. Variables are declared as mutable or not:

let mut count = 0;
count += 1;
println!("count = {}", count);

Without the mut keyword here, compilation fails on the second line: cannot assign twice to immutable variable.

Mutable/Immutable Data

In Rust, there's never any question whether a function can mutate a value: it can only do so if you pass it a mutable reference. If a variable is declared mutable but never actually mutated, there's a compiler warning.

Then the Rust compiler can ensure that threads don't share mutable references: only one piece of code can own a value.

Mutable/Immutable Data

Same as the Python example: in-place mutation vs copy-and-modify:

fn append1(vec: &mut Vec<i64>, elt: i64) {
    vec.push(elt);
}
fn append2(vec: &Vec<i64>, elt: i64) -> Vec<i64> {
    let mut result = vec.to_vec();
    result.push(elt);
    return result;
}
let mut vec: Vec<i64> = (0..5).collect();
append1(&mut vec, 5);
let longer = append2(&vec, 6);

Memory Management

The data your program is using is stored in the computer's memory. Each program has a certain amount of memory allocated to it, which can be expanded or contracted as necessary.

Memory Management

This memory is generally divided into three parts:

  • Static: memory known to be needed at compile time. e.g. string literals in the program, globals, C static variables.
  • Stack: function parameters and function-local variables (or similar for other variable scopes).
  • Heap: dynamically-allocated objects. e.g. malloc in C; new in C++, Java, C#; all objects in Python, Ruby.

Memory Management

The usual diagram of what's conceptually happening:

[figure: memory usage diagram — static, stack, and heap regions]

Stack and heap grow and shrink as necessary.

Memory Management

The stack contains info on function calls: arguments and locally-defined variables for each instance. Values are popped when the function returns.

In dynamic languages, usually everything is on the heap (but there is a call stack for functions).

Managing Memory

As a programmer, how do you control the memory you're using? What is stored and when can it be thrown away? The goal: make sure we only store the data we need, and don't keep old values around forever (a memory leak).

How we do that depends on the language.

Managing Memory

Stack memory is fairly obvious: when the function returns, pop its stuff off the stack. Every language (except maybe assembly) takes care of this.

But the stack is often only a small amount of the memory being used: in most OO languages, the stack variables are just object references(/pointers) and actual object contents are on the heap.

Managing Memory

Heap memory is harder, because it's hard to know when a program is no longer using a value.

Values are still useful as long as there is a pointer/​reference to them. How do we know when the last reference is gone?

Manual Memory Mgmt.

In C, keeping track of allocated heap memory is the programmer's problem. What is allocated must be freed by somebody, exactly once.

/* array of 100 int on the heap: */
int *arr = (int *)malloc(100*sizeof(int));
arr[17] = 10;
printf("%i\n", arr[17]);
free(arr);   /* free() must be called to not leak. */

Manual Memory Mgmt.

In C++, objects can be on the stack: those are destroyed when the function returns.

void stack_object_example() {
    Point2D pt1 = Point2D(1, 2);
    cout << pt1 << '\n';
}

Objects created with new are on the heap and must be deleted.

void heap_object_example() {
    Point2D *pt2 = new Point2D(3, 4);
    cout << *pt2 << '\n';
    delete(pt2);
}

Manual Memory Mgmt.

When pointers are passed around, it can be unclear who owns them and is responsible for deleting.

Point2D *create_object_pointer() {
    Point2D *pt = new Point2D(5, 6);
    return pt;
}

Some code far away must call delete().

Point2D *pt3 = create_object_pointer();
cout << *pt3 << '\n';
delete(pt3);

Manual Memory Mgmt.

That's extremely error-prone. In C++11, memory management was modernized with smart pointers like unique_ptr<T>: a pointer that is owned in exactly one scope at a time.

unique_ptr<Point2D> create_unique_ptr() {
    auto pt = make_unique<Point2D>(7, 8);
    return pt; // implicitly give up ownership by returning
}

Manual Memory Mgmt.

When the unique_ptr is destroyed, it deletes its contents.

auto pt4 = create_unique_ptr();
cout << *pt4 << '\n';

No need to delete(pt4): it happens automatically. Ownership can be explicitly given away (moved) if other code needs the object.

Manual Memory Mgmt.

auto pt5 = create_unique_ptr();
function_that_uses_a_point2d(move(pt5));
// cout << *pt5 << '\n';

Calling move() gives away ownership of the pointer, transferring it to the function (so the object is deleted when the function returns).

After that, using *pt5 (outside the function) would fail: this code gave away ownership of the object so it's not ours to work with.

Manual Memory Mgmt.

The result: very little work on the programmer's part, but it's very hard to have a memory leak.

You really should be using smart pointers in modern C++.

See also shared_ptr that has reference counting semantics (more later).

Also compare Rust's ownership of values.

Manual Memory Mgmt.

C and C++ (and old Objective-C) are the only modern languages where memory is managed manually, and the trend is definitely away from doing so.

It's just too hard to free/​delete perfectly 100% of the time. If you don't, your program will slowly use more-and-more memory over time.

What are the alternatives?

Garbage Collection

It would be nice if the language would handle the freeing of memory for us.

Basic observation: if there are no references left to an object, it can be freed.

But, having references doesn't mean an object will actually be used again. The programmer should still explicitly delete (release, del, etc) when necessary, especially large objects with long-lived references.
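
A hedged Python sketch of that kind of "leak": garbage collection can't free anything that's still reachable, even if it will never be used again.

cache = {}

def handle_request(request_id: int) -> list:
    result = [0] * 100_000        # some large-ish temporary result
    cache[request_id] = result    # kept reachable forever by this reference
    return result

for rid in range(5):
    handle_request(rid)
# None of these lists can be collected: they're still reachable through
# cache. Dropping the references (del cache[rid], or a bounded cache)
# would let the collector reclaim them.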

Garbage Collection

A garbage collector is part of a language's runtime environment that looks for objects on the heap that can no longer be accessed (garbage) and frees them (after calling their finalizers, in languages that have the concept).

Garbage Collection

Garbage collection…

  • avoids memory leaks (but doesn't completely prevent them if the programmer keeps useless references around);
  • eliminates work the programmer has to do to manage memory;
  • happens at run-time, so causes some overhead.

Garbage Collection

There are several garbage collection algorithms.

The programmer generally only needs to know that the language has garbage collection, not which algorithm it uses. An implementation of the language might choose any algorithm; different implementations of the same language might use different strategies.

Garbage Collection

Reference counting garbage collection keeps track of the number of references to each object. When the number decreases to zero, delete. Can't handle cyclic data structures; requires space and time to maintain the counters.

Tracing garbage collection looks for which objects are reachable from references available in the program: everything else is garbage. There are many strategies to do this quickly and without stopping execution while it happens.
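
CPython happens to use both: reference counting plus a tracing collector for cycles. A hedged sketch of why pure reference counting isn't enough:

import gc

class Node:
    def __init__(self):
        self.partner = None

a = Node()
b = Node()
a.partner = b
b.partner = a     # the two objects now reference each other: a cycle

del a
del b             # no names refer to the pair, but their mutual references
                  # keep both reference counts above zero...
print(gc.collect())   # > 0: the tracing/cycle collector finds and frees them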

Tracking Ownership

If using only C++ smart pointers like unique_ptr (and friends), there isn't any need for garbage collection at runtime.

Assuming we keep the unique_ptr itself on the stack, it is destroyed when it goes out of scope. When the unique_ptr is destroyed, it automatically deletes the object it refers to: the pointer was unique, so there is effectively only one reference to track.

Tracking Ownership

Rust does the same and ensures memory safety by having explicit ownership of memory.

There is no garbage collector, but the compiler can determine when a value is no longer needed (when there are no more references to it) and free the heap memory.

Tracking Ownership

Every language besides C, C++, and old Objective-C (that I know of) has some kind of automatic memory management.

The two options that seem to be used: garbage collection that happens at runtime; tracking of object ownership at compile-time.

First-Class Functions

A language has first-class functions if a function can be treated like any other value.

For example, can be passed as an argument, stored in a list, created at runtime, returned from a function.

First-Class Functions

We have seen this in Haskell where functions can do anything any other values can do.

result = foldl operation 0 [1,2,3,4,5]
    where operation a x = 2*a + x

Here, foldl is getting three arguments: a function, an integer, and a list. The function is just as good as any other argument.

First-Class Functions

First-class functions are genuinely useful: most newer (post-1990s) languages have them, and it's often handy to treat functions as values. In Python:

all_sorts = [quicksort, mergesort, heapsort]
for sort in all_sorts:
    result = sort(array)
    assert is_sorted(result)

First-Class Functions

An implementation of filter and an anonymous (lambda) function in JavaScript (like Array.filter):

function filter(predicate, array) {
    var result = []
    for (let x of array) {
        if ( predicate(x) ) {
            result.push(x)
        }
    }
    return result
}
res = filter((x) => { return x%2==0 }, [1,2,3,4,5,6])

… with some bonus ECMAScript 6 syntax and duck typing.

First-Class Functions

In Ruby, building a lambda function at runtime; a closure (more later); passing a function as an argument to .select().

def div_by(n)
    return lambda {|m| m%n == 0}
end
print [1,2,3,4,5,6].select(&div_by(2))

First-Class Functions

Most languages do some of the first-class function things.

C can pass and store references to functions: you can work with function pointers like any other pointers. It's not possible to create a function at runtime or partially apply functions.
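
For contrast, partial application is easy where functions are first-class values. A hedged Python sketch using functools.partial:

from functools import partial

def mod_check(n: int, m: int) -> bool:
    return m % n == 0

is_even = partial(mod_check, 2)    # partially apply: fix n=2, leave m open
print(is_even(10), is_even(7))     # True False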

C++ added lambda expressions in C++11.

First-Class Functions

Java has always had the Runnable interface: if you need to pass/​return a function, you can actually use an instance of Runnable and .run() gets called as the function (with no arguments).

Java added lambda expressions in Java 8. They create objects that are instances of Runnable (or similar).

Runnable r = () -> System.out.println("Hello world");
r.run();                             // prints "Hello world"
Predicate<Integer> isEven = n -> n % 2 == 0;
System.out.println(isEven.test(3));  // prints "false"

First-Class Functions

Lambda expressions are a sign that a language has a lot of first-class function features: you can create a function when you need to and call/​store it. Also called anonymous functions.

Anonymous functions are very common in JavaScript:

res = filter((x) => { return x%2==0 }, [1,2,3,4,5,6])
res = filter(function(x) { return x%2==0 }, [1,2,3,4,5,6])

The first argument to filter is an anonymous function that takes one argument, x (in new and old syntax).

First-Class Functions

In Python, lambda expressions can define a function, but only out of one expression (not multiple statements).

is_even = lambda m: m%2 == 0
print(is_even(3))              # prints "False"

But named functions can be defined in other functions.

def demo_nested_function():
    def is_even(m):
        remainder = m%2
        return remainder == 0
    print(is_even(3))          # prints "False"

Closures

Why would you want to define a function inside another function? Probably to create a closure.

Closures

Consider this function that builds a function:

function divisibility_checker(n) {
    var div_check = function(m) { return m%n == 0 }
    return div_check
}

It can then be used like:

var is_even = divisibility_checker(2)
evens = filter(is_even, [10,11,12,13,14])
console.log(evens)   // outputs [10, 12, 14]

How did the argument n survive until is_even was called? It should be out of scope by then.

Closures

A closure is a function (or class or other structure) with free variables bound to the scope where it was created.

Reminder: free variable, a variable that isn't a function argument or local variable, but comes from outside.

A closure lets you construct a function/​class that uses a value that you can't pass in as an argument.
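
The same divisibility_checker idea as the earlier JavaScript example, sketched in Python (a hypothetical translation, not from the original slides):

def divisibility_checker(n):
    def div_check(m):
        return m % n == 0   # n is a free variable, captured from the
    return div_check        # enclosing call's scope: a closure

is_even = divisibility_checker(2)
print([x for x in [10, 11, 12, 13, 14] if is_even(x)])   # [10, 12, 14]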

Closures

In Haskell, we can do a closure in a function defined in a let/​where. e.g. one of my implementations of join uses a closure over sep in the where:

join _ [] = []
join sep strs = foldl1 withSep strs
    where withSep acc s = acc ++ sep ++ s

Closures

In C++, the variables for the closure are given explicitly (in the lambda expression's []):

#include <functional>
std::function<bool (int)> divisibility_checker(int n) {
    auto div_check = [n](int m) { return m%n == 0; };
    return div_check;
}

This can be used similarly to the earlier JavaScript example:

auto is_even = divisibility_checker(2);
cout << is_even(7) << '\n';

Closures

Closures can also be used in many languages to create a class. e.g. in Python to create a list subclass that can't be popped below a certain size:

def MinSizeList(n):
    class MSL(list):    # the list class, but...
        def pop(self):  # with .pop() overridden
            if len(self) <= n:
                raise IndexError("Min size is %i" % (n,))
            return super().pop()
    return MSL          # return the new class

Closures

Then this can be used like the list class, but with an override:

List2 = MinSizeList(2)     # dynamically-generated class
l = List2([0, 10, 20, 30])
print(l.pop())
print(l.pop())
print(l.pop())

30
20
IndexError: Min size is 2

This works even if you're passing the object to other code that you don't control and .pop() is called.

Callbacks

When functions are passable as arguments, they can be used to control program flow: a function can be called to process results.

The most obvious use: a function that can handle work some time in the future, continuation-passing style.

Callbacks

e.g. a Ruby Rack web server. Whenever a request comes in, the callback function (technically a Ruby Proc object) is called.

require 'rack'

req_handler = Proc.new do |env|
    ['200',
        {'Content-Type' => 'text/plain; charset=utf-8'},
        ["Hello world!\n"]
    ]
end

Rack::Handler::WEBrick.run req_handler

Callbacks

Callbacks give a way for library code to call your code. Callback-heavy code starts to feel like it's event-driven.

JavaScript is probably the place you see callbacks most often. They allow other logic to run while waiting for a slow I/O operation, in a single thread (non-blocking I/O).

Callbacks

e.g. request data from the server, and provide an anonymous callback function to handle the response when it's available (with jQuery):

$.ajax({
    url: '/data/url',
    dataType: 'json',
    success: function(data) {
        console.log(data);
    }
});

Callbacks

Callbacks are often a useful place for a closure: when you need some data in the callback that isn't passed as an argument when it's called, a closure might be the only way to get it there.

function setup_clickables(id) {
    $('.clickable').click(function () {
        doSomething(id);
    });
}

Coroutines

The concept of lazy evaluation was (sometimes) useful in Haskell. It would be nice to have a similar concept in strictly-evaluated languages.

Coroutines are functions that can suspend their execution and work together with other coroutines to get work done.

Coroutines

Because of the lazy evaluation in Haskell, something like this was actually happening all the time.

take 3 (iterate hailstone 31)

If we evaluate take, it has to get the first element from its last argument; take suspends and iterate starts; it suspends and hailstone starts; hailstone actually returns and iterate resumes and produces the first list item needed by take; then take resumes; … .

Coroutines

So, take, iterate, and hailstone were all operating cooperatively. They aren't called coroutines because the language is making the decisions, not the programmer.

In some languages, there are constructs where functions can suspend/​resume when they have or need results from a cooperating coroutine.

Coroutines

Any Python function that uses yield returns a generator object immediately when called; the calling code can then iterate through it, and the function body resumes at each step to produce the next value.

from typing import Iterable

def squares(n: int) -> Iterable[int]:
    for i in range(n):
        yield i*i

Nothing is generated until we iterate:

print(squares(5000000000000))
for sq in squares(5):
    print_nobreak(sq)

<generator object squares at 0x7f2c475a8a50>
0 1 4 9 16 

Coroutines

Even if millions of values are generated, there is never a large amount of memory used because they are generated one-at-a-time. Unless we explicitly store them, of course:

print(list(squares(5)))

[0, 1, 4, 9, 16]

Coroutines

Python code to replicate the Haskell hailstone example:

from typing import Callable, TypeVar, Generator, Iterable
T = TypeVar('T')

def hailstone(n: int) -> int:
    return n // 2 if n % 2 == 0 else 3*n + 1

def iterate(f: Callable[[T],T], x: T) -> Generator[T,None,None]:
    while True:
        yield x
        x = f(x)

def take(n: int, values: Iterable[T]) -> Iterable[T]:
    for _, v in zip(range(n), values):
        yield v

first_hail = take(3, iterate(hailstone, 31))
print(list(first_hail))

Coroutines

Traversing a binary tree becomes very easy:

def traverse(t):
    if t is None:
        return
    else:
        yield from traverse(t.left)
        yield t.value
        yield from traverse(t.right)

What if we want to compare two binary trees to see if they have the same contents? [Almost the same-fringe problem.]

Coroutines

Then we can work through those iterators to see if two trees have the same contents:

def same_contents(t1, t2):
    contents1 = traverse(t1)
    contents2 = traverse(t2)
    while True:
        v1 = next(contents1, None)
        v2 = next(contents2, None)
        if v1 != v2:
            return False
        elif v1 is None and v2 is None:
            return True

While same_contents is running, two copies of traverse are generating values cooperatively.

Coroutines

Or we could do the same thing in a more functional style:

import operator
from itertools import zip_longest, starmap
from functools import reduce
def same_contents(t1, t2):
    contents1 = traverse(t1)
    contents2 = traverse(t2)
    pairwise = zip_longest(contents1, contents2)
    each_equal = starmap(operator.eq, pairwise)
    return reduce(operator.and_, each_equal)

[Both implementations assume the tree doesn't contain the value None.]

Coroutines

Technically, generators like this are semicoroutines: communication is only happening one-way.

It's also possible to communicate into the generator to create a true coroutine.

Different languages have different capabilities.
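
For example, Python generators accept values through .send(), which makes a (simple) true coroutine. A hedged sketch:

def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # suspend; resume when a value is sent in
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)              # prime the coroutine: run to the first yield
print(avg.send(10))    # 10.0
print(avg.send(20))    # 15.0
print(avg.send(3))     # 11.0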

Coroutines

Usually, coroutine is used to refer to only single-threaded code: only one of the coroutines is doing something at any moment.

It's not too much of a difference to allow multiple coroutines to work at the same time, and thus use multiple cores. Coroutines in Kotlin can run in parallel *.

Undefined Behaviour

It's possible that certain things you can express in a programming language have no specified behaviour in the language definition. In that case, anything the compiler does is correct.

This is particularly common in C/C++: it is seen as an opportunity for the optimizer to have more freedom.

Undefined Behaviour

Sometimes, you might expect this, like accessing a value past the end of an array.

int past_bounds() {
    int arr[] = {1,2,3,4,5};
    return arr[5];
}

Possible results: segfault, value from that memory location, error message, always zero, ….

Undefined Behaviour

Sometimes it might surprise you, because behaviour changes when the optimizer notices the undefined behaviour.

e.g. in C, integers are not guaranteed to have two's complement behaviour: adding one to INT_MAX isn't defined. You might expect to get INT_MIN, but C doesn't promise that.

Undefined Behaviour

e.g. this code seems to rely on INT_MAX+1 == INT_MIN:

int wraparound(int n) {
    if ( n+1 < n ) {
        return 0;
    }
    return n;
}

void test_wrap() {
    cout << wraparound(4) << endl;
    cout << wraparound(INT_MAX) << endl;
}

Undefined Behaviour

Compiled with clang++ -O0:

4
0

Compiled with clang++ -O2:

4
2147483647

Both of these are correct compilations of this code. [Compiler Explorer]

Undefined Behaviour

Most modern languages don't have undefined behaviour in their spec.

But you have to be careful in C and C++ to avoid it: you can't predict what the compiler will do with it.