We have seen functions in many languages (or similar concepts with different names: methods or procedures).
The basic ideas are the same: a function is a piece of logic that takes input (arguments), returns a result, and may have side effects.
A language has first-class functions if a function can be treated like any other value.
For example, a function can be passed as an argument, stored in a list, created at runtime, or returned from a function.
We have seen this in Haskell, where functions can do anything other values can do.
result = foldl operation 0 [1,2,3,4,5]
    where operation a x = 2*a + x
Here, foldl is getting three arguments: a function, an integer, and a list. The function is just as good as any other argument.
First-class functions are often useful, and most languages have them in some form: it's frequently handy to treat functions as values. In Python:
all_sorts = [quicksort, mergesort, heapsort]
for sort in all_sorts:
    result = sort(array)
    assert is_sorted(result)
An implementation of filter and an anonymous (lambda) function in JavaScript (like Array.filter):
function filter(predicate, array) {
    var result = []
    for (let x of array) {
        if ( predicate(x) ) {
            result.push(x)
        }
    }
    return result
}
res = filter((x) => { return x%2==0 }, [1,2,3,4,5,6])
In Ruby: building a lambda function at runtime; a closure (more later); passing a function as an argument to .select().
def div_by(n)
  return lambda {|m| m%n == 0}
end
print [1,2,3,4,5,6].select(&div_by(2))
Most languages do some of the first-class function things.
C can pass and store references to functions: you can work with function pointers like any other pointers. It's not possible to create a function at runtime or partially apply functions.
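For contrast, here is what partial application looks like in a language that does support it. A minimal Python sketch using functools.partial (the divisible function here is made up for illustration):

from functools import partial

def divisible(n, m):
    return m % n == 0

# "partially apply" divisible by fixing its first argument to 2
is_even = partial(divisible, 2)
print(is_even(10))  # True
print(is_even(7))   # False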
C++ added lambda expressions in C++11.
C++ lambda expressions might be used with the very-functional-looking algorithm library. Both of these calls to std::transform would add one to all of the values in the vector in-place (assuming add_one is defined):
std::vector<int> v{ 10, 20, 30, 40 };
std::transform(v.cbegin(), v.cend(), v.begin(), add_one);
std::transform(v.cbegin(), v.cend(), v.begin(), [](int n) { return n + 1; });
In C++, the functional-style tools with lambda functions are extremely efficient: these two implementations take exactly the same running time.
std::transform(v.cbegin(), v.cend(), v.begin(), [](int n) { return n + 1; });
for (int i = 0; i < LENGTH; i++) {
    v[i] += 1;
}
Java has always had the Runnable interface: if you need to pass/return a function, you can actually use an instance of Runnable, and .run() gets called as the function (with no arguments).
Java added lambda expressions in Java 8. They create objects that are instances of a functional interface.
Runnable r = () -> System.out.println("Hello world");
r.run();  // prints "Hello world"

Predicate<Integer> isEven = n -> n % 2 == 0;
System.out.println(isEven.test(3));  // prints "false"
Lambda expressions are a sign that a language has a lot of first-class function features: you can create a function when you need it, and call or store it. Lambda expressions are also called anonymous functions.
Anonymous functions are very common in JavaScript:
res = filter((x) => { return x%2==0 }, [1,2,3,4,5,6])
res = filter(function(x) { return x%2==0 }, [1,2,3,4,5,6])
The first argument to filter is an anonymous function that takes one argument, x (in new and old syntax).
In Python, lambda expressions can define a function, but only out of one expression (not multiple statements).
is_even = lambda m: m%2 == 0
print(is_even(3))  # prints "False"
But named functions can be defined inside other functions.
def demo_nested_function():
    def is_even(m):
        remainder = m%2
        return remainder == 0
    print(is_even(3))  # prints "False"
Why would you want to define a function inside another function? Probably to create a closure…
Consider this function that builds a function:
function divisibility_checker(n) {
    var div_check = function(m) {
        return m%n == 0
    }
    return div_check
}
It can then be used like:
var is_even = divisibility_checker(2)
evens = filter(is_even, [10,11,12,13,14])
console.log(evens)  // outputs [10, 12, 14]
How did the argument n survive until is_even was called? It should be out of scope by then.
A closure is a function (or class or other structure) with free variables bound to the scope where it was created.
Reminder: a free variable is a variable that isn't a function argument or local variable, but comes from an enclosing scope.
A closure lets you construct a function/class that uses a value that you can't pass in as an argument.
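For comparison, the same divisibility_checker idea can be sketched in Python (an added illustration paralleling the JavaScript version above, not part of the original examples):

def divisibility_checker(n):
    def div_check(m):
        return m % n == 0   # n is a free variable, captured from the enclosing call
    return div_check

is_even = divisibility_checker(2)
print(is_even(10))  # True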
In Haskell, we can do a closure in a function defined in a let/where. e.g. one of my implementations of join uses a closure over sep in the where:
join _ [] = []
join sep strs = foldl1 withSep strs
    where withSep acc s = acc ++ sep ++ s
In C++, the variables for the closure are given explicitly (in the lambda expression's []):
#include <functional>

std::function<bool (int)> divisibility_checker(int n) {
    auto div_check = [n](int m) { return m%n == 0; };
    return div_check;
}
This can be used similarly to the earlier JavaScript example:
auto is_even = divisibility_checker(2);
cout << is_even(7) << '\n';
Closures can also be used in many languages to create a class. e.g. in Python to create a list subclass that can't be popped below a certain size:
def MinSizeList(n):
    class MSL(list):        # the list class, but...
        def pop(self):      # with .pop() overridden
            if len(self) <= n:
                raise IndexError("Min size is %i" % (n,))
            return super().pop()
    return MSL              # return the new class
Then this can be used like the list class, but with an override:
List2 = MinSizeList(2)   # dynamically-generated class
l = List2([0, 10, 20, 30])
print(l.pop())
print(l.pop())
print(l.pop())
30
20
IndexError: Min size is 2
This works even if you're passing the object to other code that you don't control and .pop() is called.
When functions are passable as arguments, they can be used to control program flow: a function can be called to process results.
The most obvious use: a function that can handle work some time in the future, continuation-passing style.
e.g. a Ruby Rack web server. Whenever a request comes in, the callback function (technically a Ruby Proc object) is called.
require 'rack'

req_handler = Proc.new do |env|
  ['200',
   {'Content-Type' => 'text/plain; charset=utf-8'},
   ["Hello world!\n"]]
end

Rack::Handler::WEBrick.run req_handler
Callbacks give a way for library code to call your code. Callback-heavy code starts to feel like it's event-driven.
JavaScript is probably the place you see callbacks most often. They allow other logic to run while waiting for a slow I/O operation, in a single thread (non-blocking I/O).
e.g. request data from the server, and provide an anonymous callback function to handle the response when it's available (with jQuery):
$.ajax({
    url: '/data/url',
    dataType: 'json',
    success: function(data) {
        console.log(data);
    }
});
Callbacks are often a useful place for a closure: when you need some data in the callback that isn't passed as an argument when it's called, a closure might be the only way to get it there.
function setup_clickables(id) {
    $('.clickable').click(function () {
        doSomething(id);
    });
}
The concept of lazy evaluation was (sometimes) useful in Haskell. It would be nice to have a similar concept in strictly-evaluated languages.
Coroutines are functions that can suspend their execution and work together with other coroutines to get work done.
Because of the lazy evaluation in Haskell, something like this was actually happening all the time.
take 3 (iterate hailstone 31)
If we evaluate take, it has to get the first element from its last argument; take suspends and iterate starts; it suspends and hailstone starts; hailstone actually returns and iterate resumes and produces the first list item needed by take; then take resumes; and so on.
So, take, iterate, and hailstone were all operating cooperatively. They aren't called coroutines because the language is making the decisions, not the programmer.
In some languages, there are constructs where functions can suspend/resume when they have or need results from a cooperating coroutine.
Any Python function that uses yield is a generator function: calling it immediately returns a generator object, which the calling code can then iterate through; the function resumes to create each value.
from typing import Iterable

def squares(n: int) -> Iterable[int]:
    for i in range(n):
        yield i*i
Nothing is generated until we iterate:
print(squares(5000000000000))

for sq in squares(5):
    print(sq, end=' ')
<generator object squares at 0x7faaa8290ac0>
0 1 4 9 16
Even if millions of values are generated, there is never a large amount of memory used because they are generated one-at-a-time. Unless we explicitly store them, of course:
print(list(squares(5)))
[0, 1, 4, 9, 16]
Python code to replicate the Haskell hailstone example:
from typing import Callable, TypeVar, Generator, Iterable
T = TypeVar('T')

def hailstone(n: int) -> int:
    return n // 2 if n % 2 == 0 else 3*n + 1

def iterate(f: Callable[[T], T], x: T) -> Generator[T, None, None]:
    while True:
        yield x
        x = f(x)

def take(n: int, values: Iterable[T]) -> Iterable[T]:
    for _, v in zip(range(n), values):
        yield v

first_hail = take(3, iterate(hailstone, 31))
print(list(first_hail))
Traversing a binary tree becomes very easy:
def traverse(t):
    if t is None:
        return
    else:
        yield from traverse(t.left)
        yield t.value
        yield from traverse(t.right)
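This assumes a small tree-node class along these lines (a sketch; the class and constructor here are made up, but the .left, .value, and .right attribute names match the traverse code above):

class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

# in-order traversal of a three-node tree
t = TreeNode(2, TreeNode(1), TreeNode(3))
print(list(traverse(t)))  # [1, 2, 3]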
What if we want to compare two binary trees to see if they have the same contents? [Almost the same fringe problem.]
Then we can work through those iterators to see if two trees have the same contents:
def same_contents(t1, t2):
    contents1 = traverse(t1)
    contents2 = traverse(t2)
    while True:
        v1 = next(contents1, None)
        v2 = next(contents2, None)
        if v1 != v2:
            return False
        elif v1 is None and v2 is None:
            return True
While same_contents is running, two copies of traverse are generating values cooperatively.
Or we could do the same thing in a more functional style:
import operator
from itertools import zip_longest, starmap
from functools import reduce

def same_contents(t1, t2):
    contents1 = traverse(t1)
    contents2 = traverse(t2)
    pairwise = zip_longest(contents1, contents2)
    each_equal = starmap(operator.eq, pairwise)
    return reduce(operator.and_, each_equal)
[Both implementations assume the tree doesn't contain the value None.]
Technically, generators like this are semicoroutines: communication is only happening one-way.
It's also possible to communicate into the generator to create a true coroutine.
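In Python, that means using the generator's .send() method to pass a value in when the function resumes. A minimal sketch (the running_total function is made up for illustration):

def running_total():
    total = 0
    while True:
        # yield the current total; receive the next value from .send()
        value = yield total
        total += value

tally = running_total()
next(tally)             # advance to the first yield
print(tally.send(10))   # 10
print(tally.send(5))    # 15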
Different languages have different capabilities.
Usually, coroutine is used to refer to only single-threaded code: only one of the coroutines is doing something at any moment.
It's not too much of a difference to allow multiple coroutines to work at the same time, and thus use multiple cores: coroutines in Kotlin can run in parallel.
It's possible that certain things you can express in a programming language have no specified behaviour in the language definition. In that case, anything the compiler does is correct.
This is particularly common in C/C++: it is seen as an opportunity for the optimizer to have more freedom.
Sometimes, you might expect this, like accessing a value past the end of an array.
int past_bounds() {
    int arr[] = {1,2,3,4,5};
    return arr[5];
}
Possible results: segfault, value from that memory location, error message, always zero, ….
Sometimes it might surprise you, because behaviour changes when the optimizer notices the undefined behaviour.
e.g. in C, integers are not guaranteed to have two's complement behaviour: adding one to INT_MAX isn't defined. You might expect to get INT_MIN, but C doesn't promise that.
e.g. this code seems to rely on INT_MAX+1 == INT_MIN:
int wraparound(int n) {
    if ( n+1 < n ) {
        return 0;
    }
    return n;
}

void test_wrap() {
    cout << wraparound(4) << endl;
    cout << wraparound(INT_MAX) << endl;
}
Compiled with clang++ -O0:
4
0
Compiled with clang++ -O2:
4
2147483647
Both of these are correct compilations of this code. [Compiler Explorer]
Most modern languages don't have (much) undefined behaviour in their spec.
But you have to be careful in C and C++ to avoid it: you can't predict what the compiler will do with it.