We have seen functions in many languages (or similar concepts with different names: methods or procedures).
The basic ideas are the same: a function is a piece of logic that takes input (arguments), returns a result, and may have side effects.
A language has first-class functions if a function can be treated like any other value.
For example, a function can be passed as an argument, stored in a list, created at runtime, or returned from a function.
We have seen this in Haskell, where functions can do anything other values can do.
result = foldl operation 0 [1,2,3,4,5]
    where operation a x = 2*a + x
Here, foldl is getting three arguments: a function, an integer, and a list. The function is just as good as any other argument.
First-class functions are often useful, and most languages have them in some form: it's frequently handy to treat functions as values. In Python:
all_sorts = [quicksort, mergesort, heapsort]
for sort in all_sorts:
    result = sort(array)
    assert is_sorted(result)
An implementation of filter and an anonymous (lambda) function in JavaScript (like Array.filter):
function filter(predicate, array) {
    var result = []
    for (let x of array) {
        if ( predicate(x) ) {
            result.push(x)
        }
    }
    return result
}
res = filter((x) => { return x%2==0 }, [1,2,3,4,5,6])
In Ruby: building a lambda function at runtime; a closure (more later); passing a function as an argument to .select().
def div_by(n)
  return lambda {|m| m%n == 0}
end
print [1,2,3,4,5,6].select(&div_by(2))
Most languages do some of the first-class function things.
C can pass and store references to functions: you can work with function pointers like any other pointers. It's not possible to create a function at runtime or partially apply functions.
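For contrast, here is what partial application looks like in a language that does support it. A minimal Python sketch using functools.partial (the divisible function here is made up for illustration):

from functools import partial

def divisible(n, m):
    return m % n == 0

# "partially apply" divisible by fixing its first argument to 2
is_even = partial(divisible, 2)
print(is_even(10))  # True
print(is_even(7))   # False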
C++ added lambda expressions in C++11.
C++ lambda expressions might be used with the very-functional-looking algorithm library. Both of these calls to std::transform would add one to all of the values in the vector in-place (assuming add_one is defined):
std::vector<int> v{ 10, 20, 30, 40 };
std::transform(v.cbegin(), v.cend(), v.begin(), add_one);
std::transform(v.cbegin(), v.cend(), v.begin(), [](int n) { return n + 1; });
In C++, the functional-style tools with lambda functions are extremely efficient: these two implementations take exactly the same running time.
std::transform(v.cbegin(), v.cend(), v.begin(), [](int n) { return n + 1; });
for (int i = 0; i < LENGTH; i++) {
    v[i] += 1;
}
Java has always had the Runnable interface: if you need to pass/return a function, you can actually use an instance of Runnable, and .run() gets called as the function (with no arguments).
Java added lambda expressions in Java 8. They create objects that are instances of a functional interface.
Runnable r = () -> System.out.println("Hello world");
r.run();  // prints "Hello world"

Predicate<Integer> isEven = n -> n % 2 == 0;
System.out.println(isEven.test(3));  // prints "false"
Lambda expressions are a sign that a language has a lot of first-class function features: you can create a function when you need it, and call or store it. Lambda expressions are also called anonymous functions.
Anonymous functions are very common in JavaScript:
res = filter((x) => { return x%2==0 }, [1,2,3,4,5,6])
res = filter(function(x) { return x%2==0 }, [1,2,3,4,5,6])
The first argument to filter is an anonymous function that takes one argument, x (in new and old syntax).
In Python, lambda expressions can define a function, but only out of one expression (not multiple statements).
is_even = lambda m: m%2 == 0
print(is_even(3))  # prints "False"
But named functions can be defined inside other functions.
def demo_nested_function():
    def is_even(m):
        remainder = m%2
        return remainder == 0
    print(is_even(3))  # prints "False"
Why would you want to define a function inside another function? Probably to create a closure…
Consider this function that builds a function:
function divisibility_checker(n) {
    var div_check = function(m) {
        return m%n == 0
    }
    return div_check
}
It can then be used like:
var is_even = divisibility_checker(2)
evens = filter(is_even, [10,11,12,13,14])
console.log(evens)  // outputs [10, 12, 14]
How did the argument n survive until is_even was called? It should be out of scope by then.
A closure is a function (or class or other structure) with free variables bound to the scope where it was created.
Reminder: a free variable is a variable that isn't a function argument or local variable, but comes from an enclosing scope.
A closure lets you construct a function/class that uses a value that you can't pass in as an argument.
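For comparison, the same divisibility_checker idea can be sketched in Python (an added illustration paralleling the JavaScript version above, not part of the original examples):

def divisibility_checker(n):
    def div_check(m):
        return m % n == 0   # n is a free variable, captured from the enclosing call
    return div_check

is_even = divisibility_checker(2)
print(is_even(10))  # True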
In Haskell, we can do a closure in a function defined in a let/where. e.g. one of my implementations of join uses a closure over sep in the where:
join _ [] = []
join sep strs = foldl1 withSep strs
    where withSep acc s = acc ++ sep ++ s
In C++, the variables for the closure are given explicitly (in the lambda expression's []):
#include <functional>

std::function<bool (int)> divisibility_checker(int n) {
    auto div_check = [n](int m) { return m%n == 0; };
    return div_check;
}
This can be used similarly to the earlier JavaScript example:
auto is_even = divisibility_checker(2);
cout << is_even(7) << '\n';
Closures can also be used in many languages to create a class. e.g. in Python to create a list subclass that can't be popped below a certain size:
def MinSizeList(n):
    class MSL(list):        # the list class, but...
        def pop(self):      # with .pop() overridden
            if len(self) <= n:
                raise IndexError("Min size is %i" % (n,))
            return super().pop()
    return MSL              # return the new class
Then this can be used like the list class, but with an override:
List2 = MinSizeList(2)   # dynamically-generated class
l = List2([0, 10, 20, 30])
print(l.pop())
print(l.pop())
print(l.pop())
30
20
IndexError: Min size is 2
This works even if you're passing the object to other code that you don't control and .pop() is called.
When functions are passable as arguments, they can be used to control program flow: a function can be called to process results.
The most obvious use: a function that can handle work some time in the future, continuation-passing style.
e.g. a Ruby Rack web server. Whenever a request comes in, the callback function (technically a Ruby Proc object) is called.
require 'rack'

req_handler = Proc.new do |env|
  ['200',
   {'Content-Type' => 'text/plain; charset=utf-8'},
   ["Hello world!\n"]]
end

Rack::Handler::WEBrick.run req_handler
Callbacks give a way for library code to call your code. Callback-heavy code starts to feel like it's event-driven.
JavaScript is probably the place you see callbacks most often. They allow other logic to run while waiting for a slow I/O operation, in a single thread (non-blocking I/O).
e.g. request data from the server, and provide an anonymous callback function to handle the response when it's available (with jQuery):
$.ajax({
    url: '/data/url',
    dataType: 'json',
    success: function(data) {
        console.log(data);
    }
});
Callbacks are often a useful place for a closure: when you need some data in the callback that isn't passed as an argument when it's called, a closure might be the only way to get it there.
function setup_clickables(id) {
    $('.clickable').click(function () {
        doSomething(id);
    });
}
The concept of lazy evaluation was (sometimes) useful in Haskell. It would be nice to have a similar concept in strictly-evaluated languages.
Coroutines are functions that can suspend their execution and work together with other coroutines to get work done.
Because of the lazy evaluation in Haskell, something like this was actually happening all the time.
take 3 (iterate hailstone 31)
If we evaluate take, it has to get the first element from its last argument; take suspends and iterate starts; it suspends and hailstone starts; hailstone actually returns and iterate resumes and produces the first list item needed by take; then take resumes; and so on.
So, take, iterate, and hailstone were all operating cooperatively. They aren't called coroutines because the language is making the decisions, not the programmer.
In some languages, there are constructs where functions can suspend/resume when they have or need results from a cooperating coroutine.
Any Python function that uses yield is a generator function: calling it immediately returns a generator object, which the calling code can then iterate through; the function resumes to create each value.
from typing import Iterable

def squares(n: int) -> Iterable[int]:
    for i in range(n):
        yield i*i
Nothing is generated until we iterate:
print(squares(5000000000000))

for sq in squares(5):
    print(sq, end=' ')
<generator object squares at 0x7faaa8290ac0>
0 1 4 9 16
Even if millions of values are generated, there is never a large amount of memory used because they are generated one-at-a-time. Unless we explicitly store them, of course:
print(list(squares(5)))
[0, 1, 4, 9, 16]
Python code to replicate the Haskell hailstone example:
from typing import Callable, TypeVar, Generator, Iterable
T = TypeVar('T')

def hailstone(n: int) -> int:
    return n // 2 if n % 2 == 0 else 3*n + 1

def iterate(f: Callable[[T], T], x: T) -> Generator[T, None, None]:
    while True:
        yield x
        x = f(x)

def take(n: int, values: Iterable[T]) -> Iterable[T]:
    for _, v in zip(range(n), values):
        yield v

first_hail = take(3, iterate(hailstone, 31))
print(list(first_hail))
Traversing a binary tree becomes very easy:
def traverse(t):
    if t is None:
        return
    else:
        yield from traverse(t.left)
        yield t.value
        yield from traverse(t.right)
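This assumes a small tree-node class along these lines (a sketch; the class and constructor here are made up, but the .left, .value, and .right attribute names match the traverse code above):

class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

# in-order traversal of a three-node tree
t = TreeNode(2, TreeNode(1), TreeNode(3))
print(list(traverse(t)))  # [1, 2, 3]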
What if we want to compare two binary trees to see if they have the same contents? [Almost the same fringe problem.]
Then we can work through those iterators to see if two trees have the same contents:
def same_contents(t1, t2):
    contents1 = traverse(t1)
    contents2 = traverse(t2)
    while True:
        v1 = next(contents1, None)
        v2 = next(contents2, None)
        if v1 != v2:
            return False
        elif v1 is None and v2 is None:
            return True
While same_contents is running, two copies of traverse are generating values cooperatively.
Or we could do the same thing in a more functional style:
import operator
from itertools import zip_longest, starmap
from functools import reduce

def same_contents(t1, t2):
    contents1 = traverse(t1)
    contents2 = traverse(t2)
    pairwise = zip_longest(contents1, contents2)
    each_equal = starmap(operator.eq, pairwise)
    return reduce(operator.and_, each_equal)
[Both implementations assume the tree doesn't contain the value None.]
Technically, generators like this are semicoroutines: communication is only happening one-way.
It's also possible to communicate into the generator to create a true coroutine.
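In Python, that means using the generator's .send() method to pass a value in when the function resumes. A minimal sketch (the running_total function is made up for illustration):

def running_total():
    total = 0
    while True:
        # yield the current total; receive the next value from .send()
        value = yield total
        total += value

tally = running_total()
next(tally)             # advance to the first yield
print(tally.send(10))   # 10
print(tally.send(5))    # 15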
Different languages have different capabilities.
Usually, coroutine is used to refer to only single-threaded code: only one of the coroutines is doing something at any moment.
It's not too much of a difference to allow multiple coroutines to work at the same time, and thus use multiple cores: coroutines in Kotlin can run in parallel.
It's possible that certain things you can express in a programming language have no specified behaviour in the language definition. In that case, anything the compiler does is correct.
This is particularly common in C/C++: it is seen as an opportunity for the optimizer to have more freedom.
Sometimes, you might expect this, like accessing a value past the end of an array.
int past_bounds() {
    int arr[] = {1,2,3,4,5};
    return arr[5];
}
Possible results: segfault, value from that memory location, error message, always zero, ….
Sometimes it might surprise you, because behaviour changes when the optimizer notices the undefined behaviour.
e.g. in C, integers are not guaranteed to have two's complement behaviour: adding one to INT_MAX isn't defined. You might expect to get INT_MIN, but C doesn't promise that.
e.g. this code seems to rely on INT_MAX+1 == INT_MIN:
int wraparound(int n) {
    if ( n+1 < n ) {
        return 0;
    }
    return n;
}

void test_wrap() {
    cout << wraparound(4) << endl;
    cout << wraparound(INT_MAX) << endl;
}
Compiled with clang++ -O0:
4
0
Compiled with clang++ -O2:
4
2147483647
Both of these are correct compilations of this code. [Compiler Explorer]
Most modern languages don't have (much) undefined behaviour in their spec.
But you have to be careful in C and C++ to avoid it: you can't predict what the compiler will do with it.