The question here: does the compiler know a value's type when it compiles the program?
For statically typed languages, the answer is yes: C, Java, C#, Dart, etc.
Where the types aren't known until runtime, the language is dynamically typed: Python, JavaScript, Lisp, Ruby, etc.
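For example, in Python the same variable can hold values of different types as the program runs (a small sketch):

x = 5             # x currently holds an int
print(type(x))    # <class 'int'>
x = "hello"       # the same name now holds a str
print(type(x))    # <class 'str'>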
There are some where it's not clear: Haskell, Scala. Consider this Haskell code:
calculate :: Num a => a -> a -> a
calculate x y = x*y + x
Are the * and + doing arithmetic on integers or floating point or complex numbers or …? It depends:
*Main> let a = 5 :: Int
*Main> let b = 7.0 :: Double
*Main> :t calculate a 5
calculate a 5 :: Int
*Main> :t calculate b 5
calculate b 5 :: Double
Static typing allows more to be checked at compile time.
e.g. Is the expression a/b okay? Yes if a and b are floating point; no if they are strings.
This allows the compiler to catch more programmer errors. In a statically typed language, you can have more confidence that the types are right.
Dynamic typing allows more flexibility and often less code.
The programmer doesn't have to explicitly declare/allocate/type variables, which saves keystrokes/effort.
Polymorphism is easy in a dynamically typed language. e.g. this code will work on any three values where +
can work: integers, floats, strings, etc.
def add_three(a, b, c): return a + b + c
In C, Java, Fortran, that would have to be different functions for each input/output type.*
Remember: there are some related things that always have to be checked at runtime: array bounds, division by zero, etc.
Static typing can catch more things at compile-time, but not every type-related error.
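For example, these checks happen while the program runs in essentially every language; a small Python sketch:

values = [1, 2, 3]
try:
    print(values[10])    # out of bounds: only detectable at runtime
except IndexError as e:
    print("index error:", e)

try:
    print(1 / 0)         # division by zero: also a runtime check
except ZeroDivisionError as e:
    print("division error:", e)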
Of course, this is related to static/dynamic typing, but here the question is: when do we know which operator/function/definition applies?
Statically bound: the compiler knows exactly which operation is going to happen.
Dynamically bound: the details have to be determined at run-time.
Consider this Java code:
String s;
double d;
⋮
System.out.print(s);
System.out.print(d);
There are separate implementations of print
for different types, but the compiler knows which one to choose. i.e. the calls are bound to a particular implementation at compile time.
Consider similar Python:
a = 6
b = 7
⋮
c = a + b
The compiler doesn't really know which +
that is: maybe integer addition, but maybe the types change and it's string concatenation.
The types must be checked and operator bound at runtime: dynamic binding.
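A small sketch of what that means in practice: the same + in the same function can end up bound to different operations on different calls.

def combine(a, b):
    return a + b          # which "+" runs is decided when this line executes

print(combine(6, 7))      # integer addition: 13
print(combine("6", "7"))  # string concatenation: '67'
print(combine([6], [7]))  # list concatenation: [6, 7]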
Dynamic binding provides greater flexibility: the same code can work on integers, floats, and any other type where the functions/operators used are defined.
But it comes with a speed penalty: types must be checked every time a line of code is executed.
Static binding avoids this run-time overhead since the compiler can make the right bindings before execution even starts.
The speed difference can be huge: the difference between (1) executing the processor's ADD instruction, and (2) a type check, symbol table lookup, and method call, and then doing the ADD.
Let's look back at the Mandelbrot benchmark.
The speed difference between Cython's time on the dynamically-typed .py code and the (partially-)statically-typed .pyx code was a factor of 36.
It wasn't compiling to machine code that made the code faster (Cython compiled both through C), but removing the overhead from dynamic binding (in this case at least).
The static/dynamic distinction can be seen in Cython's output: it compiles Python + optional type declarations to C (and then machine code), and it can produce an annotated version of the output showing each line of Python and the corresponding C. Notes:
- escapes is statically-typed: the + operator turns into a C + operator, and can be statically bound to an operation on double.
- mandel is dynamically-typed: + turns into a PyNumber_Add call that (presumably) does dynamic type checks and is therefore slower.

Also note: in the Cython .pyx code, only the escapes function that takes most of the time got the static type declarations. The mandel and toChar functions didn't.
The running time is essentially identical to C, Go, Rust. Optimizing is often a waste of time, but optimizing the inner-most loop might be worth it. Profile before you optimize.
Dynamic binding is another place where JITs can help.
If you always call a function with arguments of a specific type, then the JIT can notice and compile a statically-bound version of the function and use it with logic like:
if types match previous calls:
    call statically-typed machine code
else:
    interpret dynamically-typed bytecode, or maybe compile a version for the new types
Maybe you get a statically-typed implementation of your dynamically-typed code.
The Mandelbrot benchmark again: PyPy takes the dynamically-typed Python code and gets performance similar to statically-typed Cython (and better than statically-typed Java).
Conclusion without proof: it must be compiling statically-bound versions of the code to get that performance.
Dynamic binding can also happen in a statically-typed language.
Some statically-typed languages have features that require a type-related decision at run-time.
Consider this Java:
class Animal {
    void reward() {…}
}
class Dog extends Animal {
    void reward() {…}
}
class SomeOtherClass {
    void someMethod(Animal a) {
        a.reward();
    }
}
Which .reward()
is being called?
The .reward()
call might be to Animal.reward()
or Dog.reward()
, depending on the value passed (as a
). The decision on that line of code has to happen at run-time.
In analogous C++ code, the .reward() call always refers to Animal.reward() because that's the declared type: C++ insists on static binding unless the method is declared virtual.
Dynamic typing lets one implementation of a function work on many types of input, but at the cost of dynamic binding.
def add_three(a, b, c): return a + b + c
Another option many languages include is generics, where one implementation can be instantiated for many different input types.
We saw this in Haskell, with typeclasses letting us specify restrictions on the types we used.
theSame :: (Eq a) => a -> a -> Bool
theSame x y = x==y
This will work on any arguments in the Eq
typeclass, and Haskell can statically-compile any specialized version it needs.
Another example: in Go, interfaces define a collection of methods a struct must have.
type LengthHaver interface {
    GetLength() float64
    ScaledLength(float64) float64
}
Then the compiler automatically knows that anything that has those methods implements that interface: no declaration necessary.
type Rectangle struct {…}

func (r Rectangle) GetLength() float64 {…}
func (r Rectangle) ScaledLength(f float64) float64 {…}
This implements the LengthHaver
interface, even though the person who implemented Rectangle
might have no idea.
This can be used as an instance of that interface, and the compiler will work out what implementations to use.
var l LengthHaver
l = Rectangle{5, 6}
fmt.Println(l.GetLength())
Does Go statically or dynamically bind interface implementations? It's complicated, but usually static.
Other languages have similar features: protocols in Swift, interfaces in Java, abstract classes in C++ (and concepts in C++20).
Obviously these are useful. And they allow polymorphic code, without the runtime cost of dynamic typing/binding.
We saw type inference in Haskell: it could determine the types of values by examining known types of return values and literals. e.g.
res = (length some_list) + 1
The known types imply that res is an Int:
length xs :: Int
1 :: Num a => a
(+) :: Num a => a -> a -> a
In general, type inference or implicit typing is done in several languages. The compiler must know the types of some values in an expression (e.g. literals or the return values of functions), and it infers the rest from those.
If the compiler can infer types, it can statically bind in its output.
It seems like programmers really want at least some type inference. It was added to C# early on (var, in C# 3.0), and to C++ in C++11 (auto, enhanced in C++17):
double add_one(double x) {
    auto y = x + 1; // y inferred to be type double
    return y;
}
And to Java in version 10, where variable types can be inferred from the initializer:
var x = 1.0;                           // type inferred as double from literal
var values = new ArrayList<Double>();  // type inferred as ArrayList<Double> from constructor
Analogous Rust code can do the same type inference (on variable y
).
fn add_one(x: f64) -> f64 {
    let y = x + 1.0;
    return y
}
Rust can do more advanced type inference, feeling a little like Haskell: example from Rust docs
pub fn main() {
    let elem = 5u8;            // typed literal implies type
    let mut vec = Vec::new();  // vector of type ???
    vec.push(elem);            // oh, vector contains type u8
    println!("{:?}", vec);
}
All of these languages can do static binding at compile-time: they know the types even if you didn't tell them.
That way the programmer doesn't have to specify types everywhere, which is repetitive and boring.
Python 3 generally does not do type inference (or any static typing), but the programmer can give type hints about what kind of values will be used:
def double(x: float) -> float:
    return x+x

print(double(4))
print(double("abc"))
i.e. the argument and return type of double are both float values: the first call should be fine, but the second has the wrong types. Strangely, it still runs:
8
abcabc
There is a separate tool Mypy that checks the types. The command mypy mypy-demo.py
would produce:
mypy-demo.py:5: error: Argument 1 to "double" has incompatible type "str"; expected "float"
This type checking and static binding are not done when the Python code is compiled or run: the hints are only used for this separate static check, and as hints for IDEs.
Mypy does type inference (and warns if there's a place in the code where it can't, and needs hints):
from typing import Dict

def to_dict(x: int, y: str) -> Dict[int, float]:
    x1 = x+1
    return {x1: y}
mypy-demo.py:10: error: Dict entry 0 has incompatible type "int": "str"; expected "int": "float"
[i.e. you promised to return a dictionary that maps int
to float
, but actually have int
to str
.]
Dynamic typing (and binding) is generally slower, but more flexible.
The usual style is to not check types, just use the properties/methods/operators that you expect to be there: treat the value like the kind of object you expect.
Duck typing: If it looks like a duck and quacks like a duck, then it must be a duck.
e.g. you may write a function to count how many times a particular integer occurs in a list:
def count_occurrences(lst, val):
    count = 0
    for v in lst:
        if v == val:
            count += 1
    return count
But this will actually work with any other object that can be iterated (with for) and has values that can be checked for equality (with ==). That includes characters in a string, lines in a file, or some other custom-built collection.
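For example (using the count_occurrences above), the same function counts over several kinds of collections:

print(count_occurrences([1, 2, 1, 3, 1], 1))   # list of ints: 3
print(count_occurrences("banana", "a"))        # characters of a string: 3
print(count_occurrences(range(10), 7))         # any other iterable: 1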
Notable duck-typed languages: Python, Ruby, Perl, C# (sometimes, with dynamic
and member lookup).
Contrast: interfaces, Haskell type classes, etc. These provide similar flexibility, but with partially-specified explicit types.
In Python, it's considered bad style to explicitly check a type: without the check, this code would have worked on any iterable collection.
def count_occurrences_ugly(lst, val):
    assert isinstance(lst, list)
    count = 0
    for v in lst:
        if v == val:
            count += 1
    return count
It is very easy to be too restrictive with the Python hints:
from typing import List

def count_too_specific(lst: List[int], val: int) -> int:
    # ⋮
Better: express clearly what the code can do:
from typing import Iterable, TypeVar
T = TypeVar('T')

def count_well_hinted(lst: Iterable[T], val: T) -> int:
    # ⋮
Maybe this is manual duck typing? You express the types that are good enough. Or maybe it's like Haskell's type classes + type variables.
Duck typing often goes with dynamic typing/binding.
Which you like is largely a matter of taste.
To what extent does the language prevent you from making type errors? Type error: treating a value as an incompatible type.
e.g. calling obj.someMethod()
when that method doesn't exist.
e.g. applying the /
operator to a string.
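In a dynamically-typed language like Python, both of those show up as exceptions while the program runs (the Widget class here is just for illustration):

class Widget:
    pass

w = Widget()
try:
    w.someMethod()        # no such method on this object
except AttributeError as e:
    print(e)              # 'Widget' object has no attribute 'someMethod'

try:
    result = "abc" / 2    # "/" isn't defined for strings
except TypeError as e:
    print(e)              # unsupported operand type(s) for /: 'str' and 'int'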
All languages have some type checking, but can we avoid/fool these checks?
e.g. Java checks most things at compile time, but this compiles (but warns and throws a runtime exception):
List values = new ArrayList();
Integer i = 6;
String s;
values.add((Object) i);
s = (String) values.get(0);
[That is considered very bad Java style. Should have been an ArrayList<Integer>
which can be checked.]
This is seen as a feature in C, but can be dangerous:
unsigned int vals[2] = {0, 1081602048};
double *dp = (double *) vals;
printf("%f\n", *dp);
Output:
383.000000
The terms strongly typed
and weakly typed
are used to describe more or less type safety, but seem to be used differently by everybody. Rough definitions:
Strongly typed: every value has a single well-defined type, so you can do lots of checking.
Weakly typed: single values can be treated as different types, so type checking is harder.
C and Java are mostly strongly typed since values are explicitly statically typed, but pointer/reference casting allows the programmer to do weakly-typed things.
Python is strongly typed since the type of each value is tracked by the language (but at runtime since it's dynamically typed).
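For example, Python refuses to implicitly convert between strings and numbers; the programmer has to convert explicitly (a small sketch):

try:
    print("12" / 2)       # no coercion: TypeError at runtime
except TypeError as e:
    print(e)

print(int("12") / 2)      # explicit conversion: 6.0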
Some languages (Perl, PHP, JavaScript) will implicitly convert values when necessary (type coercion), so they feel more weakly typed. All of these display 6:
print "12"/2
print("12"/2);
console.log("12"/2)
This can be very confusing for programmers.
On the other extreme, Go types are very strongly checked: most operations happen on a specific type, and if you want to mix types in a calculation, you have to explicitly convert in the direction you want.
Things that fail in Go:
var i int = 12
var x float32 = 3.4
fmt.Println(i * x)            // mismatched types int and float32
if i { fmt.Println("yes") }   // cannot convert int to type bool
Code that would run:
var i int = 12
var x float32 = 3.4
fmt.Println(float32(i) * x)
fmt.Println(i * int(x))
if i != 0 { fmt.Println("yes") }
Go code must be extremely precise in the types it's using.
A mutable object/value is one that can be modified after its initial creation. Most OO languages have mutable objects: internal state can be modified by methods or by assigning public properties. e.g. in Java:
var list = new ArrayList<Integer>(); // empty list
list.add(6);                         // now length 1
Or most objects in Python can have their attributes modified:
pt = Point2D(3, 5)  # 2D point (3,5)
pt.x = 4            # now represents (4,5)
Immutable: objects/values that aren't mutable.
e.g. Integer
objects in Java or C#; strings in Java, C#, Python; tuples in Python.
A variable containing an immutable object can still be changed by assigning a new instance:
string direction = "up";
direction = "down";
Important distinction: the variable refers to a new object, not the same object with different contents. Any other references (aliases) to the old object haven't changed.
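A Python sketch of that distinction, using a mutable list and an immutable string:

a = [1, 2, 3]
b = a              # b is an alias for the same list object
a.append(4)
print(b)           # [1, 2, 3, 4]: the one shared object changed

s = "up"
t = s              # t refers to the same string object
s = "down"         # s is rebound to a new object
print(t)           # still "up": the alias didn't change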
Immutable objects are useful when you need to know an object won't change.
Examples where immutability might be important:
But mutable objects are often much easier to work with when data structures do need to change. Some examples:
Some languages have ways to enforce immutability:
- private fields and careful class design;
- const variables, functions, and arguments (C++);
- readonly fields are immutable (C#).

Even if your language doesn't have any mutability guarantees, it's still something that should be documented.
from typing import List

def append1(lst: List[int], elt: int) -> None:
    '''
    Append elt to the lst. Modifies lst in-place.
    '''
    lst.append(elt)

def append2(lst: List[int], elt: int) -> List[int]:
    '''
    Return a copy of lst with elt appended to it.
    '''
    return lst + [elt]
In Rust, the concept of mutability is front-and-centre. Variables are declared as mutable or not:
let mut count = 0;
count += 1;
println!("count = {}", count);
Without the mut
keyword here, compilation fails on the second line: cannot assign twice to immutable variable
.
We will work more with Rust's mutability restrictions later in the course.
The data your program is using is stored in the computer's memory. Each program has a certain amount of memory allocated to it, which can be expanded or contracted as necessary.
This memory is generally divided into three parts:
- Static: memory allocated when the program starts: global values and anything declared static.
- Stack: function call information and local variables.
- Heap: dynamically-allocated memory: malloc in C; new in C++, Java, C#; all objects in Python, Ruby.

The usual diagram of what's conceptually happening:
Stack and heap grow and shrink as necessary.
The stack contains info on function calls: arguments and locally-defined variables for each instance. Values are popped when the function returns.
In dynamic languages, usually everything is on the heap (but there is a call stack for functions).
As a programmer, how do you control the memory you're using? What is stored and when can it be thrown away? The goal: make sure we only store the data we need, and don't keep old stuff around that will never be used again (a memory leak).
How we do that depends on the language.
Stack memory is fairly obvious: when the function returns, pop its stuff off the stack. Every language (except maybe assembly) takes care of this.
But the stack is often only a small amount of the memory being used: in most OO languages, the stack variables are just object references(/pointers) and actual object contents are on the heap.
Heap memory is harder, because it's hard to know when a program is no longer using a value.
Values are still useful as long as there is a pointer/reference to them. How do we know when the last reference is gone?
In C, keeping track of allocated heap memory is the programmer's problem. What is allocated must be freed by somebody, exactly once.
/* array of 100 int on the heap: */
int *arr = (int *)malloc(100*sizeof(int));
arr[17] = 10;
printf("%i\n", arr[17]);
free(arr);  /* free() must be called to not leak. */
In C++, objects can be on the stack: those are destroyed when the function returns.
void stack_object_example() {
    Point2D pt1 = Point2D(1, 2);
    cout << pt1 << '\n';
}
Objects created with new
are on the heap and must be delete
d.
void heap_object_example() {
    Point2D *pt2 = new Point2D(3, 4);
    cout << *pt2 << '\n';
    delete(pt2);
}
When pointers are passed around, it can be unclear who owns
them and is responsible for deleting.
Point2D *create_object_pointer() {
    Point2D *pt = new Point2D(5, 6);
    return pt;
}
Some code far away must eventually call delete().
Point2D *pt3 = create_object_pointer();
cout << *pt3 << '\n';
delete(pt3);
That's extremely error-prone. In C++11, memory management was modernized with smart pointers like unique_ptr<T>: a pointer that is owned by exactly one scope at a time.
unique_ptr<Point2D> create_unique_ptr() {
    auto pt = make_unique<Point2D>(7, 8);
    return pt; // implicitly give up ownership by returning
}
When the unique_ptr
is destroyed, it deletes its contents.
auto pt4 = create_unique_ptr();
cout << *pt4 << '\n';
No need to delete(pt4)
: it happens automatically. Ownership can be explicitly given away (moved) if other code needs the object.
auto pt5 = create_unique_ptr();
function_that_uses_a_point2d(move(pt5));
// cout << *pt5 << '\n';
Calling move()
gives away ownership of the pointer, giving ownership to the function (so it's deleted when the function returns).
After that, using *pt5
(outside the function) would fail: this code gave away ownership of the object so it's not ours to work with.
The result: very little work on the programmer's part, but it's very hard to have a memory leak.
You really should be using smart pointers in modern C++.
See also shared_ptr
that has reference counting semantics (more later).
Also compare ownership in Rust, which we will discuss more later.
C and C++ (and old Objective-C) are the only modern languages where memory is managed manually, and the trend is definitely away from doing so.
It's just too hard to free
/delete
perfectly 100% of the time. If you don't, your program will slowly use more-and-more memory over time.
What are the alternatives?
It would be nice if the language would handle the freeing of memory for us.
Basic observation: if there are no references left to an object, it can be freed.
But, having references doesn't mean an object will actually be used again. The programmer should still explicitly delete (release
, del
, etc) when necessary, especially large objects with long-lived references.
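A Python sketch of the kind of situation meant here: a long-lived reference that keeps a large object alive until the programmer explicitly drops it.

cache = {}
cache["big"] = list(range(10_000_000))   # large object, long-lived reference

# ... later, once the data is no longer needed:
del cache["big"]   # drop the reference so the memory can be reclaimed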
A garbage collector is part of a language's runtime environment that looks for objects on the heap that can no longer be accessed (garbage) and frees them (after calling their finalizers, in languages that have the concept).
Garbage collection…
There are several garbage collection algorithms.
The programmer needs to know that the language has garbage collection. An implementation of the language might choose any algorithm; different implementations of the same language might have different strategies.
Reference counting garbage collection keeps track of the number of references to each object. When the number decreases to zero, delete. Can't handle cyclic data structures; requires space and time to maintain the counters.
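For example, CPython uses reference counting plus a separate cycle detector; a sketch of the cycle problem using the standard gc module:

import gc

class Node:
    def __init__(self):
        self.other = None

a = Node()
b = Node()
a.other = b
b.other = a          # a and b now refer to each other: a cycle

del a, b             # their reference counts never reach zero on their own...
print(gc.collect())  # ...but the cycle detector finds and frees them (prints a positive count)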
Tracing garbage collection looks for which objects are reachable from references available in the program: everything else is garbage. There are many strategies to do this quickly and without stopping execution while it happens.
If using only C++ smart unique_ptr
s (and friends), there isn't any need for garbage collection at runtime.
Assuming we keep the unique_ptr
on the stack, it will be deleted as appropriate. When the unique_ptr
is deleted, it will automatically delete the object it refers to: the pointer was unique, so reference counting
is easy and can be done at compile-time.
Rust does the same and ensures memory safety by having explicit ownership of memory. (More later.)
There is no garbage collector, but the compiler can determine when a value is no longer needed (when there are no more references to it) and free the heap memory.
Explicitly-tracked ownership presents another option that could be considered a kind of automatic memory management. So, we have memory managed manually by the programmer (programmer-time?), automatically by the compiler (ownership, smart pointers), or automatically at run-time (garbage collection).
A language has first-class functions if a function can be treated like any other value.
For example, can be passed as an argument, stored in a list, created at runtime, returned from a function.
We have seen this in Haskell where functions can do anything any other values can do.
result = foldl operation 0 [1,2,3,4,5]
  where operation a x = 2*a + x
Here, foldl
is getting three arguments: a function, an integer, and a list. The function is just as good as any other argument.
First-class functions are often useful: most newer (post-1990s) languages have them. It's sometimes useful to treat functions as values. In Python:
all_sorts = [quicksort, mergesort, heapsort]
for sort in all_sorts:
    result = sort(array)
    assert is_sorted(result)
An implementation of filter and an anonymous (lambda) function in JavaScript (like Array.filter):
function filter(predicate, array) {
    var result = []
    for (let x of array) {
        if ( predicate(x) ) {
            result.push(x)
        }
    }
    return result
}

res = filter((x) => { return x%2==0 }, [1,2,3,4,5,6])
… with some bonus duck typing.
In Ruby, building a lambda
function at runtime; a closure (more later); passing a function as an argument to .select()
.
def div_by(n)
  return lambda {|m| m%n == 0}
end

print [1,2,3,4,5,6].select(&div_by(2))
Most languages do some of the first-class function
things.
C can pass and store references to functions: you can work with function pointers like any other pointers. It's not possible to create a function at runtime or partially apply functions.
C++ added lambda expressions in C++11.
Java has always had the Runnable
interface: if you need to pass/return a function
, you can actually use an instance of Runnable
and .run()
gets called as the function
(with no arguments).
Java added lambda expressions in Java 8. They create objects that are instances of a functional interface.
Runnable r = () -> System.out.println("Hello world");
r.run(); // prints "Hello world"

Predicate<Integer> isEven = n -> n % 2 == 0;
System.out.println(isEven.test(3)); // prints "false"
Lambda expressions are a sign that a language has a lot of first-class function features: you can create a function when you need to and call/store it. Also called anonymous functions.
Anonymous functions are very common in JavaScript:
res = filter((x) => { return x%2==0 }, [1,2,3,4,5,6])
res = filter(function(x) { return x%2==0 }, [1,2,3,4,5,6])
The first argument to filter
is an anonymous function that takes one argument, x
(in new and old syntax).
In Python, lambda expressions can define a function, but only out of one expression (not multiple statements).
is_even = lambda m: m%2 == 0
print(is_even(3))  # prints "False"
But named functions can be defined in other functions.
def demo_nested_function():
    def is_even(m):
        remainder = m%2
        return remainder == 0
    print(is_even(3))  # prints "False"
Why would you want to define a function inside another function? Probably to create a closure
…
Consider this function that builds a function:
function divisibility_checker(n) {
    var div_check = function(m) {
        return m%n == 0
    }
    return div_check
}
It can then be used like:
var is_even = divisibility_checker(2)
evens = filter(is_even, [10,11,12,13,14])
console.log(evens) // outputs [10, 12, 14]
How did the argument n
survive until is_even
was called? It should be out of scope by then.
A closure is a function (or class or other structure) with free variables bound to the scope where it was created.
Reminder: free variable, a variable that isn't a function argument or local variable, but comes from outside.
A closure lets you construct a function/class that uses a value that you can't pass in as an argument.
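The same idea in Python, roughly mirroring the JavaScript divisibility_checker above:

def divisibility_checker(n):
    def div_check(m):
        return m % n == 0   # n is a free variable, captured from the enclosing call
    return div_check

is_even = divisibility_checker(2)
print([x for x in [10, 11, 12, 13, 14] if is_even(x)])   # [10, 12, 14]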
In Haskell, we can do a closure in a function defined in a let
/where
. e.g. one of my implementations of join
uses a closure over sep
in the where
:
join _ [] = []
join sep strs = foldl1 withSep strs
  where withSep acc s = acc ++ sep ++ s
In C++, the variables for the closure are given explicitly (in the lambda expression's []
):
#include <functional>

std::function<bool (int)> divisibility_checker(int n) {
    auto div_check = [n](int m) { return m%n == 0; };
    return div_check;
}
This can be used similarly to the previous examples:
auto is_even = divisibility_checker(2);
cout << is_even(7) << '\n';
Closures can also be used in many languages to create a class. e.g. in Python to create a list subclass that can't be popped below a certain size:
def MinSizeList(n):
    class MSL(list):      # the list class, but...
        def pop(self):    # with .pop() overridden
            if len(self) <= n:
                raise IndexError("Min size is %i" % (n,))
            return super().pop()
    return MSL            # return the new class
Then this can be used like the list
class, but with an override:
List2 = MinSizeList(2)  # dynamically-generated class
l = List2([0, 10, 20, 30])
print(l.pop())
print(l.pop())
print(l.pop())
30
20
IndexError: Min size is 2
This works even if you're passing the object to other code that you don't control and .pop()
is called.
When functions are passable as arguments, they can be used to control program flow: a function can be called to process results.
The most obvious use: a function that can handle work some time in the future, continuation-passing style.
e.g. a Ruby Rack web server. Whenever a request comes in, the callback function (technically Ruby Proc
object) is called.
require 'rack'

req_handler = Proc.new do |env|
  ['200',
   {'Content-Type' => 'text/plain; charset=utf-8'},
   ['Hello world!\n']
  ]
end

Rack::Handler::WEBrick.run req_handler
Callbacks give a way for library code to call your code. Callback-heavy code starts to feel like it's event-driven.
JavaScript is probably the place you see callbacks most often. They allow other logic to run while waiting for a slow I/O operation, in a single thread (non-blocking I/O).
e.g. request data from the server, and provide an anonymous callback function to handle the response when it's available (with jQuery):
$.ajax({
    url: '/data/url',
    dataType: 'json',
    success: function(data) {
        console.log(data);
    }
});
Callbacks are often a useful place for a closure: when you need some data in the callback that isn't passed as an argument when it's called, a closure might be the only way to get it there.
function setup_clickables(id) {
    $('.clickable').click(function () {
        doSomething(id);
    });
}
The concept of lazy evaluation was (sometimes) useful in Haskell. It would be nice to have a similar concept in strictly-evaluated languages.
Coroutines are functions that can suspend their execution and work together with other coroutines to get work done.
Because of the lazy evaluation in Haskell, something like this was actually happening all the time.
take 3 (iterate hailstone 31)
If we evaluate take
, it has to get the first element from its last argument; take
suspends and iterate
starts; it suspends and hailstone
starts; hailstone
actually returns and iterate
resumes and produces the first list item needed by take
; then take
resumes; … .
So, take
, iterate
, and hailstone
were all operating cooperatively. They aren't called coroutines
because the language is making the decisions, not the programmer.
In some languages, there are constructs where functions can suspend/resume when they have or need results from a cooperating coroutine.
Any Python function that uses yield
immediately returns a generator object, which the calling code can then iterate through; the function will resume to create each value.
from typing import Iterable

def squares(n: int) -> Iterable[int]:
    for i in range(n):
        yield i*i
Nothing is generated until we iterate:
print(squares(5000000000000))

for sq in squares(5):
    print(sq, end=' ')
<generator object squares at 0x7faaa8290ac0>
0 1 4 9 16
Even if millions of values are generated, there is never a large amount of memory used because they are generated one-at-a-time. Unless we explicitly store them, of course:
print(list(squares(5)))
[0, 1, 4, 9, 16]
Python code to replicate the Haskell hailstone example:
from typing import Callable, TypeVar, Generator, Iterable
T = TypeVar('T')

def hailstone(n: int) -> int:
    return n // 2 if n % 2 == 0 else 3*n + 1

def iterate(f: Callable[[T],T], x: T) -> Generator[T,None,None]:
    while True:
        yield x
        x = f(x)

def take(n: int, values: Iterable[T]) -> Iterable[T]:
    for _, v in zip(range(n), values):
        yield v

first_hail = take(3, iterate(hailstone, 31))
print(list(first_hail))
Traversing a binary tree becomes very easy:
def traverse(t):
    if t is None:
        return
    else:
        yield from traverse(t.left)
        yield t.value
        yield from traverse(t.right)
What if we want to compare two binary trees to see if they have the same contents? [Almost the same fringe problem.]
Then we can work through those iterators to see if two trees have the same contents:
def same_contents(t1, t2):
    contents1 = traverse(t1)
    contents2 = traverse(t2)
    while True:
        v1 = next(contents1, None)
        v2 = next(contents2, None)
        if v1 != v2:
            return False
        elif v1 is None and v2 is None:
            return True
While same_contents
is running, two copies of traverse
are generating values cooperatively.
Or we could do the same thing in a more functional style:
import operator
from itertools import zip_longest, starmap
from functools import reduce

def same_contents(t1, t2):
    contents1 = traverse(t1)
    contents2 = traverse(t2)
    pairwise = zip_longest(contents1, contents2)
    each_equal = starmap(operator.eq, pairwise)
    return reduce(operator.and_, each_equal)
[Both implementations assume the tree doesn't contain the value None
.]
Technically, generators like this are semicoroutines: communication is only happening one-way.
It's also possible to communicate into the generator to create a true coroutine.
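For example, Python generators have a .send() method that passes a value into the generator at the point where it is suspended. A small sketch (a running-average coroutine, just for illustration):

def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # hand out the current average; receive the next value
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)               # advance to the first yield
print(avg.send(10))     # 10.0
print(avg.send(20))     # 15.0
print(avg.send(3))      # 11.0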
Different languages have different capabilities.
Usually, coroutine
is used to refer to only single-threaded code: only one of the coroutines is doing something at any moment.
It's not too much of a difference to allow multiple coroutines to work at the same time, and thus use multiple cores. Coroutines in Kotlin can run in parallel *.
It's possible that certain things you can express in a programming language have no specified behaviour in the language definition. In that case, anything the compiler does is correct.
This is particularly common in C/C++: it is seen as an opportunity for the optimizer to have more freedom.
Sometimes, you might expect this, like accessing a value past the end of an array.
int past_bounds() {
    int arr[] = {1,2,3,4,5};
    return arr[5];
}
Possible results: segfault, value from that memory location, error message, always zero, ….
Sometimes it might surprise you, because behaviour changes when the optimizer notices the undefined behaviour.
e.g. in C, integers are not guaranteed to have two's complement behaviour: adding one to INT_MAX
isn't defined. You might expect to get INT_MIN
, but C doesn't promise that.
e.g. this code seems to rely on INT_MAX+1 == INT_MIN:
int wraparound(int n) {
    if ( n+1 < n ) {
        return 0;
    }
    return n;
}

void test_wrap() {
    cout << wraparound(4) << endl;
    cout << wraparound(INT_MAX) << endl;
}
Compiled with clang++ -O0
:
4 0
Compiled with clang++ -O2
:
4 2147483647
Both of these are correct compilations of this code. [Compiler Explorer]
Most modern languages don't have undefined behaviour in their spec.
But you have to be careful in C and C++ to avoid it: you can't predict what the compiler will do with it.