In the last section, we talked about how the programmer describes logic in a programming language. Here, we'll talk about another big aspect of the design of a language: what kinds of values can you store/manipulate in your code?
Every language (well, most) has several basic types of data that can be stored in variables (or used in calculations, passed to functions, etc.).
e.g. booleans, integers (8-, 16-, 32-, 64-bit; signed, unsigned), floating point (32-, 64-bit), characters.
Also generally some compound types that can hold several values, e.g. arrays, structs/records, tuples.
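As a quick sketch, here is what a few common compound types look like in Python (the variable names are just illustrative):

```python
# A few compound types in Python:
point = (3.0, 5.0)               # tuple: fixed-size collection of values
scores = [88, 92, 75]            # list: variable-length sequence
ages = {"alice": 30, "bob": 25}  # dict: key-to-value mapping

print(point[0])     # 3.0
print(len(scores))  # 3
print(ages["bob"])  # 25
```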
Object-oriented languages introduce types (classes) that contain both data (fields/attributes/properties) and functions that operate on those values (methods).
Remember that types carry information about/restrictions on the value, as well as the value itself.
e.g. in C, a pointer might be identical to a long unsigned in memory, but it's a different type in the language. The language will let you dereference one but not the other.
e.g. for normal Rust integers (i64), overflowing will cause a panic (in debug builds). The wrapping types (Wrapping<i64>) will wrap around, two's-complement-style.
We should expect all values in a programming language to have a well-defined type. For example, this code works with a floating-point value, and a string:
x = 3.0
x = "abc"
In some languages, variables have a clearly-defined type, but in some (like Python) a variable could contain different types at different points in the code.
But we could have a situation where it's not clear when looking at some code what the types are.
def print_twice(x):
    print(x + x)
What's the type of x? It depends how the function is called.
print_twice(5)
print_twice("abc")
A question: does the compiler know a variable or value's type when it compiles the program?
For statically typed languages, the answer is yes: C, Java, C#, Dart, etc.
Where the types aren't known until runtime, the language is dynamically typed: Python, JavaScript, Lisp, Ruby, etc.
There are some where it's not clear: Haskell, Scala. Consider this Haskell code:
calculate :: Num a => a -> a -> a
calculate x y = x*y + x
Are the * and + doing arithmetic on integers or floating point or complex numbers or …? It depends:
*Main> let a = 5 :: Int
*Main> let b = 7.0 :: Double
*Main> :t calculate a 5
calculate a 5 :: Int
*Main> :t calculate b 5
calculate b 5 :: Double
Static typing allows more to be checked at compile time.
e.g. Is the expression a/b okay? Yes if a and b are floating point; no if they are strings.
This allows the compiler to catch more programmer errors. In a statically typed language, you can have more confidence that the types are right.
Dynamic typing allows more flexibility and often less code.
The programmer doesn't have to explicitly declare/allocate/type variables, which saves keystrokes/effort.
Polymorphism is easy in a dynamically typed language. e.g. this code will work on any three values where + can work: integers, floats, strings, etc.
def add_three(a, b, c):
    return a + b + c
In C, Java, or Fortran, that would have to be different functions for each input/output type.
Remember: there are some related things that always have to be checked at runtime: array bounds, division by zero, overflow, etc.
Static typing can catch more things at compile-time, but not every type-related error.
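For instance, these errors involve specific values, not just types, so no compile-time type check can rule them out; they must be detected when the code runs. A small illustration in Python:

```python
# Out-of-range indexing and division by zero are type-correct operations
# applied to bad values, so they can only be caught at run-time.
def runtime_errors():
    errors = []
    try:
        [1, 2, 3][10]        # array bounds: index is a valid int
    except IndexError:
        errors.append("bounds")
    try:
        1 / 0                # division: both operands are valid ints
    except ZeroDivisionError:
        errors.append("div0")
    return errors

print(runtime_errors())      # ['bounds', 'div0']
```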
In between static and dynamic typing, some languages have gradual typing, where values/variables can be given static types that can be checked at compile-time, but it's optional.
This lets programmers be explicit about types and ensure their correctness where (they think) it is most important.
For example, in TypeScript, these both declare a string variable:
let message1 = "hello"
let message2: string = "hello"
The type declaration lets the compiler catch type errors.
let message3 = 7          // okay
let message4: string = 7  // compiler error
Of course, this is related to static/dynamic typing, but here the question is: when do we know which operator/function/definition applies?
Statically bound: the compiler knows exactly which operation is going to happen.
Dynamically bound: the details have to be determined at run-time.
Consider this Java code:
String s;
double d;
⋮
System.out.print(s);
System.out.print(d);
There are separate implementations of print for different types, but the compiler knows which one to choose, i.e. the calls are bound to a particular implementation at compile time.
Consider similar Python:
a = 6
b = 7
⋮
c = a + b
The compiler doesn't really know which + that is: maybe integer addition, but maybe the types change and it's string concatenation.
The types must be checked and operator bound at runtime: dynamic binding.
Dynamic binding provides greater flexibility: the same code can work on integers, and floats, and any other type where the functions/operators used are defined.
But it comes with a speed penalty: types must be checked every time a line of code is executed.
Static binding avoids this run-time overhead since the compiler can make the right bindings before execution even starts.
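To make the run-time overhead concrete, here is a rough sketch (in Python, with an illustrative helper name) of the work hidden behind a dynamically bound +: look at the left operand's type, find its addition method, then call it.

```python
# A sketch of what a dynamically-bound "+" must do on every execution:
# a run-time type check and method lookup before the actual addition.
def dynamic_add(a, b):
    add = type(a).__add__      # look up the method in the operand's type
    result = add(a, b)
    if result is NotImplemented:
        raise TypeError("unsupported operand types")
    return result

print(dynamic_add(6, 7))       # 13: integer addition
print(dynamic_add("6", "7"))   # '67': string concatenation
```

A statically bound + skips all of this: the compiler has already chosen the operation.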
The speed difference can be huge: the difference between (1) executing the processor's ADD instruction and (2) doing a type check, a symbol table lookup, and a method call, then the ADD.
Let's look back at the Mandelbrot benchmark and Cython (which is gradually typed).
The speed difference between Cython's time on dynamically-typed .py code and (partially-)statically-typed .pyx code is a factor of 36.
It wasn't compiling to machine code that made the code faster (Cython compiled both through C), but removing the overhead from dynamic binding (in this case at least).
The static/dynamic distinction can be seen in Cython's output: it compiles Python + optional type declarations to C (and then machine code). It will produce annotated output showing the Python input and corresponding C. Notes:
- escapes is statically typed: the + operator turns into a C + operator, and can be statically bound to an operation on double values.
- mandel is dynamically typed: + turns into a PyNumber_Add call that (presumably) does dynamic type checks and is therefore slower.

Also note: in the Cython .pyx code, only the escapes function that takes most of the time got the static type declarations. The mandel and toChar functions didn't.
The running time is essentially identical to C, Go, Rust. Optimizing is often a waste of time, but optimizing the inner-most loop might be worth it. Profile before you optimize.
Dynamic binding is another place where JITs can help.
If you always call a function with arguments of a specific type, then the JIT can notice and compile a statically-bound version of the function and use it with logic like:
if types match previous calls:
    call statically-typed machine code
else:
    interpret dynamically-typed bytecode,
    or maybe compile a version for the new types
Maybe you get a statically-typed implementation of your dynamically-typed code.
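A toy version of this idea can be sketched in Python: remember which argument types we've seen, and reuse a "specialized" version when they match. (Here the specialized version is just a cached function; a real JIT would emit statically bound machine code.)

```python
# Toy type-specializing dispatcher, mimicking the JIT logic above.
specialized = {}

def compile_for(types):
    # Stand-in for JIT compilation: a real JIT would generate
    # machine code with the operator statically bound.
    def fast_add(a, b):
        return a + b
    return fast_add

def jit_add(a, b):
    key = (type(a), type(b))
    if key not in specialized:
        specialized[key] = compile_for(key)  # "compile" on first use
    return specialized[key](a, b)            # reuse on later calls

print(jit_add(1, 2))      # 3: compiles an (int, int) version
print(jit_add(1.5, 2.5))  # 4.0: compiles a (float, float) version
print(jit_add(3, 4))      # 7: reuses the cached (int, int) version
```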
The Mandelbrot benchmark again: PyPy takes the dynamically-typed Python code and gets performance similar to statically-typed Cython (and better than statically-typed Java).
Conclusion without proof: it must be compiling statically-bound versions of the code to get that performance.
Dynamic binding can also happen in a statically-typed language.
Some statically-typed languages have features that require a type-related decision at run-time.
Consider this Java:
class Animal {
    void reward() {…}
}
class Dog extends Animal {
    void reward() {…}
}
class SomeOtherClass {
    void someMethod(Animal a) {
        a.reward();
    }
}
Which .reward() is being called?
The .reward() call might be to Animal.reward() or Dog.reward(), depending on the value passed (as a). The decision on that line of code has to happen at run-time.
In analogous C++ code (without virtual methods), the .reward() always refers to Animal.reward() because that's the declared type: unless you ask for dynamic binding with the virtual keyword, C++ insists on binding statically.
Dynamic typing lets one implementation of a function work on many types of input, but at the cost of dynamic binding.
def add_three(a, b, c):
    return a + b + c
Another option many languages include is generics, where one implementation can be instantiated for many different input types.
We saw this in Haskell, with typeclasses letting us specify restrictions on the types we used.
theSame :: (Eq a) => a -> a -> Bool
theSame x y = x==y
This will work on any arguments in the Eq typeclass, and Haskell can statically compile any specialized version it needs.
Another example: in Go, interfaces define a collection of methods a struct must have.
type LengthHaver interface {
    GetLength() float64
    ScaledLength(float64) float64
}
Then the compiler automatically knows that anything that has those methods implements that interface: no declaration necessary.
type Rectangle struct {…}
func (r Rectangle) GetLength() float64 {…}
func (r Rectangle) ScaledLength(f float64) float64 {…}
This implements the LengthHaver interface, even though the person who implemented Rectangle might have no idea.
This can be used as an instance of that interface, and the compiler will work out what implementations to use.
var l LengthHaver
l = Rectangle{5, 6}
fmt.Println(l.GetLength())
Does Go statically or dynamically bind interface implementations? It's complicated, but usually static.
Other languages have similar features: protocols in Swift, interfaces in Java, abstract classes in C++ (and concepts in C++20).
Obviously these are useful. And they allow polymorphic code, without the runtime cost of dynamic typing/binding.
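Python has an analogous (though statically unchecked at runtime) feature: typing.Protocol, which static checkers like Mypy treat structurally, much like Go interfaces. The class names below just mirror the Go example and are otherwise made up:

```python
# typing.Protocol: anything with the right methods matches the protocol,
# with no explicit "implements" declaration (checked by tools like Mypy).
from typing import Protocol

class LengthHaver(Protocol):
    def get_length(self) -> float: ...
    def scaled_length(self, f: float) -> float: ...

class Rectangle:
    def __init__(self, w: float, h: float) -> None:
        self.w, self.h = w, h
    def get_length(self) -> float:
        return self.w
    def scaled_length(self, f: float) -> float:
        return self.w * f

def describe(shape: LengthHaver) -> float:
    # Mypy accepts any argument whose methods match the protocol.
    return shape.scaled_length(2.0)

print(describe(Rectangle(5, 6)))  # 10.0
```

At runtime this is ordinary duck typing; the protocol only matters to the static checker.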
We saw type inference in Haskell: it could determine the types of values by examining known types of return values and literals. e.g.
res = (length some_list) + 1
The known types imply that res is an Int:
length xs :: Int
1 :: Num a => a
(+) :: Num a => a -> a -> a
In general, type inference or implicit typing is done in several languages. The compiler must know the types of some values in an expression, or return values of functions.
If the compiler can infer types, it can statically bind in its output.
It seems like programmers really want at least some type inference. It was in C# from the start. It was added in C++11 (and enhanced in C++17):
double add_one(double x) {
    auto y = x + 1;  // y inferred to be type double
    return y;
}
And to Java in version 10, where variable types can be inferred from the initializer:
var x = 1.0;  // type inferred as double from literal
var values = new ArrayList<Double>();  // type inferred as ArrayList<Double> from constructor
Analogous Rust code can do the same type inference (on variable y).
fn add_one(x: f64) -> f64 {
    let y = x + 1.0;
    return y
}
Rust can do more advanced type inference, feeling a little like Haskell. An example from the Rust docs:
pub fn main() {
    let elem = 5u8;            // typed literal implies type
    let mut vec = Vec::new();  // vector of type ???
    vec.push(elem);            // oh, vector contains type u8
    println!("{:?}", vec);
}
All of these languages can do static binding at compile-time: they know the types even if you didn't tell them.
That way the programmer doesn't have to specify types everywhere, which is repetitive and boring.
Python 3 generally does not do type inference (or any static typing), but the programmer can give type hints about what kind of values will be used:
def double(x: float) -> float:
    return x+x

print(double(4))
print(double("abc"))
i.e. the argument and return type of double are both float values: the first call should be fine, but the second has the wrong types. Strangely, it still runs:
8
abcabc
There is a separate tool, Mypy, that checks the types. The command mypy mypy-demo.py would produce:
mypy-demo.py:5: error: Argument 1 to "double" has incompatible type "str"; expected "float"
Type checking and static binding are not done when compiling. They are only used for this static check, and as hints for IDEs.
Mypy does type inference (and warns if there's a place in the code where it can't, and needs hints):
from typing import Dict

def to_dict(x: int, y: str) -> Dict[int, float]:
    x1 = x+1
    return {x1: y}
mypy-demo.py:10: error: Dict entry 0 has incompatible type "int": "str"; expected "int": "float"
[i.e. you promised to return a dictionary that maps int to float, but actually have int to str.]
To what extent does the language prevent you from making type errors? Type error: treating a value as an incompatible type.
e.g. calling obj.someMethod() when that method doesn't exist.
e.g. applying the / operator to a string.
All languages have some type checking, but can we avoid/fool these checks?
e.g. Java checks most things at compile time, but this compiles (but warns and throws a runtime exception):
List values = new ArrayList();
Integer i = 6;
String s;
values.add((Object) i);
s = (String) values.get(0);
[That is considered very bad Java style. It should have been an ArrayList<Integer>, which can be checked.]
This is seen as a feature in C, but can be dangerous:
unsigned int vals[2] = {0, 1081602048};
double *dp = (double *) vals;
printf("%f\n", *dp);
Output:
383.000000
The terms strongly typed and weakly typed are used to describe more or less type safety, but seem to be used differently by everybody. Rough definitions:
Strongly typed: every value has a single well-defined type, so you can do lots of checking.
Weakly typed: single values can be treated as different types, so type checking is harder.
C and Java are mostly strongly typed since values are explicitly statically typed, but pointer/reference casting allows the programmer to do weakly-typed things.
Python is strongly typed since the type of each value is tracked by the language (but at runtime since it's dynamically typed).
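For example, Python refuses to guess a conversion: mixing a string and a number in division is a run-time type error, and the programmer must convert explicitly.

```python
# Python tracks each value's type and will not coerce "12" to a number:
try:
    print("12" / 2)
except TypeError:
    print("TypeError: no implicit conversion")

print(int("12") / 2)  # 6.0: explicit conversion works
```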
Some languages (Perl, PHP, JavaScript) will implicitly convert values when necessary (type coercion), so they feel more weakly typed. All of these display 6:
print "12"/2
print("12"/2);
console.log("12"/2)
This can be very confusing for programmers.
On the other extreme, Go types are very strongly checked: most operations happen on a specific type, and if you want to mix types in a calculation, you have to explicitly convert in the direction you want.
Things that fail in Go:
var i int = 12
var x float32 = 3.4
fmt.Println(i * x)           // mismatched types int and float32
if i { fmt.Println("yes") }  // cannot convert int to type bool
Code that would run:
var i int = 12
var x float32 = 3.4
fmt.Println(float32(i) * x)
fmt.Println(i * int(x))
if i != 0 { fmt.Println("yes") }
Go code must be extremely precise in the types it's using.
A mutable object/value is one that can be modified after its initial creation. Most OO languages have mutable objects: internal state can be modified by methods or by assigning public properties. e.g. in Java:
var list = new ArrayList<Integer>();  // empty list
list.add(6);                          // now length 1
Or most objects in Python can have attributes modified:
pt = Point2D(3, 5)  # 2D point (3,5)
pt.x = 4            # now represents (4,5)
Immutable: objects/values that aren't mutable.
e.g. Integer objects in Java or C#; strings in Java, C#, Python; tuples in Python.
A variable containing an immutable object can still be changed by assigning a new instance:
string direction = "up";
direction = "down";
Important distinction: the variable refers to a new object, not the same object with different contents. Any other references (aliases) to the old object haven't changed.
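The aliasing point can be seen directly in Python: mutating a shared object changes what every alias sees, while rebinding a variable to a new immutable object leaves other aliases untouched.

```python
# With a mutable object, all aliases see the change:
a = [1, 2]
b = a            # alias, not a copy
a.append(3)
print(b)         # [1, 2, 3]: b sees the mutation

# Rebinding to a new (immutable) object leaves aliases alone:
s = "up"
t = s
s = "down"       # s now refers to a different string object
print(t)         # 'up': t still refers to the original
```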
Immutable objects are useful when you need to know an object won't change.
Examples where immutability might be important: keys in a dictionary/hash table, values shared between threads, values cached or reused in several places.
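One standard case where immutability matters: Python dictionary keys must be hashable, which for the built-in containers means immutable.

```python
# Immutable tuples can be dictionary keys; mutable lists cannot.
locations = {(0, 0): "origin", (3, 5): "point A"}
print(locations[(3, 5)])  # point A

try:
    bad = {[0, 0]: "origin"}  # lists are mutable, hence unhashable
except TypeError:
    print("lists can't be dict keys")
```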
But mutable objects are often much easier to work with when data structures do need to change, e.g. a list that grows as results are computed, or an object whose state is updated over time.
Some languages have ways to enforce immutability.
- private fields and careful class design (in Java and most OO languages).
- const variables, functions, and arguments (in C++).
- readonly fields are immutable (in C#).

Even if your language doesn't have any mutability guarantees, it's still something that should be documented.
from typing import List

def append1(lst: List[int], elt: int) -> None:
    """
    Append elt to the lst. Modifies lst in-place.
    """
    lst.append(elt)

def append2(lst: List[int], elt: int) -> List[int]:
    """
    Return a copy of lst with elt appended to it.
    """
    return lst + [elt]
In Rust, the concept of mutability is front-and-centre. Variables are declared as mutable or not:
let mut count = 0;
count += 1;
println!("count = {}", count);
Without the mut keyword here, compilation fails on the second line: cannot assign twice to immutable variable.
We will work more with Rust's mutability restrictions later in the course.