In the last section, we talked about how the programmer describes logic in a programming language. Here, we'll talk about another big aspect of the design of a language: what kinds of values can you store/manipulate in your code?
Every language (well, most) has several basic types of data that can be stored in variables (or used in calculations, passed to functions, etc.).
e.g. booleans, integers (8-, 16-, 32-, 64-bit; signed, unsigned), floating point (32-, 64-bit), characters.
Also generally some compound types that can hold several values, e.g. arrays, structs/records, tuples.
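As a quick sketch, here is what a few common compound types look like in Python (the variable names are just illustrative):

```python
# A few compound types in Python:
point = (3.0, 5.0)               # tuple: fixed-size collection of values
scores = [88, 92, 75]            # list: variable-length sequence
ages = {"alice": 30, "bob": 25}  # dict: key-to-value mapping

print(point[0])     # 3.0
print(len(scores))  # 3
print(ages["bob"])  # 25
```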
Object-oriented languages introduce types (classes) that contain both data (fields/attributes/properties) and functions that operate on those values (methods).
Remember that types carry information about/restrictions on the value, as well as the value itself.
e.g. in C, a pointer might be identical to a long unsigned in memory, but it's a different type in the language. The language will let you dereference one but not the other.
e.g. for normal Rust integers (i64), overflowing will cause a panic (in debug builds). The wrapping types (Wrapping<i64>) will wrap around, two's-complement-style.
We should expect all values in a programming language to have a well-defined type. For example, this code works with a floating-point value, and a string:
x = 3.0
x = "abc"
In some languages, variables have a clearly-defined type, but in some (like Python) a variable could contain different types at different points in the code.
But we could have a situation where it's not clear when looking at some code what the types are.
def print_twice(x):
    print(x + x)
What's the type of x? It depends how the function is called.
print_twice(5)
print_twice("abc")
A question: does the compiler know a variable or value's type when it compiles the program?
For statically typed languages, the answer is yes: C, Java, C#, Dart, etc.
Where the types aren't known until runtime, the language is dynamically typed: Python, JavaScript, Lisp, Ruby, etc.
There are some where it's not clear: Haskell, Scala. Consider this Haskell code:
calculate :: Num a => a -> a -> a
calculate x y = x*y + x
Are the * and + doing arithmetic on integers or floating point or complex numbers or …? It depends:
*Main> let a = 5 :: Int
*Main> let b = 7.0 :: Double
*Main> :t calculate a 5
calculate a 5 :: Int
*Main> :t calculate b 5
calculate b 5 :: Double
Static typing allows more to be checked at compile time.
e.g. Is the expression a/b okay? Yes if a and b are floating point; no if they are strings.
This allows the compiler to catch more programmer errors. In a statically typed language, you can have more confidence that the types are right.
Dynamic typing allows more flexibility and often less code.
The programmer doesn't have to explicitly declare/allocate/type variables, which saves keystrokes/effort.
Polymorphism is easy in a dynamically typed language. e.g. this code will work on any three values where + can work: integers, floats, strings, etc.
def add_three(a, b, c):
    return a + b + c
In C, Java, or Fortran, that would have to be different functions for each input/output type.
Remember: there are some related things that always have to be checked at runtime: array bounds, division by zero, overflow, etc.
Static typing can catch more things at compile-time, but not every type-related error.
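For instance, these errors involve specific values, not just types, so no compile-time type check can rule them out; they must be detected when the code runs. A small illustration in Python:

```python
# Out-of-range indexing and division by zero are type-correct operations
# applied to bad values, so they can only be caught at run-time.
def runtime_errors():
    errors = []
    try:
        [1, 2, 3][10]        # array bounds: index is a valid int
    except IndexError:
        errors.append("bounds")
    try:
        1 / 0                # division: both operands are valid ints
    except ZeroDivisionError:
        errors.append("div0")
    return errors

print(runtime_errors())      # ['bounds', 'div0']
```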
In between static and dynamic typing, some languages have gradual typing, where values/variables can be given static types that can be checked at compile-time, but it's optional.
This lets programmers be explicit about types and ensure their correctness where (they think) it is most important.
For example, in TypeScript, these both declare a string variable:
let message1 = "hello"
let message2: string = "hello"
The type declaration lets the compiler catch type errors.
let message3 = 7          // okay
let message4: string = 7  // compiler error
Of course, this is related to static/dynamic typing, but here the question is: when do we know which operator/function/definition applies?
Statically bound: the compiler knows exactly which operation is going to happen.
Dynamically bound: the details have to be determined at run-time.
Consider this Java code:
String s;
double d;
⋮
System.out.print(s);
System.out.print(d);
There are separate implementations of print for different types, but the compiler knows which one to choose, i.e. the calls are bound to a particular implementation at compile time.
Consider similar Python:
a = 6
b = 7
⋮
c = a + b
The compiler doesn't really know which + that is: maybe integer addition, but maybe the types change and it's string concatenation.
The types must be checked and operator bound at runtime: dynamic binding.
Dynamic binding provides greater flexibility: the same code can work on integers, and floats, and any other type where the functions/operators used are defined.
But it comes with a speed penalty: types must be checked every time a line of code is executed.
Static binding avoids this run-time overhead since the compiler can make the right bindings before execution even starts.
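To make the run-time overhead concrete, here is a rough sketch (in Python, with an illustrative helper name) of the work hidden behind a dynamically bound +: look at the left operand's type, find its addition method, then call it.

```python
# A sketch of what a dynamically-bound "+" must do on every execution:
# a run-time type check and method lookup before the actual addition.
def dynamic_add(a, b):
    add = type(a).__add__      # look up the method in the operand's type
    result = add(a, b)
    if result is NotImplemented:
        raise TypeError("unsupported operand types")
    return result

print(dynamic_add(6, 7))       # 13: integer addition
print(dynamic_add("6", "7"))   # '67': string concatenation
```

A statically bound + skips all of this: the compiler has already chosen the operation.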
The speed difference can be huge: the difference between (1) executing the processor's ADD instruction and (2) doing a type check, a symbol table lookup, and a method call, then the ADD.
Let's look back at the Mandelbrot benchmark and Cython (which is gradually typed).
The speed difference between Cython's time on dynamically-typed .py code and (partially-)statically-typed .pyx code is a factor of 36.
It wasn't compiling to machine code that made the code faster (Cython compiled both through C), but removing the overhead from dynamic binding (in this case at least).
The static/dynamic distinction can be seen in Cython's output: it compiles Python + optional type declarations to C (and then machine code). It will produce annotated output showing the Python input and corresponding C. Notes:
- escapes is statically typed: the + operator turns into a C + operator, and can be statically bound to an operation on double values.
- mandel is dynamically typed: + turns into a PyNumber_Add call that (presumably) does dynamic type checks and is therefore slower.

Also note: in the Cython .pyx code, only the escapes function that takes most of the time got the static type declarations. The mandel and toChar functions didn't.
The running time is essentially identical to C, Go, Rust. Optimizing is often a waste of time, but optimizing the inner-most loop might be worth it. Profile before you optimize.
Dynamic binding is another place where JITs can help.
If you always call a function with arguments of a specific type, then the JIT can notice and compile a statically-bound version of the function and use it with logic like:
if types match previous calls:
    call statically-typed machine code
else:
    interpret dynamically-typed bytecode,
    or maybe compile a version for the new types
Maybe you get a statically-typed implementation of your dynamically-typed code.
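A toy version of this idea can be sketched in Python: remember which argument types we've seen, and reuse a "specialized" version when they match. (Here the specialized version is just a cached function; a real JIT would emit statically bound machine code.)

```python
# Toy type-specializing dispatcher, mimicking the JIT logic above.
specialized = {}

def compile_for(types):
    # Stand-in for JIT compilation: a real JIT would generate
    # machine code with the operator statically bound.
    def fast_add(a, b):
        return a + b
    return fast_add

def jit_add(a, b):
    key = (type(a), type(b))
    if key not in specialized:
        specialized[key] = compile_for(key)  # "compile" on first use
    return specialized[key](a, b)            # reuse on later calls

print(jit_add(1, 2))      # 3: compiles an (int, int) version
print(jit_add(1.5, 2.5))  # 4.0: compiles a (float, float) version
print(jit_add(3, 4))      # 7: reuses the cached (int, int) version
```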
The Mandelbrot benchmark again: PyPy takes the dynamically-typed Python code and gets performance similar to statically-typed Cython (and better than statically-typed Java).
Conclusion without proof: it must be compiling statically-bound versions of the code to get that performance.
Dynamic binding can also happen in a statically-typed language.
Some statically-typed languages have features that require a type-related decision at run-time.
Consider this Java:
class Animal {
    void reward() {…}
}
class Dog extends Animal {
    void reward() {…}
}
class SomeOtherClass {
    void someMethod(Animal a) {
        a.reward();
    }
}
Which .reward() is being called?
The .reward() call might be to Animal.reward() or Dog.reward(), depending on the value passed (as a). The decision on that line of code has to happen at run-time.
In analogous C++ code (without virtual methods), the .reward() always refers to Animal.reward() because that's the declared type: unless you ask for dynamic binding with the virtual keyword, C++ insists on binding statically.
Dynamic typing lets one implementation of a function work on many types of input, but at the cost of dynamic binding.
def add_three(a, b, c):
    return a + b + c
Another option many languages include is generics, where one implementation can be instantiated for many different input types.
We saw this in Haskell, with typeclasses letting us specify restrictions on the types we used.
theSame :: (Eq a) => a -> a -> Bool
theSame x y = x==y
This will work on any arguments in the Eq typeclass, and Haskell can statically compile any specialized version it needs.
Another example: in Go, interfaces define a collection of methods a struct must have.
type LengthHaver interface {
    GetLength() float64
    ScaledLength(float64) float64
}
Then the compiler automatically knows that anything that has those methods implements that interface: no declaration necessary.
type Rectangle struct {…}
func (r Rectangle) GetLength() float64 {…}
func (r Rectangle) ScaledLength(f float64) float64 {…}
This implements the LengthHaver interface, even though the person who implemented Rectangle might have no idea.
This can be used as an instance of that interface, and the compiler will work out what implementations to use.
var l LengthHaver
l = Rectangle{5, 6}
fmt.Println(l.GetLength())
Does Go statically or dynamically bind interface implementations? It's complicated, but usually static.
Other languages have similar features: protocols in Swift, interfaces in Java, abstract classes in C++ (and concepts in C++20).
Obviously these are useful. And they allow polymorphic code, without the runtime cost of dynamic typing/binding.
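Python has an analogous (though statically unchecked at runtime) feature: typing.Protocol, which static checkers like Mypy treat structurally, much like Go interfaces. The class names below just mirror the Go example and are otherwise made up:

```python
# typing.Protocol: anything with the right methods matches the protocol,
# with no explicit "implements" declaration (checked by tools like Mypy).
from typing import Protocol

class LengthHaver(Protocol):
    def get_length(self) -> float: ...
    def scaled_length(self, f: float) -> float: ...

class Rectangle:
    def __init__(self, w: float, h: float) -> None:
        self.w, self.h = w, h
    def get_length(self) -> float:
        return self.w
    def scaled_length(self, f: float) -> float:
        return self.w * f

def describe(shape: LengthHaver) -> float:
    # Mypy accepts any argument whose methods match the protocol.
    return shape.scaled_length(2.0)

print(describe(Rectangle(5, 6)))  # 10.0
```

At runtime this is ordinary duck typing; the protocol only matters to the static checker.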
We saw type inference in Haskell: it could determine the types of values by examining known types of return values and literals. e.g.
res = (length some_list) + 1
The known types imply that res is an Int:
length xs :: Int
1 :: Num a => a
(+) :: Num a => a -> a -> a
In general, type inference or implicit typing is done in several languages. The compiler must know the types of some values in an expression, or return values of functions.
If the compiler can infer types, it can statically bind in its output.
It seems like programmers really want at least some type inference. It was in C# from the start. It was added in C++11 (and enhanced in C++17):
double add_one(double x) {
    auto y = x + 1;  // y inferred to be type double
    return y;
}
And to Java in version 10, where variable types can be inferred from the initializer:
var x = 1.0;  // type inferred as double from literal
var values = new ArrayList<Double>();  // type inferred as ArrayList<Double> from constructor
Analogous Rust code can do the same type inference (on variable y).
fn add_one(x: f64) -> f64 {
    let y = x + 1.0;
    return y
}
Rust can do more advanced type inference, feeling a little like Haskell. An example from the Rust docs:
pub fn main() {
    let elem = 5u8;            // typed literal implies type
    let mut vec = Vec::new();  // vector of type ???
    vec.push(elem);            // oh, vector contains type u8
    println!("{:?}", vec);
}
All of these languages can do static binding at compile-time: they know the types even if you didn't tell them.
That way the programmer doesn't have to specify types everywhere, which is repetitive and boring.
Python 3 generally does not do type inference (or any static typing), but the programmer can give type hints about what kind of values will be used:
def double(x: float) -> float:
    return x+x

print(double(4))
print(double("abc"))
i.e. the argument and return type of double are both float values: the first call should be fine, but the second has the wrong types. Strangely, it still runs:
8
abcabc
There is a separate tool, Mypy, that checks the types. The command mypy mypy-demo.py would produce:
mypy-demo.py:5: error: Argument 1 to "double" has incompatible type "str"; expected "float"
Type checking and static binding are not done when compiling. They are only used for this static check, and as hints for IDEs.
Mypy does type inference (and warns if there's a place in the code where it can't, and needs hints):
from typing import Dict

def to_dict(x: int, y: str) -> Dict[int, float]:
    x1 = x+1
    return {x1: y}
mypy-demo.py:10: error: Dict entry 0 has incompatible type "int": "str"; expected "int": "float"
[i.e. you promised to return a dictionary that maps int to float, but actually have int to str.]
To what extent does the language prevent you from making type errors? Type error: treating a value as an incompatible type.
e.g. calling obj.someMethod() when that method doesn't exist.
e.g. applying the / operator to a string.
All languages have some type checking, but can we avoid/fool these checks?
e.g. Java checks most things at compile time, but this compiles (but warns and throws a runtime exception):
List values = new ArrayList();
Integer i = 6;
String s;
values.add((Object) i);
s = (String) values.get(0);
[That is considered very bad Java style. It should have been an ArrayList<Integer>, which can be checked.]
This is seen as a feature in C, but can be dangerous:
unsigned int vals[2] = {0, 1081602048};
double *dp = (double *) vals;
printf("%f\n", *dp);
Output:
383.000000
The terms strongly typed and weakly typed are used to describe more or less type safety, but seem to be used differently by everybody. Rough definitions:
Strongly typed: every value has a single well-defined type, so you can do lots of checking.
Weakly typed: single values can be treated as different types, so type checking is harder.
C and Java are mostly strongly typed since values are explicitly statically typed, but pointer/reference casting allows the programmer to do weakly-typed things.
Python is strongly typed since the type of each value is tracked by the language (but at runtime since it's dynamically typed).
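For example, Python refuses to guess a conversion: mixing a string and a number in division is a run-time type error, and the programmer must convert explicitly.

```python
# Python tracks each value's type and will not coerce "12" to a number:
try:
    print("12" / 2)
except TypeError:
    print("TypeError: no implicit conversion")

print(int("12") / 2)  # 6.0: explicit conversion works
```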
Some languages (Perl, PHP, JavaScript) will implicitly convert values when necessary (type coercion), so they feel more weakly typed. All of these display 6:
print "12"/2
print("12"/2);
console.log("12"/2)
This can be very confusing for programmers.
On the other extreme, Go types are very strongly checked: most operations happen on a specific type, and if you want to mix types in a calculation, you have to explicitly convert in the direction you want.
Things that fail in Go:
var i int = 12
var x float32 = 3.4
fmt.Println(i * x)           // mismatched types int and float32
if i { fmt.Println("yes") }  // cannot convert int to type bool
Code that would run:
var i int = 12
var x float32 = 3.4
fmt.Println(float32(i) * x)
fmt.Println(i * int(x))
if i != 0 { fmt.Println("yes") }
Go code must be extremely precise in the types it's using.
A mutable object/value is one that can be modified after its initial creation. Most OO languages have mutable objects: internal state can be modified by methods or by assigning public properties. e.g. in Java:
var list = new ArrayList<Integer>();  // empty list
list.add(6);                          // now length 1
Or most objects in Python can have attributes modified:
pt = Point2D(3, 5)  # 2D point (3,5)
pt.x = 4            # now represents (4,5)
Immutable: objects/values that aren't mutable.
e.g. Integer objects in Java or C#; strings in Java, C#, Python; tuples in Python.
A variable containing an immutable object can still be changed by assigning a new instance:
string direction = "up";
direction = "down";
Important distinction: the variable refers to a new object, not the same object with different contents. Any other references (aliases) to the old object haven't changed.
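The aliasing point can be seen directly in Python: mutating a shared object changes what every alias sees, while rebinding a variable to a new immutable object leaves other aliases untouched.

```python
# With a mutable object, all aliases see the change:
a = [1, 2]
b = a            # alias, not a copy
a.append(3)
print(b)         # [1, 2, 3]: b sees the mutation

# Rebinding to a new (immutable) object leaves aliases alone:
s = "up"
t = s
s = "down"       # s now refers to a different string object
print(t)         # 'up': t still refers to the original
```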
Immutable objects are useful when you need to know an object won't change.
Examples where immutability might be important: keys in a dictionary/hash table, values shared between threads, values cached or reused in several places.
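One standard case where immutability matters: Python dictionary keys must be hashable, which for the built-in containers means immutable.

```python
# Immutable tuples can be dictionary keys; mutable lists cannot.
locations = {(0, 0): "origin", (3, 5): "point A"}
print(locations[(3, 5)])  # point A

try:
    bad = {[0, 0]: "origin"}  # lists are mutable, hence unhashable
except TypeError:
    print("lists can't be dict keys")
```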
But mutable objects are often much easier to work with when data structures do need to change, e.g. a list that grows as results are computed, or an object whose state is updated over time.
Some languages have ways to enforce immutability.
- private fields and careful class design (in Java and most OO languages).
- const variables, functions, and arguments (in C++).
- readonly fields are immutable (in C#).

Even if your language doesn't have any mutability guarantees, it's still something that should be documented.
from typing import List

def append1(lst: List[int], elt: int) -> None:
    """
    Append elt to the lst. Modifies lst in-place.
    """
    lst.append(elt)

def append2(lst: List[int], elt: int) -> List[int]:
    """
    Return a copy of lst with elt appended to it.
    """
    return lst + [elt]
In Rust, the concept of mutability is front-and-centre. Variables are declared as mutable or not:
let mut count = 0;
count += 1;
println!("count = {}", count);
Without the mut keyword here, compilation fails on the second line: cannot assign twice to immutable variable.
We will work more with Rust's mutability restrictions later in the course.