This starts the second major topic of the course: what are some important aspects of the design of programming languages? How does that affect how/when a language will/could/should be used?
[But not: how can we design/create a new programming language?]
Aside: there will be code examples in many languages in the middle of the course. You will know some of the languages, and not others. That's okay.
I'll do my best to explain the key points so you can figure out what's happening.
By now you have worked with (at least) one functional language and (I assume) a couple of imperative languages. There are hundreds of others, some very weird.
Our next major topic: what are the important differences between them?
The question I'll be thinking of: if you need to choose a programming language for a project, how can you make that decision?
To answer that, you need to know what some of the features of the candidate languages are, and what they mean for the programs you write. What are the tradeoffs?
Some important things we won't talk much about:
Important things I will have in mind:
Balancing those factors depends on the problem at hand. The last is easy to quantify, and therefore tempting to focus too much attention on.
Programming languages are often classified by their paradigm: the overall way logic is expressed in the language.
As discussed earlier, the broadest categories are imperative and declarative.
We could include as paradigms things like object-oriented and functional programming.
"Paradigm" is also applied to the way languages store/manipulate data.
Languages are also often categorized as high/low level.
Higher-level: more like what people want to write/understand. (e.g. Python, Haskell)
Lower-level: more like what the computer understands/executes. (e.g. assembly, C)
Depending on the programming language (and its paradigm), there will be different ways to put code together to express logic.
In imperative languages, the smallest unit is usually a statement: one step.
In declarative languages, it's some kind of description of a (partial) result. In Haskell, an expression. In SQL, a query.
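A rough illustration, sketched in Python (which has both kinds of unit; the names are made up for illustration):

```python
# Imperative style: a sequence of statements, each one a step that updates state.
total = 0
for n in [1, 2, 3, 4]:
    total = total + n

# Declarative flavour: a single expression that describes the result.
total2 = sum(n for n in [1, 2, 3, 4])
```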
Most languages (especially imperative languages) have some control flow structures/statements that are used to express conditional code and repetition (a few are sketched in code below).
Conditionals: if-elseif-else, switch-case.
Loops: while, do-while, for, for-each (over a collection).
Also, break, continue.
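A sketch of several of these in Python (the function names are invented for illustration; Python happens to have no do-while or switch-case):

```python
# Conditionals: if-elif-else (Python's spelling of if-elseif-else).
def sign(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"

# A for-each loop over a collection, using continue and break.
def first_even(values):
    for v in values:
        if v % 2 != 0:
            continue        # skip odd values: go to the next iteration
        return v            # found one: stop looking
    return None

# A while loop.
i = 0
while i < 3:
    i += 1
```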
In most languages, these are combined into functions (sometimes procedures).
In object-oriented languages, classes contain both variables (properties or attributes) and functions (methods).
Functions/classes are then often organized into modules or packages or something. These are imported/loaded into other code when they are needed.
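e.g. in Python, a module is (roughly) one file and a package is (roughly) a directory of modules; other code loads them with import:

```python
# Load a whole standard library module by name...
import math
# ...or import specific names from it.
from math import sqrt

print(math.pi, sqrt(2))
```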
Editorial content: figuring out how directories + files + modules/packages + importing them go together in a new language is always a serious pain.
A compiler is a program that translates a program from one representation to another. Often, this is source code (the code the programmer types) to machine code.
Machine code: instructions for the processor. The target could be something other than machine code.
Assembly language: more human-readable but almost one-to-one with machine code.
The word "compiler" usually refers to translation from a higher- to lower-level representation.
Converting between higher-level representations is source-to-source translation or transpiling.
This is common when converting to JavaScript for execution in a web browser, e.g. CoffeeScript, TypeScript, Dart.
Maybe also another language to C, to take advantage of the C compilers' optimization and portability. (e.g. Nim, Cython)
Often languages are compiled to instructions for a virtual machine, a platform-independent runtime environment like the Java Virtual Machine (JVM), .NET Common Language Runtime (CLR), etc.
Once compiled for the VM, the program can be interpreted by the VM implementation. Compare: machine code, which is sent directly to the processor.
An interpreter is a piece of software that takes a program in some language (other than machine code) and executes it.
This could be original source written by the programmer, but that's very unusual. More likely, some intermediate form for a virtual machine.
The speed of execution will depend on the implementation and form of input.
Most programs that appear to be interpreters are actually both a compiler and interpreter.
The source code is first compiled for a VM in memory, and that is then executed.
Whether the intermediate form is stored on disk or not doesn't really matter: compilation still happened.
i.e. There is no sense in which Java and C# are compiled, but Haskell, JavaScript, Python, or Ruby are not.
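CPython makes the two steps visible (the source string below is just an example): compile produces an in-memory bytecode object, and exec hands it to the VM. The .pyc files in __pycache__ are this same bytecode stored on disk.

```python
# CPython's compile-then-interpret pipeline, done explicitly.
source = "total = sum(range(10))\nprint(total)"

code_obj = compile(source, "<example>", "exec")  # compile the source to bytecode
exec(code_obj)                                   # the VM interprets that bytecode
```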
When compiling to a virtual machine, the result is generally called bytecode.
Bytecode is typically a very low-level representation of a program's logic designed to be interpreted quickly. It looks more like assembly/machine code than a high-level language.
In other words: a virtual machine is an interpreter for bytecode.
e.g. a Java .class file is bytecode for the Java Virtual Machine and can be disassembled (also on godbolt.org).
e.g. Python bytecode can be inspected as well (also on godbolt.org).
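For instance, the standard library's dis module disassembles a function's bytecode (the function here is just an example):

```python
import dis

def average(values):
    return sum(values) / len(values)

# Print the bytecode instructions the CPython VM will interpret.
dis.dis(average)
```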
Bytecode interpreters (virtual machines) are generally slower than direct execution of machine code.
Any interpreter has some overhead when converting bytecode to actual machine instructions. This must be done throughout the execution, so we expect slower execution.
Compilers can do many things to optimize the code they generate. They generally optimize for speed, but maybe also for size/memory.
Optimization can be done when producing either machine code or bytecode. Both can be made better by analysis of the program.
Optimization is often off by default: check your compiler's command line options.
A compiler can do many things to optimize: e.g. fold constant expressions, eliminate dead code, inline function calls, unroll loops. (One of these is demonstrated below.)
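Even CPython's relatively modest bytecode compiler does the first of these. A small demonstration, reusing the dis module from above: the disassembly shows a single constant 42 being loaded, with no multiplication left to do at run-time.

```python
import dis

# The constant expression is folded at compile time, not evaluated at run-time.
dis.dis(compile("x = 2 * 3 * 7", "<example>", "exec"))
```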
C compilers have historically received more attention, so have more optimizations implemented.
We saw GHC do some optimizations on Haskell code, and it can do others.
Compiler optimization is only limited by the static program analysis that can be done.
The basic promise of the compiler: it will produce the same result as what you specified, not that it will do exactly what your code says.
An optimizer has to unravel the details of your code to determine the true result you've asked for. Sometimes, it might be best to think of your code as a description of the results you want, not as a strictly imperative description of behaviour.
How good can optimizers be? Compare gcc and clang on summing integers with -O2 optimization.
Conclusion: compiler writers think of your code as a specification of the result you want, not necessarily as a literal sequence of steps to follow.
Interpreting (byte)code at run-time always comes with a speed penalty. Even the most clever bytecode needs to be somehow processed while it's running: that takes instructions and therefore time.
To overcome this, some interpreters/VMs include a just-in-time compiler.
A Just-In-Time compiler starts with bytecode and, during execution, compiles that code to machine code and stores it in memory.
Then, when that code needs to execute (again), it can use the machine code version.
Should be faster: often much faster. The cost: some memory and compilation work at run-time.
Languages traditionally thought of as "slow" are getting a lot of JIT-related attention, e.g. PyPy for Python, V8 and SpiderMonkey for JavaScript.
JITs have some advantages over Ahead-Of-Time (AOT) compilation. JITs can observe the program's actual run-time behaviour and optimize for it (e.g. dynamic binding, later).
But, JITs have to do their work during program execution.
That will slow things down (at least momentarily). Performance may be less predictable.
Any programming language* could be compiled to machine code, or to a VM, or interpreted directly. Don't confuse the usual way a language gets treated with a rule about how it must be executed.
Compiling/interpreting/JIT-compiling are questions about the tools, not the language. Some examples…
To run C and/or C++ code, we could: compile it to machine code (gcc, clang), or interpret it more-or-less directly (e.g. Cling for C++).
Java can be executed by: compiling to JVM bytecode (javac) and letting the JVM interpret and JIT-compile it, or compiling ahead-of-time to machine code (e.g. GraalVM Native Image).
Python: CPython compiles to bytecode and interprets it; PyPy does the same but adds a JIT compiler; Cython translates to C, which is then compiled to machine code.
Different implementations of a specific programming language can vary widely in performance. There are language design choices that can affect performance (more later), but tools have more of an effect than most people realize.
I have spent way too long on an example to illustrate this: Mandelbrot Set Language Shootout.
The code does a numeric calculation in many languages and compares the runtimes of many implementations.
Some lessons from that comparison:
A language can be "fast" with enough compiler cleverness.
Don't read too much into one benchmark. This is a tight loop with floating-point calculations: no strings, no integers, no arrays, etc.
Compare the PyPy benchmarks, which are a much more comprehensive benchmark suite.