Data Structures

Data Structures

We have talked a lot about how control/​represent the logic of our program: if, else, for, while, writing functions, … .

That's important (and a lot of your job for the rest of the semester is to get a feeling for how to use and assemble those pieces), but the way we store information in our code matters too.

Data Structures

In general, data structures are ways to store your data in a program. There can be many tradeoffs when it comes to fast manipulation of data, using less of your computer's memory, ease of working with the data, etc.

But for this course, we'll explore a few data structures (types) that Python gives us, and leave the more general concerns for another course.

Tuples

We have already met tuples as a type used to manage multiple values as a single variable (or function argument, or return value).

We usually can just deal with individual values like we have been doing:

def magnitude(x, y):
    return math.sqrt(x*x + y*y)

x = 3
y = 4
size = magnitude(x, y)

Tuples

But it's sometimes more pleasant to think about a single value with multiple components.

def magnitude(point):
    x, y = point
    return math.sqrt(x*x + y*y)

point = (3, 4)
size = magnitude(point)

That's what Pillow was doing with image sizes, colours, etc. I don't want to have to pass in separate red, green, and blue values: I want to think about a single colour.

Tuples

One notable use: functions always return one thing. If we want to return multiple values, we can return a tuple with as many pieces as we need.

def scale(point, factor):
    x, y = point
    return factor*x, factor*y

point = (3, 4)
doubled = scale(point, 2)

Tuples

Terminology: a tuple of two things is generally called a pair; with three things, a triple; with four things, a 4-tuple.

Tuples

To get the parts out of a tuple, the most beautiful way is probably a multiple assignment:

x, y = point

As long as the number of things on the left side matches the size of the tuple, it will get disassembled into those variables.

Tuples

It's common for tuples to contain different types in each position, creating an ad hoc data structure for a more complex value.

student1 = ("Jane Doe", 4.33)
student2 = ("Bart Simpson", 2.33)
name, gpa = student1

In some sense, this is similar to what classes/​objects are doing: storing all of the aspects of a single concept in one Python value.

Tuples

Tuples can be compared for equality: all elements must be equal for the tuples to be equal.

When compared for less/​greater, the first element that's different determines the result (i.e. lexicographic order).

point1 = (1.0, 2.3)
point2 = (1.0, 2.4)
point3 = (1.0, 2.3)
print(point1 == point2)
print(point1 < point2)  # because 2.3 < 2.4
print(point1 == point3)
False
True
True

Lists

Python's lists are another type that can contain multiple values.

Lists are written with square brackets, can be any length, and typically hold several related values of the same type.

measurements = [1.2, 1.3, 0.9, 1.1, 1.2, 1.0]
fruits = ["apple", "banana", "pear"]

Lists

Elements of a list can be extracted with square brackets and the position of the element (counting from zero), just like extracting characters from strings:

fruits = ["apple", "banana", "pear"]
print(fruits[0])
print(fruits[1])
word = "something"
print(word[0])
print(word[1])
apple
banana
s
o

You can get elements out of a tuple this way too, but it's less of a normal thing to do.

Lists

It's also possible for functions/​methods to return lists. e.g. the string method .split can split a string into parts around a specific delimiter.

date_string = "2024/06/19"
parts = date_string.split("/")
print(parts)
['2024', '06', '19']

Lists

The range function we have been using with for loops doesn't return a list (it returns an object that can generate the values in the range as needed), but it can be converted to a list.

print(range(10))
print(type(range(10)))
print(list(range(10)))
range(0, 10)
<class 'range'>
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

The reason it's not a list: range(1000000000) doesn't take much space in memory. A list with a trillion elements does.

Lists

Unlike other types we have seen, lists can be modified after their initial creation. For example, the .append method adds another element to the end of a list.

fruits = ["orange", "banana", "pear"]
fruits.append("kiwi")
print(fruits)
['orange', 'banana', 'pear', 'kiwi']

Lists

Note the distinction being made here: for a string (or integer or float or tuple), you can change the contents of a variable that holds one of them, but you always do so by creating a brand new instance of that type.

favourite = "apple"
favourite = "mango"
favourite = favourite + "s"

But for a list, the one that's already in memory gets updated without being totally recreated.

fruits = ["orange", "banana", "pear"]
fruits.append("kiwi")

Lists

Python's lists have several useful methods to work with them. For example, .append, or we can easily sort an existing list.

fruits.sort()
print(fruits)
['banana', 'kiwi', 'orange', 'pear']

The .sort method promises \(n\log n\) running time for a list of \(n\) elements (which is the best worst-case running time possible in general for a sorting algorithm).

Lists

We could write a program like this to get a bunch of numbers from the user as input, then sort them:

num = int(input("How many numbers? "))
values = []
for i in range(num):
    x = float(input("Next number: "))
    values.append(x)

print("Here they are in order:")
values.sort()
print(values)

Lists

When run it would look like:

How many numbers? 4
Next number: 18.2
Next number: 4.3
Next number: -1.2
Next number: 100.1
Here they are in order:
[-1.2, 4.3, 18.2, 100.1]

Accumulator Pattern

We have seen code shaped like this many times:

values = []
for i in range(num):
    x = float(input("Next number: "))
    values.append(x)

The pattern I'm referring to: start with an empty thing, and build the result as you loop through your values (in whatever form).

The accumulator variable (i.e. values there) is used to build the final result step-by-step.

Accumulator Pattern

This accumulator pattern can be useful for building lists:

values = []
for i in range(10):
    values.append(random.randint(1, 100))
print(values)

Or for summing values (in total here):

total = 0
pos = 0
while total < 100:
    total = total + values[pos]
    pos = pos + 1
print(total)

Accumulator Pattern

Or building some other collection:

letters = ""
for i in range(26):
    letters = letters + chr(97 + i)
print(letters)

Those three examples will output something like:

[61, 37, 9, 100, 39, 77, 33, 8, 40, 49]
107
abcdefghijklmnopqrstuvwxyz

Accumulator Pattern

I don't care about the accumulator pattern specifically, but we have seen it so many times, it might as well have a name.

There are countless patterns like this in programming. A few have names; most just are things people have seen/​done before so they recognise when they're needed.

Accumulator Pattern

Maybe we could call pos here a counter pattern (even though nobody else does).

total = 0
pos = 0
while total < 100:
    total = total + values[pos]
    pos = pos + 1
print(total)

[Everybody would call pos a counter, though.]

More List Operations

A lot of things that can be done with lists are identical to strings.

They can be joined (concatenated) with a + operator:

cmpt = ["CMPT 120", "CMPT 125"]
macm = ["MACM 101", "MACM 201"]
courses = cmpt + macm
print(courses)
['CMPT 120', 'CMPT 125', 'MACM 101', 'MACM 201']

More List Operations

Elements can be extracted by indexing (like characters in a string):

print(courses[0])
print(courses[1])
CMPT 120
CMPT 125

More List Operations

Unlike strings, we can modify a single element by assigning to it:

courses[3] = "CMPT 201"
print(courses)
['CMPT 120', 'CMPT 125', 'MACM 101', 'CMPT 201']

Also unlike strings, we have seen some methods that modify an existing list: .sort(), .append(x). Also .reverse().

More List Operations

Lists can be repeated with * and a number.

This can be useful to create an initial list of values. e.g. suppose we want to count the number of times 10 different things occur:

counts = [0] * 10
print(counts)
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

then write whatever loop and when event i happens:

counts[i] = counts[i] + 1

More List Operations

There's a method on strings that might be useful around lists: if you have a list of strings, they can be joined with a delimter using the string .join method.

words = ["one", "two", "three", "four"]
added = "+".join(words)
thened = ", then ".join(words)
print(added)
print(thened)
one+two+three+four
one, then two, then three, then four

More List Operations

Converting a string to a list will give you a list of each character:

string = "characters"
chars = list(string)
print(chars)
['c', 'h', 'a', 'r', 'a', 'c', 't', 'e', 'r', 's']

When we could use .join to put it back together if we want.

string2 = "".join(chars)
print(string2)
characters

More List Operations

So far, we have been using the for loop almost entirely with range.

But for can iterate over any collection in Python: range is one, but so is a list.

for colour in ["red", "green", "pink", "chartreuse"]:
    print(colour)
red
green
pink
chartreuse

More List Operations

We could use this with the accumulator pattern to select some elements from a list:

colours = ["black", "yellow", "aquamarine", "rebeccapurple",
           "pink", "lime", "magenta", "palegreen", "red"]
long_names = []
for c in colours:
    if len(c) > 6:
        long_names.append(c)
print(long_names)
['aquamarine', 'rebeccapurple', 'magenta', 'palegreen']

More List Operations

Iterating over a list like that is the nicest way to go through a list's elements, but sometimes you might also need to know where the element came from. You can always index the list as you go through it: it's easy to create a range of the positions in the list.

for i in range(len(colours)):
    c = colours[i]
    if len(c) > 6:
        print(c + " in position " + str(i))

But there's something a little ugly about creating the index (i) just to immediately look back to the list.

More List Operations

There's a built-in function enumerate that produces pairs of the position and the elements in a list.

e.g. for our colours list, it will produce tuples (0, "black"), (1, "yellow"), (2, "aquamarine"), … .

for i, c in enumerate(colours):
    if len(c) > 6:
        print(c + " in position " + str(i))

Slicing & Dicing

Along with indexing a single element, it's possible to take a slice out of a list or string.

words = ["swim", "slide", "reporter", "complex", "battery",
         "raise", "feature", "peanut", "aviation"]
print(words[2])
print(words[2:5])
reporter
['reporter', 'complex', 'battery']

The slice notation [2:5] refers to the sub-list of values 2 to 5-1 == 4. (Like with range, slicing does not include the end value.)

Slicing & Dicing

There's a shortcut for from the start and to the end when slicing: just omit the corresponding slice position:

title = "Slicing & Dicing"
print(title[4:7])  # positions 4 to 6
print(title[:7])   # the start to position 6
print(title[10:])  # position 10 to the end
ing
Slicing
Dicing

Slicing & Dicing

The one before the endpoint behaviour of range and slicing might annoy you, but it makes sense if you're thinking about the length of the collection: range(a, b) and thing[a:b] both contain b - a elements.

Similarly, range(n) and thing[:n] both contain n elements.

Slicing & Dicing

Negative indexes count from the end of a string or list: -1 is the last element, -2 is the second last, etc.

vowels = "aeiouy"
first_primes = [2, 3, 5, 7, 11, 13, 17, 19]
print(vowels[-1])
print(first_primes[-1])
print(vowels[-3:])
print(first_primes[1:-1])
y
19
ouy
[3, 5, 7, 11, 13, 17]

Slicing & Dicing

These imply easy ways to express all but the first and all but the last:

odd_primes = first_primes[1:]
usual_vowels = vowels[:-1]
print(odd_primes)
print(usual_vowels)
[3, 5, 7, 11, 13, 17, 19]
aeiou

These probably come up the most often of all slices: if you're processing a list, it's common to deal with thelist[0] and then have thelist[1:] remaining (or thelist[-1] and then thelist[:-1]).

Mutability

A few slides ago: unlike other types we have seen, lists [and Images] can be modified after their initial creation.

The ability to modify an object after their initial creation is an important property of a type: mutability.

Types that cannot be changed after they are created are immutable.

Mutability

Strings are immutable: we can create many different strings, but each one is unchangable. This code created three different strings:

favourite = "apple"
favourite = "mango"
favourite = favourite + "s"

With a string, we cannot modify in-place, only recreate (e.g. build a whole new string and put it in a variable with a =).

favourite[0] = "t"

That fails with a TypeError.

Mutability

To be clear, what we're asking for here:

favourite = "mango"
favourite = favourite + "s"
  1. Put the string "mango" somewhere in memory.
  2. Make the variable favourite refer to that thing in memory.
  3. Build the string "mangos" somewhere in memory.
  4. Make the variable favourite refer to that.
  5. Throw away the string "mango" because it's not needed anymore.

Mutability

With a list, we can modify the existing instance in a few ways.

fruits = ["orange", "banana", "pear"]
fruits.append("kiwi")
fruits.append("cherry")
fruits[1] = "strawberry"
print(fruits)
fruits.sort()
print(fruits)
['orange', 'strawberry', 'pear', 'kiwi', 'cherry']
['cherry', 'kiwi', 'orange', 'pear', 'strawberry']

Mutability

You can also build a new list by manipulating one that already exists, for example + can be used to join lists (also like strings):

fruits = ["orange", "banana", "pear"]
fruits = fruits + ["kiwi"]

But I'd expect that to be slower (if we measured carefully enough) than changing the one that's already there.

fruits = ["orange", "banana", "pear"]
fruits.append("kiwi")

Mutability

Here's we're aking for:

fruits = ["orange", "banana", "pear"]
fruits.append("kiwi")
  1. Build the list with three elements and put it in memory somewhere.
  2. Make the variable fruits refer to that.
  3. Put one more thing in that list that's already in memory.

It's less work than completely reconstructing the entire list.

Mutability

So mutable types offer more flexibilty in how they are modified.

On the other hand, immutable types are more predictable: our values can only be assigned by a variable assignment statement, not by anything else.

Mutability

For an immutable type, anything we have in a variable is nicely protected: the only thing that can change it is reassigning the variable.

def add_one(n):
    n = n + 1

value = 0
add_one(value)
print(value)
0

The n = … in the function only affects a local variable within the function.

Mutability

On the other hand, a mutable type might be modified if we pass it as a function argument.

def append_one(lst):
    lst.append(1)

values = [3, 4, 5]
append_one(values)
print(values)
[3, 4, 5, 1]

That can make it harder to know if our values are being changed, unless we know/​trust the function.

Mutability

Basically, if there's no way to change a value without an assignment statement (var = …), then that value is immutable.

That includes short-form assignments like +=. These are equivalent, but integers are still immutable.

count = count + 1
count += 1

References

We have seen several situations where a value must be duplicated. Assignment to a new variable:

a = "whatever"
b = a

Or passing an argument to a function:

def f(x):
    return x / 2
a = 2.0
print(f(a))

In both cases, the contents of a get copied.

References

The details of how Python copies values are important.

First: when a variable is created, its contents are stored somewhere in memory and the variable name refers to that. i.e. the variable is a reference to its contents.

References

So, code like this:

greeting = "Hello world"
values = [10, 20, 30]

… creates something I imagine like this:

variables referring to contents

References

If we assign the same object to another variable, Python does not copy the value, it just makes the other variable also refer to it.

greeting = "Hello world"
hello = greeting
variables referring to contents

References

variables referring to contents

That's usually good: it saves memory (because only one copy is stored) and is faster (because it's not necessary to copy the potentially-huge object, just the relatively-small reference).

These variables are aliases of the same object, or that object is aliased.

References

For many data types, we'll never notice aliasing.

count = 1234
old_count = count
count = count + 1

After the first two lines:

variables referring to contents

After the third line, count is reassigned:

variables referring to contents

References

But when combined with a mutable type, aliasing becomes visible.

values = [6, 9, 12]
more_values = values
more_values.append(15)

After the first two lines:

variables referring to contents

After the third line:

variables referring to contents

References

Really?

values = [6, 9, 12]
more_values = values
more_values.append(15)
print(values)
print(more_values)

Yes.

[6, 9, 12, 15]
[6, 9, 12, 15]

References

We saw the same thing happening when passing a mutable type to a function:

def append_one(lst):
    lst.append(1)

values = [3, 4, 5]
append_one(values)
print(values)
[3, 4, 5, 1]

The argument is passed to the function as a reference, so the function can modify mutable arguments.

References

If we're giving a mutable value to a function, we had better really trust it. A mutable value that's given to a function (or any other code) can be changed by that code. If it's well-behaved, that will at least be documented.

def reorder(values):
    """
    Modifies values by moving elements so that…
    """
    ⋮
def reorder(values):
    """
    Returns a rearranged copy of values so that…
    """
    ⋮

Really Copying

If doing a variable assignment doesn't copy the value, just the reference, what if we actually want a real copy of a value?

Sometimes, it's necessary to have an independent copy that we can modify separately.

Really Copying

First observation: any expression using a value forces calculation of the result. That will be a new independent value in memory.

values = [6, 9, 12]
more_values = values + [15]
more_values.append(18)
print(values)
print(more_values)

Here, more_values is unrelated to values.

[6, 9, 12]
[6, 9, 12, 15, 18]

Really Copying

For lists, this implies an easy way to make a real copy: slicing all elements. We know that when slicing like [a:b], leaving out the a means from the start and omitting b means to the end. So, the slice [:] means all elements.

values = [6, 9, 12]
more_values = values[:]
more_values.append(15)
print(values)
print(more_values)
[6, 9, 12]
[6, 9, 12, 15]

Really Copying

The [:] slice is an idiom in Python that's both common and very confusing until you recognize it.

more_values = values[:]

Like many such idioms, it takes a few words of explanation, but once you know the pattern, it's obvious.

Really Copying

Many classes have a .copy() method that know how to copy instances of that class (efficiently, one assumes).

e.g. an Image's .copy() method can be used if you want to keep an instance of the original image while drawing on another.

from PIL import Image, ImageDraw
original = Image.open("Vancouver_Skyline_and_Mountains.jpg")
for_drawing = original.copy()
draw = ImageDraw.Draw(for_drawing)
⋮
for_drawing.save("result.png")
some_other_operation(original)

Really Copying

Or, the copy module contains a copy function that will make a copy of any Python object.

import copy
value = AnyClassYouHave(1, 2, 3)
another = copy.copy(value)

Sets

The lists we have been using have a few notable properties:

  • They are sequences of values (i.e. we can do for x in some_list).
  • They are mutable.
  • It's fast to add an element (.append()).
  • Elements have an order (there's an element 0, 1, 2, …).

Sets

That's all great, but one notable problem:

  • Finding an element is slow: running time \(n\) because we have to scan the whole list to (possibly) find it (later: linear search).

Imagine scanning some huge data set for duplicates and needing to quickly decide we have seen that before.

Sets

There are a few ways to create a data structure that is faster to search than a list (later: binary search).

Built into Python is the set data type. Like lists, a set is a collection of values, but:

  • The elements aren't in any (fixed or well defined) order.
  • It cannot hold duplicates. A value is either in the set or not: it cannot be there twice.

i.e. they're like sets from math.

Sets

Sets can be created with curly braces or the set function.

colours = {"red", "purple", "green", "blue", "black"}
print(colours)

The first time I ran the program:

{'purple', 'red', 'green', 'black', 'blue'}

The next time:

{'blue', 'green', 'purple', 'red', 'black'}

Sets

So it doesn't make any sense to index out of a set because there's not well-defined result the language could give: colours[0] gives a TypeError.

But we can iterate through the element: they might come in any order, but we will get them all.

for c in colours:
    print(c)

Sets

We can't .append to a list: the word implies an at the end behaviour that sets can't promise. We do get an .add method that does the analogous thing.

colours = {"red", "purple", "green", "blue", "black"}
print(colours)
colours.add("yellow")
print(colours)
colours.add("purple")  # already in the set, so no effect
print(colours)
{'blue', 'green', 'purple', 'red', 'black'}
{'blue', 'green', 'purple', 'red', 'yellow', 'black'}
{'blue', 'green', 'purple', 'red', 'yellow', 'black'}

Sets

So why do we want sets, really? It seems like they are like lists but don't do as much.

First reason: sometimes, having a no duplicates data structure is really convenient.

Sets

e.g. suppose we want to detect repeated letters in a string (just a boolean True/False result: repeats or not).

We can convert strings to a set of characters with set(). Observation: if there are repeated characters, there will be fewer elements in the set than in the original string.

>>> w = "repeated"  # three 'e's
>>> print(len(w))
8
>>> print(set(w))   # all 3 'e' become one in the set
{'t', 'd', 'e', 'p', 'a', 'r'}
>>> print(len(set(w)))
6

Sets

That implies a very easy way to detect repeated characters:

def has_duplicate_chars(word):
    chars_set = set(word)
    return len(word) != len(chars_set)

print(has_duplicate_chars("repeated"))
print(has_duplicate_chars("uncopyrightable"))
True
False

In general, converting to a set can be a very simple discard duplicates technique.

Sets

Also, sometimes you just have an unordered collection and a set is the most honest way to think about it.

e.g. for assignment 3, the hands of cards could be sets.

… except the dealer's hand which has a definite first element (the hole card), so probably stick with lists for the assignment.

Sets

But I started talking about sets by saying finding elements in a list is slow. Is it faster for a set?

Finding an element in a list: running time \(n\) because we have to search the whole collection.

How does searching in a set work?

Sets

There's a Python operator in that will tell us whether or not a specific element is in a collection.

>>> some_list = [2, 4, 6, 8]
>>> some_set = set(some_list)
>>> some_set
{8, 2, 4, 6}
>>> 4 in some_list
True
>>> 5 in some_set
False

Note: this is very different that the way the word in is used with a for loop: there we are iterating through the elements, not checking membership.

Sets

Finding an element in a set: we are promised constant running time (\(O(1)\)) for in (in non-pathological cases: details will have to wait for another course).

That should be much faster for large lists/​sets. Is it really?

Sets

I'm going to check with the timeit module.

The idea: we hand it some code to set up the scenario, and the code that we want to time. It runs the setup code once, and the test code a bunch of times and reports the total elapsed time for the test.

Sets

With a collections of 100 values, searching the list took 207 ns on average. For a set, 9 ns on average.

For 1000 values, 2080 ns for the list and 9 ns for the set.

It certainly looks like the running times are proportional to \(n\) for the list and constant for the set.

Sets

Some things to note:

  • Running times matter most for large \(n\).
  • Nanoseconds are short. Computers are fast.
  • Timing anything that takes a few nanoseconds is going to be noisy: how much will the test be affected if I get an email at the moment I'm running it?

Sets

It's also possible to time code with %timeit magic in some Python shells (but sadly, not IDLE). This is in the iPython shell:

In [1]: import random
In [2]: n = 1000
In [3]: def random_list():
   ...:     res = []
   ...:     for i in range(n):
   ...:         res.append(random.randint(0, n*200))
   ...:     return res
   ...: 
In [4]: rand_list = random_list()
In [5]: rand_set = set(rand_list)
In [6]: %timeit 5 in rand_list
2.06 µs ± 11.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [7]: %timeit 5 in rand_set
11.5 ns ± 0.0455 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

Sets

Something we can infer from the .add method: sets are mutable.

There are some other methods to modify a set: .remove(x) to remove a specific element, .pop() to extract an arbitrary element and remove it.

colours = {"red", "purple", "green", "blue", "black"}
colours.remove("red")
c = colours.pop()
print(c)
print(colours)
blue
{'green', 'purple', 'black'}

Sets

As we have seen, there are occasions where having a mutable data structure isn't desirable: if you want to be sure it doesn't change.

There is a related type frozenset that is like set but immutable: no .add, .pop, etc.

colours = frozenset({"red", "purple", "green", "blue"})

We probably won't use it, but the lesson: programmers care enough about mutable/​immutable types that it was added to the language.

Sets

Speaking of mutability…

Because of they way they are implemented, sets can only contain immutable elements. The data structure needs to be able to guarantee the contents of its elements don't change while it's holding them.

>>> set_of_tuples = {(1,2,3), (4,5,6)}
>>> set_of_strings = {"abc", "def"}
>>> set_of_lists = {[1,2,3], [4,5,6]}
TypeError: ...

Sets

If lists were allowed in sets, they could be changed after the set was built.

l1 = [1, 2, 3]
l2 = [4, 5, 6]
set_of_lists = {l1, l2}
l1.append(4)
l2.append(7)

Sets

The way sets can be sure their elements aren't modified: only accept immutable elements.

Yet another reason mutability matters: sometime we have to have immutable types if we want to use them in particular ways.

Handling Errors

As we discussed in the Debugging slides, there are several categories of problems you might have in your programs.

Syntax errors will stop the program from running at all.

Semantic errors will let the program run but do the wrong thing: these are necessarily the programmer's problem.

Handling Errors

But in between are runtime error or exceptions: these will stop the program as it's running because some impossible situation occurred.

Some examples:

import do_magic_stuff  # causes a ModuleNotFoundError
s = "abcd"
s[0] = "A"             # causes a TypeError
c = s[6]               # causes an IndexError
n = int("ten")         # causes a ValueError
x = 10 ** 10000        # causes a ValueError
x = something_else     # causes a NameError
x = 10 / 0             # causes a ZeroDivisionError

Handling Errors

Exceptions are somehow categorized: TypeErrors are about a problem relating to the type of values, IndexError is a problem with indexing, etc.

Python provides a way to catch exceptions and handle them gracefully. You need to know where the exception might occur and which exception will be. Then…

Handling Errors

A try/except block will let you decide how to deal with an exception (instead of the program ending with an error message).

numstr = input("How many? ")
try:
    num = int(numstr)
except ValueError:
    print("That wasn't a number")
    num = 0
print(num)

Handling Errors

When run, that code will look like one of these:

How many? 123
123
How many? lots
That wasn't a number
0

Handling Errors

The behaviour of a try/except:

  • The code in the try runs.
  • If there was no execption, the except code is skipped.
  • If it throws an exception catching the one on the except line, the try stops and except runs.
  • If it throws a different exception, it stops the program as usual.

Handling Errors

We can finally enforce good user input if we want to. The pattern will be something like:

from PIL import Image
img = None
while img == None:
    try:
        filename = input("Image filename: ")
        img = Image.open(filename)
        process_image(img)
    except FileNotFoundError:
        print("Sorry the image doesn't exist.")

Handling Errors

It's considered good style to minimize the amount of code in the try block: you don't want to catch exceptions that aren't the one you were expecting and/or don't know how to deal with.

Or if you can avoid the excption altogether, that's probably nicer.

Handling Errors

e.g. suppose we want to index characters from a string, but return spaces outside of it's real indices. I'd prefer this:

def index(string, position):
    if 0 <= position < len(string):
        return string[position]
    else:
        return " "

to this, but that's a stylistic choice:

def index(string, position):
    try:
        return string[position]
    except IndexError:
        return " "

Handling Errors

You can use a try/except any time something might cause a runtime error (and you know how to clean up after it).

It's probably better to avoid the error altogether if possible, but that's not always possible/​practical.