Languages: Memory

The data your program is using is stored in the computer's memory. Each program has a certain amount of memory allocated to it, which can be expanded or contracted as necessary.

This memory is generally divided into three parts:

Static: memory known to be needed at compile time, e.g. string literals in the program, globals, C static variables.
Stack: function parameters and function-local variables (or similar for other variable scopes).
Heap: dynamically-allocated objects. e.g. malloc in C; new in C++, Java, C#; all objects in Python, Ruby.
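
A minimal C++ sketch of the three regions described above. All of the names here (greeting, count_calls, sum_to, heap_string_length) are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Static: the global pointer and the string literal exist for the whole run.
const char *greeting = "hello";

int count_calls() {
    static int calls = 0;   // static: one copy, keeps its value between calls
    return ++calls;
}

int sum_to(int n) {
    assert(n >= 0);
    // Stack: the parameter n and the locals total and i belong to this call.
    int total = 0;
    for (int i = 1; i <= n; ++i) total += i;
    return total;
}

std::size_t heap_string_length() {
    // Heap: the string object is allocated dynamically; the unique_ptr
    // frees it when this function returns.
    auto text = std::make_unique<std::string>("dynamically allocated");
    return text->size();
}
```

Calling count_calls() twice returns 1 and then 2, because the static counter outlives each call, while total in sum_to starts fresh every time.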

The usual diagram of what's conceptually happening:

Stack and heap grow and shrink as necessary.

The stack contains info on function calls: arguments and locally-defined variables for each instance. Values are popped when the function returns.

In dynamic languages, usually everything is on the heap (but there is a call stack for functions).

Managing Memory

As a programmer, how do you control the memory you're using? What is stored, and when can it be thrown away? The goal: store only the data we need. Keeping old, unneeded data around is a memory leak.

How we do that depends on the language.

Stack memory is fairly obvious: when the function returns, pop its stuff off the stack. Every language (except maybe assembly) takes care of this.

But the stack often holds only a small part of the memory being used: in most OO languages, stack variables are just object references (or pointers), and the actual object contents are on the heap.
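
The same split shows up in C++ containers. As a rough sketch (the function name is hypothetical): a local std::vector is a small handle on the stack, while its elements are allocated on the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compares the size of the stack-resident handle with the heap-resident contents.
bool handle_is_smaller_than_contents() {
    std::vector<int> big(1000000, 42);   // local variable: the handle is on the stack
    assert(big.size() == 1000000);
    std::size_t handle_bytes = sizeof(big);                 // typically a few pointers
    std::size_t content_bytes = big.size() * sizeof(int);   // the elements, on the heap
    return handle_bytes < content_bytes;
}
```

sizeof(big) does not change with the number of elements, which is why popping the stack frame alone can't be the whole story for freeing memory.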

Heap memory is harder, because it's hard to know when a program is no longer using a value.

Values are still useful as long as there is a pointer/reference to them. How do we know when the last reference is gone?

Manual Memory Mgmt.

In C, keeping track of allocated heap memory is the programmer's problem. What is allocated must be freed by somebody, exactly once.

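#include <stdio.h>
#include <stdlib.h>
int main(void) {
    /* array of 100 int on the heap: */
    int *arr = malloc(100 * sizeof *arr);
    if (arr == NULL) return 1;   /* malloc can fail */
    arr[17] = 10;
    printf("%i\n", arr[17]);
    free(arr);   /* free() must be called to not leak. */
}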

In C++, objects can be on the stack: those are destroyed when the function returns.

void stack_object_example() {
    Point2D pt1 = Point2D(1, 2);
    cout << pt1 << '\n';
}

Objects created with new are on the heap and must be deleted.

void heap_object_example() {
    Point2D *pt2 = new Point2D(3, 4);
    cout << *pt2 << '\n';
    delete pt2;
}
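
These snippets assume a Point2D class that can be printed with <<. Here is one hypothetical definition, with an added instance counter (live, not part of the original examples) so the two lifetimes can be observed. The describe helper is also made up.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the Point2D class the examples assume.
class Point2D {
public:
    static int live;   // number of Point2D objects currently alive
    Point2D(int x, int y) : x_(x), y_(y) { ++live; }
    Point2D(const Point2D &other) : x_(other.x_), y_(other.y_) { ++live; }
    ~Point2D() { --live; }
    friend std::ostream &operator<<(std::ostream &out, const Point2D &p) {
        return out << '(' << p.x_ << ", " << p.y_ << ')';
    }
private:
    int x_, y_;
};
int Point2D::live = 0;

// Formats a point the same way cout << pt would print it.
std::string describe(const Point2D &p) {
    std::ostringstream out;
    out << p;
    return out.str();
}

int live_after_stack_object() {
    {
        Point2D pt1(1, 2);            // on the stack
        assert(Point2D::live == 1);
    }                                 // destroyed here, at the end of its scope
    return Point2D::live;
}

int live_after_heap_object() {
    Point2D *pt2 = new Point2D(3, 4); // on the heap
    assert(Point2D::live == 1);
    delete pt2;                       // nothing destroys it unless we do
    return Point2D::live;
}
```

Both functions return 0: the stack object is cleaned up by leaving its scope, the heap object only by the explicit delete.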

When pointers are passed around, it can be unclear who owns them and is responsible for deleting.

Point2D *create_object_pointer() {
    Point2D *pt = new Point2D(5, 6);
    return pt;
}

Some code far away must remember to delete the object.

Point2D *pt3 = create_object_pointer();
cout << *pt3 << '\n';
delete pt3;
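
To see why the responsibility is easy to lose track of, here is a sketch using a simplified, hypothetical Point2D with an instance counter. The raw pointer variable going out of scope does nothing to the object it points to.

```cpp
#include <cassert>

// Hypothetical stand-in for Point2D that counts how many instances are alive.
struct Point2D {
    static int live;
    int x, y;
    Point2D(int x, int y) : x(x), y(y) { ++live; }
    Point2D(const Point2D &) = delete;
    ~Point2D() { --live; }
};
int Point2D::live = 0;

Point2D *create_object_pointer() {
    return new Point2D(5, 6);
}

// Returns how many objects were still alive after the pointer variable pt3
// went out of scope, but before anyone called delete.
int live_before_delete() {
    Point2D *keep = nullptr;
    {
        Point2D *pt3 = create_object_pointer();
        keep = pt3;
    }   // pt3 (the pointer) is gone; the object it pointed to is not
    int alive = Point2D::live;
    assert(keep != nullptr);
    delete keep;   // somebody has to remember this, exactly once
    return alive;
}
```

live_before_delete() returns 1. If keep had not been saved, the object would have been unreachable and leaked.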

That's extremely error-prone. C++11 modernized memory management with smart pointers like unique_ptr<T>, a pointer with exactly one owner at a time. (The make_unique helper used below arrived in C++14.)

unique_ptr<Point2D> create_unique_ptr() {
    auto pt = make_unique<Point2D>(7, 8);
    return pt; // implicitly give up ownership by returning
}

When the unique_ptr is destroyed, it deletes its contents.

auto pt4 = create_unique_ptr();
cout << *pt4 << '\n';

No need to delete pt4: it happens automatically when pt4 goes out of scope. Ownership can be explicitly given away (moved) if other code needs the object.
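
A sketch of that automatic cleanup, again with a hypothetical counting Point2D so the deletion can be observed:

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for Point2D that counts how many instances are alive.
struct Point2D {
    static int live;
    int x, y;
    Point2D(int x, int y) : x(x), y(y) { ++live; }
    Point2D(const Point2D &) = delete;
    ~Point2D() { --live; }
};
int Point2D::live = 0;

std::unique_ptr<Point2D> create_unique_ptr() {
    return std::make_unique<Point2D>(7, 8);   // ownership moves to the caller
}

int live_after_scope() {
    {
        auto pt4 = create_unique_ptr();
        assert(Point2D::live == 1);   // the object is owned by pt4
    }   // pt4 is destroyed here, and its destructor deletes the Point2D
    return Point2D::live;
}
```

live_after_scope() returns 0 without any delete appearing in the code.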

auto pt5 = create_unique_ptr();
function_that_uses_a_point2d(move(pt5));
// cout << *pt5 << '\n';   // don't: pt5 is empty after the move

Calling move() hands ownership of the object to the function (assuming it takes the unique_ptr by value), so the object is deleted when the function returns.

After that, pt5 is empty (null), so using *pt5 outside the function is undefined behaviour and will typically crash: this code gave away ownership of the object, so it's not ours to work with.
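
A sketch of the transfer. The body of function_that_uses_a_point2d is an assumption (the original doesn't show it), as is the counting Point2D.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for Point2D that counts how many instances are alive.
struct Point2D {
    static int live;
    int x, y;
    Point2D(int x, int y) : x(x), y(y) { ++live; }
    Point2D(const Point2D &) = delete;
    ~Point2D() { --live; }
};
int Point2D::live = 0;

// Assumed signature: taking the unique_ptr by value means taking ownership.
int function_that_uses_a_point2d(std::unique_ptr<Point2D> pt) {
    assert(pt != nullptr);
    return pt->x + pt->y;
}   // pt is destroyed once the call completes, deleting the Point2D

bool moved_from_is_empty() {
    auto pt5 = std::make_unique<Point2D>(7, 8);
    int sum = function_that_uses_a_point2d(std::move(pt5));
    // Comparing a moved-from unique_ptr with nullptr is fine; dereferencing it is not.
    return sum == 15 && pt5 == nullptr && Point2D::live == 0;
}
```

moved_from_is_empty() returns true: the standard guarantees that a moved-from unique_ptr is null.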

The result: very little work on the programmer's part, and it's very hard to have a memory leak.

You really should be using smart pointers in modern C++.

See also shared_ptr, which has reference-counting semantics (more later).
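
As a quick preview, shared_ptr exposes its count through use_count(). A minimal sketch (observed_counts is a made-up name):

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Records the reference count as owners come and go.
std::vector<long> observed_counts() {
    std::vector<long> seen;
    auto a = std::make_shared<int>(42);
    seen.push_back(a.use_count());       // one owner: a
    {
        std::shared_ptr<int> b = a;      // copying adds a second owner
        assert(*b == 42);
        seen.push_back(a.use_count());   // two owners: a and b
    }                                    // b is destroyed, count drops
    seen.push_back(a.use_count());       // back to one owner
    return seen;
}   // a is destroyed, the count reaches zero, and the int is freed
```

The recorded counts are 1, 2, 1.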

C and C++ (and old Objective-C) are among the few widely used languages where memory is managed manually, and the trend is definitely away from doing so.

It's just too hard to free/delete perfectly 100% of the time. If you miss one, your program will slowly use more and more memory over time; if you free something twice or too early, it can crash or corrupt data.

What are the alternatives?

Garbage Collection

It would be nice if the language would handle the freeing of memory for us.

Basic observation: if there are no references left to an object, it can be freed.

But having a reference doesn't mean an object will actually be used again. The programmer should still explicitly drop references (delete, release, del, etc.) when necessary, especially references to large objects held by long-lived variables.
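
C++ has no garbage collector, but shared_ptr can stand in for a managed reference to illustrate the point. In this sketch (BigBuffer and live_buffers are hypothetical), a buffer that is never used again stays alive until the reference is dropped.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical large object with an instance counter.
struct BigBuffer {
    static int live;
    std::vector<char> bytes;
    BigBuffer() : bytes(1048576) { ++live; }   // 1 MiB of data
    BigBuffer(const BigBuffer &) = delete;
    ~BigBuffer() { --live; }
};
int BigBuffer::live = 0;

// Returns how many buffers are alive while the reference is still in scope.
int live_buffers(bool drop_reference) {
    std::shared_ptr<BigBuffer> cached = std::make_shared<BigBuffer>();
    assert(cached->bytes.size() == 1048576);
    // ... imagine the buffer is used once here and never again ...
    if (drop_reference) cached.reset();   // explicit release of the reference
    return BigBuffer::live;               // measured before cached goes out of scope
}
```

live_buffers(false) returns 1 and live_buffers(true) returns 0: the memory only becomes reclaimable once the last reference is gone.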

A garbage collector is the part of a language's runtime environment that looks for objects on the heap that can no longer be accessed (garbage) and frees them (after calling their finalizers, in languages that have the concept).

Garbage collection…

avoids memory leaks (but doesn't completely prevent them if the programmer keeps useless references around);
eliminates work the programmer has to do to manage memory;
happens at run-time, so causes some overhead.

There are several garbage collection algorithms.

The programmer mostly just needs to know that the language has garbage collection. An implementation of the language might choose any algorithm; different implementations of the same language might have different strategies.

Reference counting garbage collection keeps track of the number of references to each object. When the count drops to zero, the object is deleted. On its own, it can't reclaim cyclic data structures (their counts never reach zero), and it requires space and time to maintain the counters.
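
The cycle problem can be reproduced with C++'s shared_ptr, used here as a stand-in for a reference-counting collector. The Node type and function name are hypothetical. Two nodes that point at each other with counted references survive after the program loses access to them; making one direction a weak_ptr (not counted) avoids it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node with an instance counter.
struct Node {
    static int live;
    std::shared_ptr<Node> strong_next;   // counted reference
    std::weak_ptr<Node> weak_next;       // uncounted reference
    Node() { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

// Returns how many nodes are still alive after all local owners are gone.
int live_after_scope(bool close_cycle_with_weak_ptr) {
    std::weak_ptr<Node> probe;   // observes a node without owning it
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        probe = a;
        a->strong_next = b;
        if (close_cycle_with_weak_ptr) b->weak_next = a;
        else b->strong_next = a;          // cycle of counted references
        assert(Node::live == 2);
    }
    int alive = Node::live;
    // Break the cycle by hand so this demonstration doesn't itself leak.
    if (auto a = probe.lock()) a->strong_next.reset();
    return alive;
}
```

With the strong cycle the function returns 2 (both counts are stuck at 1); with the weak back-reference it returns 0.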

Tracing garbage collection looks for which objects are reachable from references available in the program: everything else is garbage. There are many strategies to do this quickly and without stopping execution while it happens.
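
A toy sketch of the simplest tracing strategy, mark and sweep, over a simulated heap. Here objects are just indices and each holds a list of the indices it references. This is a simplification for illustration, not how any particular runtime implements it, and all names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap: heap[i] lists the objects that object i holds references to.
using Heap = std::vector<std::vector<std::size_t>>;

// Mark phase: everything reachable from the roots is live.
std::vector<bool> mark(const Heap &heap, const std::vector<std::size_t> &roots) {
    std::vector<bool> marked(heap.size(), false);
    std::vector<std::size_t> work(roots.begin(), roots.end());
    while (!work.empty()) {
        std::size_t obj = work.back();
        work.pop_back();
        assert(obj < heap.size());
        if (marked[obj]) continue;        // already visited
        marked[obj] = true;
        for (std::size_t ref : heap[obj]) work.push_back(ref);
    }
    return marked;
}

// Sweep phase: anything left unmarked is garbage.
std::vector<std::size_t> garbage(const Heap &heap, const std::vector<std::size_t> &roots) {
    std::vector<bool> marked = mark(heap, roots);
    std::vector<std::size_t> dead;
    for (std::size_t i = 0; i < heap.size(); ++i)
        if (!marked[i]) dead.push_back(i);
    return dead;
}

std::vector<std::size_t> demo_garbage() {
    // 0 -> 1; 2 <-> 3 form a cycle; 4 -> 0 but nothing refers to 4.
    Heap heap = {{1}, {}, {3}, {2}, {0}};
    std::vector<std::size_t> roots = {0};
    return garbage(heap, roots);
}
```

demo_garbage() reports objects 2, 3 and 4 as garbage. Note that the 2/3 cycle is collected, which reference counting alone could not do.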

Tracking Ownership

If we use only C++ smart pointers like unique_ptr (and friends), there isn't any need for garbage collection at run time.

Assuming we keep the unique_ptr on the stack, it will be destroyed when its scope ends. When the unique_ptr is destroyed, it automatically deletes the object it refers to: the pointer is unique, so there is only ever one owner to track, and the compiler can work out at compile time where the delete goes.
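
A sketch of what "at compile time" means here. The type-trait checks below are evaluated by the compiler. The size check reflects mainstream standard libraries, but is not something the C++ standard promises. Function names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Copying would create a second owner, so the compiler rejects it...
static_assert(!std::is_copy_constructible<std::unique_ptr<int>>::value,
              "unique_ptr cannot be copied");
// ...while moving (transferring ownership) is allowed.
static_assert(std::is_move_constructible<std::unique_ptr<int>>::value,
              "unique_ptr can be moved");

int scoped_value() {
    std::unique_ptr<int> p = std::make_unique<int>(5);
    // std::unique_ptr<int> q = p;          // would not compile: no copying
    std::unique_ptr<int> q = std::move(p);  // explicit transfer of ownership
    assert(p == nullptr);                   // the moved-from pointer is empty
    return *q;
}   // q is destroyed here: the delete is placed by the compiler, no counter needed

// True when unique_ptr carries no extra bookkeeping beyond the raw pointer.
bool unique_ptr_is_pointer_sized() {
    return sizeof(std::unique_ptr<int>) == sizeof(int *);
}
```

Contrast this with shared_ptr, which has to maintain a counter at run time.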

Rust does the same and ensures memory safety by having explicit ownership of memory. (More later.)

There is no garbage collector, but the compiler can determine when a value is no longer needed (when its owner goes out of scope) and free the heap memory at that point.

Explicitly-tracked ownership presents another option that could be considered a kind of automatic memory management. So, we have

manual memory management (at programmer-time?),
run-time garbage collection,
compile-time ownership tracking.