The data your program is using is stored in the computer's memory. Each program has a certain amount of memory allocated to it, which can be expanded or contracted as necessary.
This memory is generally divided into three parts:
static
stack
heap: allocated dynamically with malloc in C; new in C++, Java, C#; all objects in Python, Ruby.
The usual diagram of what's conceptually happening:
Stack and heap grow and shrink as necessary.
The stack contains info on function calls: arguments and locally-defined variables for each instance. Values are popped when the function returns.
In dynamic languages, usually everything is on the heap (but there is a call stack for functions).
As a programmer, how do you control the memory you're using? What is stored and when can it be thrown away? The goal: make sure we only store the data we need and don't keep old stuff around, which would be a memory leak.
How we do that depends on the language.
Stack memory is fairly obvious: when the function returns, pop its stuff off the stack. Every language (except maybe assembly) takes care of this.
But the stack is often only a small amount of the memory being used: in most OO languages, the stack variables are just object references(/pointers) and actual object contents are on the heap.
Heap memory is harder, because it's hard to know when a program is no longer using a value.
Values are still useful as long as there is a pointer/reference to them. How do we know when the last reference is gone?
In C, keeping track of allocated heap memory is the programmer's problem. What is allocated must be freed by somebody, exactly once.
    /* array of 100 int on the heap: */
    int *arr = (int *)malloc(100*sizeof(int));
    arr[17] = 10;
    printf("%i\n", arr[17]);
    free(arr); /* free() must be called to not leak. */
In C++, objects can be on the stack: those are destroyed when the function returns.
    void stack_object_example() {
        Point2D pt1 = Point2D(1, 2);
        cout << pt1 << '\n';
    }

Objects created with new are on the heap and must be deleted.

    void heap_object_example() {
        Point2D *pt2 = new Point2D(3, 4);
        cout << *pt2 << '\n';
        delete(pt2);
    }

When pointers are passed around, it can be unclear who owns them and is responsible for deleting.

    Point2D *create_object_pointer() {
        Point2D *pt = new Point2D(5, 6);
        return pt;
    }

Some code far away must delete() it.

    Point2D *pt3 = create_object_pointer();
    cout << *pt3 << '\n';
    delete(pt3);
That's extremely error-prone. In C++11, memory management was modernized with smart pointers like unique_ptr<T>, a pointer that has exactly one owner at a time.
    unique_ptr<Point2D> create_unique_ptr() {
        auto pt = make_unique<Point2D>(7, 8);
        return pt; // implicitly give up ownership by returning
    }

When the unique_ptr is destroyed, it deletes its contents.

    auto pt4 = create_unique_ptr();
    cout << *pt4 << '\n';

No need to delete(pt4): it happens automatically. Ownership can be explicitly given away (moved) if other code needs the object.

    auto pt5 = create_unique_ptr();
    function_that_uses_a_point2d(move(pt5));
    // cout << *pt5 << '\n';
Calling move() gives away ownership of the pointer, handing it to the function (so the object is deleted when the function returns).
After that, using *pt5 (outside the function) would fail: this code gave away ownership of the object, so pt5 is now null and the object is not ours to work with.
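To make the effect of the move concrete, here is a minimal sketch. It assumes a stand-in Point2D struct (the real class is not shown in these notes) and assumes function_that_uses_a_point2d takes its unique_ptr by value. Note that make_unique needs C++14 or later.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for the Point2D class used in these notes.
struct Point2D {
    int x, y;
    Point2D(int x, int y) : x(x), y(y) {}
};

// Takes the unique_ptr by value, so it becomes the owner:
// the Point2D is deleted when this function returns.
int function_that_uses_a_point2d(std::unique_ptr<Point2D> p) {
    return p->x + p->y;
}

// Returns true if the caller's pointer is empty after the move.
bool source_is_null_after_move() {
    auto pt5 = std::make_unique<Point2D>(7, 8);
    function_that_uses_a_point2d(std::move(pt5));
    // A moved-from unique_ptr is guaranteed to hold nullptr,
    // so it can be tested, but must not be dereferenced.
    return pt5 == nullptr;
}
```

Checking the pointer against nullptr after the move is safe; dereferencing it is undefined behaviour.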
The result: very little work on the programmer's part, and it's very hard to have a memory leak.
You really should be using smart pointers in modern C++.
See also shared_ptr, which has reference-counting semantics (more later).
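As a small preview of those semantics, this sketch watches the count change as owners come and go. Point2D is again a hypothetical stand-in, and Counts is a helper struct invented for this example.

```cpp
#include <memory>

// Hypothetical stand-in for the Point2D class used in these notes.
struct Point2D {
    int x, y;
    Point2D(int x, int y) : x(x), y(y) {}
};

// Helper invented for this example: the counts seen at three moments.
struct Counts {
    long one_owner;
    long two_owners;
    long after_inner_scope;
};

Counts shared_ptr_counts() {
    Counts c{};
    auto a = std::make_shared<Point2D>(1, 2);
    c.one_owner = a.use_count();          // only `a` owns the object
    {
        std::shared_ptr<Point2D> b = a;   // copying adds a second owner
        c.two_owners = a.use_count();
    }                                     // `b` is destroyed here
    c.after_inner_scope = a.use_count();  // back to a single owner
    return c;                             // object is deleted when `a` goes away
}
```

In a single-threaded run like this, the three counts are 1, 2, and 1.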
C and C++ (and old Objective-C) are among the few widely used languages where memory is still managed manually, and the trend is definitely away from doing so.
It's just too hard to free/delete perfectly 100% of the time. If you don't, your program will slowly use more and more memory over time.
What are the alternatives?
It would be nice if the language would handle the freeing of memory for us.
Basic observation: if there are no references left to an object, it can be freed.
But having references doesn't mean an object will actually be used again. The programmer should still explicitly delete (release, del, etc.) when necessary, especially for large objects with long-lived references.
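The same idea can be sketched in C++ terms, where the closest equivalent to release or del is calling reset() on a smart pointer. Session and release_buffer_early are hypothetical names made up for this illustration.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical long-lived object that holds a reference to a large buffer.
struct Session {
    std::unique_ptr<std::vector<int>> buffer;
};

// Uses the buffer once, then releases it even though `session` lives on.
// Returns true if the buffer was used and is gone after the explicit release.
bool release_buffer_early(Session& session) {
    session.buffer = std::make_unique<std::vector<int>>(1000000, 0);
    std::size_t used = session.buffer->size();  // ... work with the buffer ...
    session.buffer.reset();  // free the million ints now, not when session dies
    return used == 1000000 && session.buffer == nullptr;
}
```

Without the reset() call, the buffer would stay allocated for as long as the Session object exists, even if nothing ever reads it again.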
A garbage collector is part of a language's runtime environment that looks for objects on the heap that can no longer be accessed (garbage) and frees them (after calling their finalizers, in languages that have the concept).
Garbage collection…
There are several garbage collection algorithms.
The programmer needs to know that the language has garbage collection. An implementation of the language might choose any algorithm; different implementations of the same language might have different strategies.
Reference counting garbage collection keeps track of the number of references to each object. When the count drops to zero, the object is deleted. On its own it can't handle cyclic data structures, and it requires space and time to maintain the counters.
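The cycle problem can be seen with C++'s shared_ptr, which is reference counted. This is only a sketch: Node, live_nodes, and both functions are invented for the illustration, and other reference-counted runtimes may handle cycles differently (for example with an extra cycle detector).

```cpp
#include <memory>

// Number of Node objects currently alive (for illustration only).
static int live_nodes = 0;

struct Node {
    std::shared_ptr<Node> strong_next;  // counted reference
    std::weak_ptr<Node> weak_next;      // non-owning reference, not counted
    Node() { ++live_nodes; }
    ~Node() { --live_nodes; }
};

// Two nodes that hold counted references to each other:
// neither count can reach zero, so neither node is freed.
int leaked_by_strong_cycle() {
    int before = live_nodes;
    std::weak_ptr<Node> observer;  // lets us reach a leaked node afterwards
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strong_next = b;
        b->strong_next = a;
        observer = a;
    }  // a and b go out of scope, but each node still has count 1
    int leaked = live_nodes - before;
    // Break the cycle by hand so this example does not leak for real.
    if (auto a = observer.lock()) a->strong_next.reset();
    return leaked;
}

// Same shape, but the back-reference is weak: everything is freed.
int leaked_by_weak_cycle() {
    int before = live_nodes;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strong_next = b;
        b->weak_next = a;  // does not keep `a` alive
    }
    return live_nodes - before;
}
```

The first function reports two nodes still alive after the scope ends; the second reports none.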
Tracing garbage collection looks for which objects are reachable from references available in the program: everything else is garbage. There are many strategies to do this quickly and without stopping execution while it happens.
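One of the simplest tracing strategies is mark-and-sweep. The sketch below is a toy model, not how any particular runtime does it: the "heap" is a vector, references are indices into it, and all names are made up for the example. It stops the world and only reports what a sweep would free.

```cpp
#include <cstddef>
#include <vector>

// Toy heap object: holds the indices of the objects it references.
struct Object {
    std::vector<std::size_t> refs;
    bool marked = false;
};

// Mark phase: flag everything reachable from the roots.
void mark(std::vector<Object>& heap, const std::vector<std::size_t>& roots) {
    std::vector<std::size_t> worklist(roots);
    while (!worklist.empty()) {
        std::size_t i = worklist.back();
        worklist.pop_back();
        if (heap[i].marked) continue;  // already visited (handles cycles)
        heap[i].marked = true;
        for (std::size_t r : heap[i].refs) worklist.push_back(r);
    }
}

// Returns the indices a sweep phase would free: everything left unmarked.
// Takes the heap by value so the caller's objects are not modified.
std::vector<std::size_t> find_garbage(std::vector<Object> heap,
                                      const std::vector<std::size_t>& roots) {
    mark(heap, roots);
    std::vector<std::size_t> garbage;
    for (std::size_t i = 0; i < heap.size(); ++i) {
        if (!heap[i].marked) garbage.push_back(i);
    }
    return garbage;
}

// Example heap: object 0 (a root) references 1;
// objects 2 and 3 reference each other but nothing reaches them.
std::vector<std::size_t> example_garbage() {
    std::vector<Object> heap(4);
    heap[0].refs = {1};
    heap[2].refs = {3};
    heap[3].refs = {2};
    std::vector<std::size_t> roots = {0};
    return find_garbage(heap, roots);
}
```

Here objects 2 and 3 are reported as garbage even though they form a cycle, which is exactly the case reference counting alone cannot handle.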
If using only C++ unique_ptrs (and friends), there isn't any need for garbage collection at runtime.
Assuming we keep the unique_ptr on the stack, it will be destroyed as appropriate. When the unique_ptr is destroyed, it will automatically delete the object it refers to: the pointer was unique, so reference counting is easy and can be done at compile-time.
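A minimal sketch of that behaviour: the Tracked struct and its counter are invented here purely to observe when the destructor runs.

```cpp
#include <memory>

// Number of Tracked objects destroyed so far (for illustration only).
static int destroyed = 0;

struct Tracked {
    ~Tracked() { ++destroyed; }
};

// Returns how many Tracked objects were destroyed by leaving the inner scope.
int destructions_after_scope() {
    int before = destroyed;
    {
        auto p = std::make_unique<Tracked>();
        // No delete anywhere: the compiler inserts the cleanup
        // at the closing brace, where `p` goes out of scope.
    }
    return destroyed - before;
}
```

Exactly one destruction happens, and where it happens is decided when the code is compiled, with no runtime collector involved.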
Rust does the same and ensures memory safety by having explicit ownership of memory. (More later.)
There is no garbage collector, but the compiler can determine when a value is no longer needed (when there are no more references to it) and free the heap memory.
Explicitly-tracked ownership presents another option that could be considered a kind of automatic memory management. So, we have
programmer-time?),