Rust: Memory Management

Rust: Memory Management

We have seen that Rust does a lot to keep us safe in how we manage/​access memory/​types.

These things are baked into the language design, and enforced by the compiler.

Rust: Memory Management

  • Ownership: every value is owned by exactly one variable.
  • Safe borrowing with references: either multiple immutable borrows, or a unique mutable borrow.
  • References cannot outlive their values.
  • The Option type to avoid the need for null pointers.
  • Traits so we can guarantee specific operations will be available.

Rust: Memory Management

At the same time, Rust almost always gives zero-cost abstractions. Almost none of these safety assurancecs cost us run-time performance. They are generally ensured at compile-time, and therefore there's no extra work to do at run-time.

Compare garbage collection, which inherently happens while the program runs.

Rust: Memory Management

Of that list, the only thing I can see that has any runtime cost is Option: it has to store one extra value as a flag indicating I'm a Some() vs I'm a None.

use core::mem::size_of;
type Point3D = (f64, f64, f64);
println!("{}", size_of::<Point3D>());
println!("{}", size_of::<Option<Point3D>>());
24
32

Static Binding

Since we're talking about things that should be zero-cost, what about generic functions?

fn is_larger<T: Ord>(a: T, b: T) -> bool {
    a > b
}

Is there dynamic dispatch happening here, since the actual type of T isn't known? (i.e. can the compiler statically-bind an implementation of >?)

Static Binding

Rust does statically bind this code because it does monomorphization while compiling.

What that means: a version of this function is compiled for each T that you use it with.

Static Binding

When we did this:

fn is_larger<T: Ord>(a: T, b: T) -> bool {
    a > b
}
println!("{}", is_larger(2, 3));
println!("{}", is_larger("two", "one"));

The compiler actually created single-type versions is_larger<i32> and is_larger<&str>.

Static Binding

So, we might expect a small increase in the size of our compiled output (since there are two implementation of the same function compiled), but all of the code can be statically-bound.

Static Binding

It is possible to have dynamically-bound code in Rust. We can create a Vec of things that implement the Ord trait, but there's more to say before we can.

In particular, the dyn keyword will be there wherever dynamic binding is happening: it won't be a secret.

Data Structures

Let's make a binary tree in Rust.

We will have nodes with left and right children. Either child could be either another node, or missing. In Rust, that would be:

struct Node<T> {
    value: T,
    left: Option<Node<T>>,
    right: Option<Node<T>>,
}

Data Structures

This fails to compile:

recursive type `tree::Node` has infinite size…
insert some indirection (e.g., a `Box`, `Rc`, or `&`) to make `tree::Node` representable

All of the values we have been using in Rust have a size known at compile time. Rust demands this so it can know how large a function's stack frame is, etc.

Data Structures

One of the compiler's suggestions was & (since the size of a reference is known). Let's try:

struct Node<T> {
    value: T,
    left: Option<&Node<T>>,
    right: Option<&Node<T>>,
}

Nope.

missing lifetime specifier…
consider adding an explicit lifetime bound…
so that the reference type … does not outlive the data it points at

Data Structures

The problem now is the Rust rule for references: references must always be valid.

Any reference we hand to this data structure might be stored for a long time. Nothing guarantees that the value behind the reference will live that long.

Data Structures

All of the values we have worked with so far in Rust have been on the stack: local variables to a function that disappear when the function exits. The Rust ownership rules were easy to apply in that case, but what if we actually need a long-lived reference?

How can we put things on the heap?

The Heap and Box

Rust has a collection of smart pointer types that can be used to contain other value, and safely manage them on the heap.

The first one we'll need: Box<T>. A Box can hold another value, takes ownership of that value, and makes sure the value lives as long as the Box exists.

The Heap and Box

A Box is effectively a unique reference: it's not Copy, and is Clone but clones the entire contents of the box.

When a Box gets dropped, its contents will be dropped as well (because it owns the contents).

The Heap and Box

We can make a Box with Box::new: it creates a new box and moves ownership of the value into the box (or copies).

let f1 = 1.234;
let b1 = Box::new(f1);
let b2 = Box::new(1.234);

These are f1: f64 and b1,b2: Box<f64>.

The Heap and Box

Boxes are auto-dereferenced, so we can usually use them like their contained value.

println!("{:?} {:?}", b1, b2); // 1.234 1.234
println!("{:?}", b1 == b2); // true

Or we can dereference with * (same as references):

let f2: f64 = *b2;

The Heap and Box

We can return to the tree data structure now: the left and right children will either be None or another Node in a Box (and thus on the heap):

#[derive(Debug, Clone)]
struct Node<T> {
    value: T,
    left: Option<Box<Node<T>>>,
    right: Option<Box<Node<T>>>,
}

The Heap and Box

Let's implement some methods:

impl<T> Node<T> {
    fn new(value: T) -> Node<T> {
        Node {
            value,
            left: None,
            right: None,
        }
    }
    fn set_left(&mut self, value: T) {
        self.left = Some(Box::new(Node::new(value)));
    }
    fn set_right(&mut self, value: T) {
        self.right = Some(Box::new(Node::new(value)));
    }
}

The Heap and Box

And we have a memory safe data structure we can use.

let mut n = Node::new(10);
n.set_left(5);
n.set_right(15);
println!("{:?}", n);
Node { value: 10, left: Some(Node { value: 5, left: None, right: None }), right: Some(Node { value: 15, left: None, right: None }) }

More Smart Pointers

The Box is doing two things for us: letting us put values on the heap, and giving us a safe reference that owns it, like the C++ unique_ptr.

There are other smart pointer-like things we might need to do, and other Rust data structures that let us.

More Smart Pointers

Because Box is unique, it can hold a mutable value and let you mutate it. But what about multiple immutable references?

The Rc<T> type is a reference-counted container for immutable values.

More Smart Pointers

An Rc can be safely cloned and is guaranteed to live as long as the contents are accessible. It owns its contents, and drops them when zero references remain.

use std::rc::Rc;
let mut v = Vec::from([1, 2]);
v.push(3); // can get "&mut v" here
let r1 = Rc::new(v);
let r2 = r1.clone();
//v.push(4); // fails: borrow of moved value
//r1.push(4); // fails: cannot borrow as mutable
println!("{:?}", r1);
println!("{:?}", r2);

More Smart Pointers

Using Rc is probably most useful when creating/​using a data structure and want to work with it in multiple places.

The value can be created (with mut if needed) and then moved into an Rc. The Rc can be cloned and shared in as many places as you need it.

More Smart Pointers

But Rc is not thread-safe: it does not implement Send or Sync.

For that, there's a separate Arc<T>: atomic reference counted. It does the locking necessary to safely be used across multiple threads. i.e. it is Clone, Send, and Sync

More Smart Pointers

let vec = Arc::new(Vec::from([1, 2, 3, 4, 5, 6]));
let vec1 = vec.clone();
let vec2 = vec.clone();
let jh1 = thread::spawn(move || {
    let sum = vec1.iter().fold(0, |a, b| a + b);
    println!("I added and got {}", sum);
});
let jh2 = thread::spawn(move || {
    let sum = vec2.iter().fold(1, |a, b| a * b);
    println!("I multiplied and got {}", sum);
});
println!("I'm just hanging out with {:?}", vec);
jh1.join().unwrap();
jh2.join().unwrap();

There's only one copy of this vector in memory.

More Smart Pointers

The output will be (with lines permuted arbitrarily):

I'm just hanging out with [1, 2, 3, 4, 5, 6]
I added and got 21
I multiplied and got 720

More Smart Pointers

Both Rc and Arc ensure immutable contents because there's no way to get a mutable reference to the contents. i.e. they do not implement AsMut.

But we can get immutable references to the contents (because they implement AsRef).

The Box does implement AsMut and whenever we did mutable things inside a Box, there was an implicit dereference using .as_mut().

More Smart Pointers

What if we need shared mutability? We can't have multiple mutable references, but there are cases where it's necessary.

It would be nice to be able to have a shared mutable data structure, but keep Rust's safety.

More Smart Pointers

We have RefCell: a sharable Box-like thing where you can get a mutable reference, but safe borrowing is checked at runtime. And Mutex that is like RefCell but thread-safe.

A lot of things kind of mean reference in Rust…

More Smart Pointers

TypeSharable?Mutable?Thread Safe?
&yes *nono
&mutno *yesno
Boxnoyesno
Rcyesnono
Arcyesnoyes
RefCellyes **yesno
Mutexyes, in Arcyesyes

Dynamic Values

With Box, we can revisit dynamic values in Rust. We can use trait objects that are a value of any type that implements a specific trait.

Trait objects (always?) need to be wrapped in Box or similar: values must have a size known at compile time, and arbitrary thing implementing trait doesn't.

Dynamic Values

Let's create a trait and implementation of it:

trait CanSpeak {
    fn speak(&self) -> ();
}

struct Speaker {}
impl CanSpeak for Speaker {
    fn speak(&self) -> () {
        println!("hello");
    }
}

Dynamic Values

Remember: we can use it in a static/​mono­morphi­zation way:

fn speak_twice_static<S: CanSpeak>(s: &S) {
    s.speak();
    s.speak();
}
let s = Speaker {};
s.speak();
speak_twice_static(&s);

Dynamic Values

But we can also create variables and function arguments that take a dynamically-typed trait object:

fn speak_twice(s: &Box<dyn CanSpeak>) {
    s.speak();
    s.speak();
}
let s: Box<dyn CanSpeak> = Box::new(Speaker {});
s.speak();
speak_twice(&s);

The dyn keyword indicates anything that implements this trait. These operations are then dynamically-bound.

Dynamic Values

Dynamic types are more useful to create a collection of values:

use std::fmt::Display;
let mut displayable: Vec<Box<dyn Display>> = Vec::new();
displayable.push(Box::new(5));
displayable.push(Box::new("hello"));
displayable.push(Box::new(GeoPoint::default()));

Again: the Box wrapper is needed so the size of each element is known by the compiler.

Dynamic Values

We can then use these values, knowing that they implement Display, and usages of it will be dynamically-bound.

for value in displayable {
    println!("{}", value);
}
5
hello
0°N 0°E @ 0m

Dynamic Values

We can do a lot (all?) of the dynamically-typed things from other languges, but only when we need them.

And we always see the dyn keyword, so we know it's happening.