#vm #garbage-collection #gc #collector #heap #fun #safety

cell-gc

A fun garbage collector for that virtual machine you're writing in Rust

8 releases

Uses old Rust 2015

0.2.1 Jun 29, 2017
0.2.0 May 16, 2016
0.1.5 Feb 2, 2016
0.1.1 Jan 29, 2016

#401 in Memory management

MIT license

73KB
1K SLoC

cell-gc Build Status

A simple garbage collector for use in Rust.

The goal is to help you quickly build a VM in Rust. So this GC is designed for:

  • Safety

  • No dependency on linters or compiler plugins

  • An API that's consistent with a high-performance implementation (though right now cell-gc is not speedy)

  • Fun

Caveats

cell-gc is for use in VMs. So the assumption is that the data the GC is managing is not really your data; it's your end user's data. If you don't want every field of every GC-managed object to be public and mutable, cell-gc is not the GC for your project!

The API is completely unstable. I promise I will change it in ways that will break code; you'll just have to keep up until things stabilize.

cell-gc is not designed to support multithread access to a single heap (like Java). Instead, you can create one heap per thread (like JavaScript).

Currently it does not support lots of small heaps with random lifetimes (like Erlang), but I have some ideas on how to get there. See issue #7.

How to use it

There are two parts to using cell_gc: GC types and GC heaps.

Declaring GC types

Declaring GC types is actually super easy. Just add #[derive(IntoHeap)] to any struct:

extern crate cell_gc;
#[macro_use] extern crate cell_gc_derive;

/// A linked list of numbers that lives in the GC heap.
/// The `#[derive(IntoHeap)]` here causes Rust to define an additional
/// type, `IntListRef`.
#[derive(IntoHeap)]
struct IntList<'h> {
    head: i64,
    tail: Option<IntListRef<'h>>
}
# fn main(){}

cell_gc does several things:

  • Behind the scenes, it generates some code used for garbage collection, such as marking code. You never need to worry about that stuff.

  • It checks that IntList's fields are all GC-safe.

    Not every type is safe to use as a field of a heap struct or enum. Here are the allowed field types:

    • primitive types, like i32
    • types declared with #[derive(IntoHeap)], like IntList<'h> and IntListRef<'h>
    • Box<T> where T has 'static lifetime
    • Rc<T> where T has 'static lifetime
    • Option<T> where T is any of these types

    If you try to use anything else, you'll get bizarre error messages from rustc.

  • It declare a Ref type for you, in this case IntListRef. cell_gc names this type by gluing Ref to the end of the struct name. IntListRef is a smart pointer to a GC-managed IntList. You need this because cell_gc doesn't let you have normal Rust references to stuff in the GC heap.

    IntListRef values keep in-heap IntList values alive; once the last IntListRef pointing at an object is gone, it becomes available for garbage collection, and eventually it'll be recycled.

    IntListRef is like std::rc::Rc: it's Clone but not Copy, and calling .clone() copies the Ref, not the object it points to.

    Ref types have accessor methods for getting and setting each field of the struct. For example, IntList has methods .head(), .tail(), .set_head(i64), and .set_tail(Option<IntListRef>).

You can also derive IntoHeap for an enum, but support is incomplete: no Ref type is generated for enums. Tuple structs are not supported.

Understanding heaps

This part isn't documented well yet. But here's an example, using the IntList from above:

# #[macro_use] extern crate cell_gc;
# #[macro_use] extern crate cell_gc_derive;
# #[derive(IntoHeap)]
# struct IntList<'h> {
#     head: i64,
#     tail: Option<IntListRef<'h>>
# }
use cell_gc::Heap;

fn main() {
    // Create a heap (you'll only do this once in your whole program)
    let mut heap = Heap::new();

    heap.enter(|hs| {
        // Allocate an object (returns an IntListRef)
        let obj1 = hs.alloc(IntList { head: 17, tail: None });
        assert_eq!(obj1.head(), 17);
        assert_eq!(obj1.tail(), None);

        // Allocate another object
        let obj2 = hs.alloc(IntList { head: 33, tail: Some(obj1) });
        assert_eq!(obj2.head(), 33);
        assert_eq!(obj2.tail().unwrap().head(), 17);
    });
}

Use Heap::new() in your main() function to create a heap. Use heap.enter() to gain access to the heap (opening a "heap session", hs). Use hs.alloc(v) to allocate values in the heap.

Vectors in the GC heap

A very simple "object" type for a text adventure game:

#[macro_use] extern crate cell_gc;
#[macro_use] extern crate cell_gc_derive;

use cell_gc::collections::VecRef;

#[derive(IntoHeap)]
struct Object<'h> {
    name: String,
    description: String,
    children: VecRef<'h, ObjectRef<'h>>
}
# fn main() {}

Note that children is a VecRef<'h, ObjectRef<'h>>; that is, it is a reference to a separately GC-allocated Vec<ObjectRef<'h>>, which is a vector of references to other objects. In other words, this is exactly what you would have in Java for a field declared like this:

public ArrayList<Object> children;

The API generated by this macro looks like this:

# struct VecRef<'h, T: 'h>(&'h T);  // hack to make this compile
struct Object<'h> {
    name: String,
    description: String,
    children: VecRef<'h, ObjectRef<'h>>
}

struct ObjectRef<'h> {
   /* all fields private */
#  target: &'h Object<'h>     // hack to make this compile
}

impl<'h> ObjectRef<'h> {
    fn name(&self) -> String
#       { unimplemented!(); }
    fn set_name(&self, name: String)
#       { unimplemented!(); }
    fn description(&self) -> String
#       { unimplemented!(); }
    fn set_description(&self, description: String)
#       { unimplemented!(); }
    fn children(&self) -> VecRef<'h, ObjectRef<'h>>
#       { unimplemented!(); }
    fn set_children(&self, children: VecRef<'h, ObjectRef<'h>>)
#       { unimplemented!(); }
}

(You might never actually use that set_children() method. Instead, you'll initialize the children field with a vector when you create the object, and then you'll most likely mutate that existing vector rather than ever creating a new one.)

You can allocate Objects in the heap using hs.alloc(Object { ... }), and make one Object a child of another by using obj1.children().push(obj2).

Safety

As long as you don't type the keyword unsafe in your code, this GC is safe.[citation needed]

Still, there's one weird rule to be aware of: Don't implement Drop or Clone for any type declared using derive(IntoHeap). It's safe in the full Rust sense of that word (it won't cause crashes or undefined behavior, as long as your .drop() or .clone() method does nothing unsafe), but it won't do what you want. Your .drop() and .clone() methods simply will not be called when you expect; and they'll be called at other times that make no sense.

So don't do that! The safe alternative is to put a Box or Rc around your value (the one that implements Drop or Clone) and use that as a field of a GC heap struct.

Why is it called "cell-gc"?

In cell-gc, every field of every GC-managed object is mutable. You can't get direct references to the data; instead you use methods to get and set values.

It's as though every field were a Cell.

Dependencies

~2MB
~44K SLoC