#string #cons #immutable #persistent #prepend

nightly char-list

A persistent string type with the same API as a linked-list of characters

4 releases

0.1.0 Nov 5, 2022
0.0.3 Oct 26, 2022
0.0.2 Oct 26, 2022
0.0.1 Oct 24, 2022

#39 in #persistent

MIT license

56KB
969 lines

char-list

A persistent string type with the same API as a linked-list of characters.

Docs

See the docs for memory-layout diagrams and explanations of how it all works.

DISCLAIMER: unsafe

This crate is a work in progress. Specifically Not all uses of unsafe have been validated! Please don't use this for anything serious yet.

Safety audits are welcome and appreciated! I'm still quite new to writing unsafe code.

Also, this crate depends on front-vec which is also badly in need of auditing.


lib.rs:

A persistent string type with the same API as a linked-list of characters.

DISCLAIMER: unsafe

This crate is a work in progress. Specifically Not all uses of unsafe have been validated! Please don't use this for anything serious yet.

Safety audits are welcome and appreciated! I'm still quite new to writing unsafe code.

Also, this crate depends on front-vec which is also badly in need of auditing.

Internal Representation

This crate relies on front_vec::FrontString to store arrays of bytes which allow for efficient prepending (pushing characters onto the front of the array).

Each FrontString is owned by a PqRcCell (internal to this crate, soon to be it's own crate) which is sorta like a cross between a RefCell and a RcBox (the thing an Rc pointer points to).

A CharList is just a pointer + length (though it's not a slice) which refers to a specific slice of a PqRcCell-managed FrontString.

The PqRcCell ensures that out of all CharLists which refer to it's FrontString, only those with the longest length are allowed to mutate the inner FrontString value. And they can only* mutate it in ways that other CharLists won't notice.

* (Ok, at this point we make developers pinky-swear 🤞 they'll follow those rules, and they gotta wrap their use in an unsafe block. Maybe there's a better way to do this in the future though 🤔)

That was a lotta words, let's look at an example with memory diagrams.

Memory Diagram

Note: Simplified Diagrams

Every time the word len or refs appears in these diagrams, in the code it corresponds to either priority or priorities. The internal representation of a CharList is actually just a PqRc (Priority Queue Reference Counted) pointer which abstracts over its priority. For CharLists, priority is the same as string length.

Example

Consider this code:

# use char_list::CharList;
use assert2::assert; // For nicer assertions.
let icon = CharList::from("icon");
let nomicon = icon.cons_str("nom");
let rustonomicon = nomicon.cons_str("rusto");
let rustonomicon2 = rustonomicon.clone();

assert!(icon == "icon");
assert!(nomicon == "nomicon");
assert!(rustonomicon == "rustonomicon");
assert!(rustonomicon == rustonomicon2);

These values would be represented in memory like this:

![did it work?][ex1]

(The arrows with dashed lines represent the implied beginnings of CharList slices, i.e. icon implicitly "refers to" the last four bytes in the buffer.)

Notice that all cons_str and clone operations are immutable, i.e. they create new CharLists instead of mutating the original.

One More cons

# use char_list::CharList;
# use assert2::assert;
# let icon = CharList::from("icon");
# let nomicon = icon.cons_str("nom");
# let rustonomicon = nomicon.cons_str("rusto");
# let rustonomicon2 = rustonomicon.clone();
#
# assert!(icon == "icon");
# assert!(nomicon == "nomicon");
# assert!(rustonomicon == "rustonomicon");
# assert!(rustonomicon == rustonomicon2);
#
let the_book = rustonomicon.cons_str("the ");
assert!(the_book == "the rustonomicon");

Here's what memory looks like now:

![did it work?][ex2]

Because rustonomicon has the longest length in the refs table, it's perfectly safe to call the underlying FrontString's push_char_front method (mutably!).

Notice that by pushing a character onto the front, rustonomicon doesn't affect any other CharList's view of the underlying FrontString.

Dropping

Ok now what happens if we drop the longest three CharLists?

# use char_list::CharList;
# use assert2::assert;
# let icon = CharList::from("icon");
# let nomicon = icon.cons_str("nom");
# let rustonomicon = nomicon.cons_str("rusto");
# let rustonomicon2 = rustonomicon.clone();
#
# assert!(icon == "icon");
# assert!(nomicon == "nomicon");
# assert!(rustonomicon == "rustonomicon");
# assert!(rustonomicon == rustonomicon2);
#
# let the_book = rustonomicon.cons_str("the ");
# assert!(the_book == "the rustonomicon");
#
drop(rustonomicon);
drop(the_book);
drop(rustonomicon2);

![did it work?][ex3]

Notice that the internal FrontString's len went down to 7. This means we can reuse the memory that used to be held by the longer strings!

Tricky cons

Here's a problem though. What if we want to cons the character 'b' onto the front of icon?

# use char_list::CharList;
# use assert2::assert;
# let icon = CharList::from("icon");
# let nomicon = icon.cons_str("nom");
# let rustonomicon = nomicon.cons_str("rusto");
# let rustonomicon2 = rustonomicon.clone();
#
# assert!(icon == "icon");
# assert!(nomicon == "nomicon");
# assert!(rustonomicon == "rustonomicon");
# assert!(rustonomicon == rustonomicon2);
#
# let the_book = rustonomicon.cons_str("the ");
# assert!(the_book == "the rustonomicon");
#
# drop(rustonomicon);
# drop(the_book);
# drop(rustonomicon2);
#
let janelle_monae = icon.cons('b');
assert!(janelle_monae == "bicon");

Well, if we overwrite the 'm' that currently sits in front of "icon", we'd be modifying nomicon's view of the string. That's no good, so we gotta clone icon's view of the FrontString and mutate the copy.

![did it work?][ex4]

Dependencies

~0.4–0.8MB
~18K SLoC