3 releases

0.0.2	Aug 11, 2023
0.0.1	Aug 7, 2023
0.0.0	Aug 5, 2023

MIT/Apache

54KB
679 lines

Interfaces for fog-db

There's more than one way to implement a database for fog-db, and this crate is dedicated to creating a common API for dealing with fog-db databases. It defines:

The cursor API for simultaneous, streaming navigation through both local and remote databases.
The transaction API for modifying the local database.
The certificate and policy API for setting access policies on the database.
The group API for opening up a group of connections to remote database nodes.

`lib.rs`:

This crate defines the interface to a generic implementation of a fog-pack database (a FogDB).

The Database

A FogDB database consists of a collection of Documents, each of which is immutable and referred to by its Hash. Documents can also link to other documents by those same hashes. The database has a set of named "root documents" that it keeps resident, and any documents that can be reached by following hash links from those roots will also be kept resident in the database. In other words, if you can reach a Document from a root, it stays in the database. If you can't, it gets evicted from the database. These links can also be "weakened" in a transaction, much as you can with most reference-tracking garbage collectors.
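
The retention rule above is ordinary reachability from a set of roots. The following is a minimal sketch of that rule, not the crate's implementation: `DocHash`, `Store`, `reachable`, and `collect_garbage` are hypothetical names, a `u64` stands in for a real fog-pack Hash, and a weakened link is modelled simply by leaving it out of the strong-link list.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for a fog-pack Hash (assumption: a real hash is a cryptographic digest).
type DocHash = u64;

/// Hypothetical in-memory model of the database's document graph.
struct Store {
    /// Named root documents: name -> hash.
    roots: HashMap<String, DocHash>,
    /// Strong hash links held by each stored document.
    links: HashMap<DocHash, Vec<DocHash>>,
}

impl Store {
    /// Return the hash of every stored document reachable from a named root.
    fn reachable(&self) -> HashSet<DocHash> {
        let mut seen = HashSet::new();
        let mut stack: Vec<DocHash> = self.roots.values().copied().collect();
        while let Some(hash) = stack.pop() {
            // Skip documents we do not hold or have already visited.
            if !self.links.contains_key(&hash) || !seen.insert(hash) {
                continue;
            }
            stack.extend(self.links[&hash].iter().copied());
        }
        seen
    }

    /// Evict every stored document that no root can reach; return how many were evicted.
    fn collect_garbage(&mut self) -> usize {
        let keep = self.reachable();
        let before = self.links.len();
        self.links.retain(|hash, _| keep.contains(hash));
        before - self.links.len()
    }
}

/// Demo graph: 1 -> 2 -> 3 hangs off the root "home"; 4 is unreachable.
fn demo_store() -> Store {
    let mut links = HashMap::new();
    links.insert(1, vec![2]);
    links.insert(2, vec![3]);
    links.insert(3, vec![]);
    links.insert(4, vec![3]);
    let mut roots = HashMap::new();
    roots.insert("home".to_string(), 1);
    Store { roots, links }
}
```

In the demo graph, document 4 links to document 3 but nothing links to 4, so a collection pass would evict only 4.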

Documents can adhere to a Schema, which constrains a document's format and provides hints on how to compress it for storage. These schemas let one pre-verify that a document can be deserialized into a data structure, and let systems know ahead of time what type of data is in a document.

Now, if the database were just immutable documents, it would be quite difficult to deal with. That's why every document adhering to a schema can also have Entries, which are essentially smaller documents attached to a parent document under a key prefix. These entries are not looked up by their Hash, but are found by running a Query on the parent document - in a FogDB, this query will return a sequence of matching entries, and will remain active in case more entries are found in the future.

The format of entries is also constrained by the parent document's schema, which puts them in an interesting position for a database, and is what makes FogDB multi-modal:

From a document-oriented view, they're a collection of documents all matching the same schema.
From a relational database view, the parent document and entry key together form a table reference, and the entries are records (or entries, get it?) in the table.
From a graph database view, the documents are nodes, and the entries are edges.

Rather than provide the expected access APIs for all of these, FogDB provides a base over which such APIs can be built.

Transactions: Modifying the Database

The database can be modified in three ways:

Modify the set of named root documents by changing a name-to-hash mapping.
Modify the set of stored schemas by adding or removing a schema document.
Execute a transaction on the database.

Transactions are the most common way to change the database. They follow ACID properties, so when a transaction is committed, either all parts of the transaction complete simultaneously or the whole transaction is rejected. Most commonly, a transaction fails because it attempts to delete an entry that has already been removed - this is how compare-and-swap style transactions can be performed on the database.

Transactions can do the following:

Add a document to the database
Weaken/strengthen document hash links
Add an entry to the database, optionally setting a time-to-live or an access policy
Modify an entry's time-to-live or its access policy
Delete an entry from the database
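
The all-or-nothing commit and the compare-and-swap pattern can be sketched as follows. This is an illustration under stated assumptions, not the crate's transaction API: `Op`, `TxError`, `Entries`, `commit`, and `swap` are hypothetical names, entries are keyed by plain strings rather than hashes, and only two of the listed operations are modelled.

```rust
use std::collections::HashMap;

/// Hypothetical subset of transaction operations.
enum Op {
    AddEntry { key: String, value: String },
    DeleteEntry { key: String },
}

#[derive(Debug, PartialEq)]
enum TxError {
    /// The entry to delete was already gone.
    MissingEntry(String),
}

/// Hypothetical entry store for a single parent document.
#[derive(Default)]
struct Entries {
    map: HashMap<String, String>,
}

impl Entries {
    /// Apply every operation, or none of them if any operation fails.
    fn commit(&mut self, ops: Vec<Op>) -> Result<(), TxError> {
        // Stage changes on a copy so a failure leaves the store untouched.
        let mut staged = self.map.clone();
        for op in ops {
            match op {
                Op::AddEntry { key, value } => {
                    staged.insert(key, value);
                }
                Op::DeleteEntry { key } => {
                    if staged.remove(&key).is_none() {
                        return Err(TxError::MissingEntry(key));
                    }
                }
            }
        }
        self.map = staged;
        Ok(())
    }

    /// Compare-and-swap: add the new entry only if the old one still exists.
    fn swap(&mut self, old_key: &str, new_key: &str, value: &str) -> Result<(), TxError> {
        self.commit(vec![
            Op::AddEntry { key: new_key.to_string(), value: value.to_string() },
            Op::DeleteEntry { key: old_key.to_string() },
        ])
    }
}
```

If two writers both try to swap out the same old entry, only the first commit succeeds; the second fails on its delete and its staged add is discarded with it.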

Documents cannot be deleted directly; instead, when they are no longer reachable from the named root documents, they are automatically garbage-collected.

Note that all transactions execute only on the local FogDB instance; this follows the rule that a system can only modify itself, and it is up to other database nodes to modify themselves to match as they desire.

Cursors: Reading the Database

The database is accessed through the Cursor interface. A cursor can be opened either on a Group (see Connecting to Other Databases) or on the database. A cursor must start from some specific Document, and can be thought of as always being "over" a document. Each document can contain hashes of other documents; the cursor can follow these with a "forward" function call. Alternatively, a new cursor can be "forked" off to the linked document. In this way, many cursors can be created for quicker traversal of a Document tree.
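
The difference between moving a cursor forward and forking a new one can be shown with a toy, synchronous model. The real interface is streaming and may reach remote databases; `ToyCursor`, `Docs`, `open`, `forward`, and `fork` here are hypothetical names chosen for illustration, and a `u64` stands in for a Hash.

```rust
use std::collections::HashMap;

/// Stand-in for a fog-pack Hash.
type DocHash = u64;

/// Hypothetical document store: hash -> hashes the document links to.
type Docs = HashMap<DocHash, Vec<DocHash>>;

/// Hypothetical cursor that is always "over" exactly one document.
struct ToyCursor<'a> {
    docs: &'a Docs,
    at: DocHash,
}

impl<'a> ToyCursor<'a> {
    /// Open a cursor over `start`, if that document is stored.
    fn open(docs: &'a Docs, start: DocHash) -> Option<Self> {
        docs.contains_key(&start).then(|| ToyCursor { docs, at: start })
    }

    /// Hashes that the current document links to.
    fn links(&self) -> &[DocHash] {
        &self.docs[&self.at]
    }

    /// Move this cursor to a linked document; returns false if `to` is
    /// not linked from the current document or is not stored.
    fn forward(&mut self, to: DocHash) -> bool {
        if self.links().contains(&to) && self.docs.contains_key(&to) {
            self.at = to;
            true
        } else {
            false
        }
    }

    /// Spawn a new cursor over a linked document, leaving this one in place.
    fn fork(&self, to: DocHash) -> Option<ToyCursor<'a>> {
        if self.links().contains(&to) {
            ToyCursor::open(self.docs, to)
        } else {
            None
        }
    }
}
```

Forking leaves the original cursor where it was, so several branches of a document tree can be walked independently.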

A cursor can also be used to make a query, which uses up the cursor and turns it into a CursorQuery (which can be backed out of to get the cursor back). This yields a stream of entries from the document the cursor is over.

A query is just a fog-pack Query with an optional preferred ordering to the returned Entry results. If an entry has hash links to documents, new cursors can be forked off to them using the included ForkSpawner.

Here's where it gets interesting: if a cursor was opened up on a Group, then any remote databases meeting the group's requirements can also be read by a cursor. In this way, many databases at once can be used to simultaneously retrieve documents and give query results, which is why each query result includes the source database it was retrieved from.

This means that document retrieval is near-instant when the local database has the document, but a cursor can keep searching remote databases indefinitely for one that has the requested document. By forking off many cursors at once, the network can use an entire swarm of remote databases to retrieve the documents.
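
As a rough sketch of the local-first lookup and of results tagged with their source, consider the synchronous model below. It is an assumption-laden simplification: the real cursor is asynchronous and can wait indefinitely, while `Source`, `Db`, and `fetch` are hypothetical names and remotes are just an in-memory list.

```rust
use std::collections::HashMap;

/// Stand-in for a fog-pack Hash.
type DocHash = u64;

/// Where a result came from (hypothetical; the real API reports a source database).
#[derive(Debug, Clone, PartialEq)]
enum Source {
    Local,
    Remote(String),
}

/// Hypothetical database holding document bodies by hash.
struct Db {
    docs: HashMap<DocHash, String>,
}

/// Check the local database first, then ask each remote in turn.
fn fetch(local: &Db, remotes: &[(String, Db)], hash: DocHash) -> Option<(Source, String)> {
    if let Some(doc) = local.docs.get(&hash) {
        return Some((Source::Local, doc.clone()));
    }
    remotes.iter().find_map(|(name, db)| {
        db.docs
            .get(&hash)
            .map(|doc| (Source::Remote(name.clone()), doc.clone()))
    })
}
```

A caller can inspect the returned `Source` to decide, for example, whether to trust or cache a remotely supplied document.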

Groups: Connecting to Other Databases

Each FogDB instance exists as a single Node, which may use any number of network protocols to communicate with other Nodes. This lets the Cursor interface use many remote databases at once to retrieve documents and get query results, and lets portions of the database be exposed to other nodes in turn.

Connecting to other nodes is done by opening a group using a group specification. This specification limits the network types over which the group will find other nodes, how it can find and connect to them, and whether the nodes must identify themselves as part of a Policy (see Policies and Certificates).

Node discovery can be limited to these approximate network classes:

Machine: communication between FogDB instances running on the same computer.
Direct: direct machine-to-machine networking, with no switches or routers present. The primary example is Wi-Fi Direct.
Local: local networks. LANs, ad-hoc networks, and other physically close networking systems fall under this category.
Regional: A collection of local networks that isn't the internet. Campus networks and Metropolitan area networks fall under this category. The IPv6 "organization-level" multicast scope also fits.
Global: the global internet.
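
One way to picture a group specification is as a filter over discovered nodes. The sketch below is hypothetical: `NetworkClass`, `GroupSpec`, `Candidate`, and `accepts` are illustrative names rather than the crate's types, and the real specification also covers how nodes are found and connected to.

```rust
/// Hypothetical mirror of the five discovery classes listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NetworkClass {
    Machine,
    Direct,
    Local,
    Regional,
    Global,
}

/// Hypothetical group specification.
struct GroupSpec {
    /// Network classes over which node discovery may run.
    allowed: Vec<NetworkClass>,
    /// Whether nodes must identify themselves to join.
    require_identity: bool,
}

/// A node found by some discovery mechanism (hypothetical).
struct Candidate {
    class: NetworkClass,
    has_identity: bool,
}

impl GroupSpec {
    /// Decide whether a discovered node fits this group.
    fn accepts(&self, node: &Candidate) -> bool {
        self.allowed.contains(&node.class) && (!self.require_identity || node.has_identity)
    }
}
```

A specification allowing only `Machine` and `Local` would, under this model, never admit a node discovered over the global internet, whatever its identity.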

Once a group is opened, the various underlying network protocols will attempt to establish a collection of nodes that fit the group's specification, and will work to set up and maintain node discovery mechanisms for the group.

Gates: Making the Database Remotely Available

Establishing a group is not enough on its own for database nodes to communicate. Each node must choose what parts of the database to expose to remote nodes, and this is done by creating a Gate. A gate allows remote nodes to open a database cursor starting at a specific document, given when the gate is opened.

Gates provide the means to easily scope access to the database: anything that can be reached from the starting document is fair game for access by a remote node in the Group. Queries can also be made on reached documents. Entries can have additional access policies that a node must match in order to be given the entry; otherwise it is skipped over.
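
The per-entry policy check can be sketched as a filter applied to query results before they leave the gate. This is a simplified assumption of how matching might work: `Entry`, `Identity`, and `visible_entries` are hypothetical names, an identity is a plain string rather than a public key, and a policy is reduced to a set of allowed identities.

```rust
use std::collections::HashSet;

/// Stand-in for a fog-pack Identity (really a public key).
type Identity = String;

/// Hypothetical entry with an optional access policy.
struct Entry {
    data: String,
    /// `None` means no extra policy: any node in the group may read it.
    policy: Option<HashSet<Identity>>,
}

/// Return the entries a requester may see; others are silently skipped.
fn visible_entries<'a>(entries: &'a [Entry], requester: Option<&str>) -> Vec<&'a str> {
    entries
        .iter()
        .filter(|entry| match (&entry.policy, requester) {
            // No policy on the entry: visible to everyone.
            (None, _) => true,
            // Policy present: the requester must be listed.
            (Some(allowed), Some(id)) => allowed.contains(id),
            // Policy present but the requester is anonymous.
            (Some(_), None) => false,
        })
        .map(|entry| entry.data.as_str())
        .collect()
}
```

Skipping rather than failing means a remote node cannot tell a restricted entry from one that does not exist.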

When a query is made on a particular document reached through a gate, you can optionally hook into the query and manually provide query results. This allows for dynamic generation of query responses, and can be used to build RPC-like mechanisms from the query system.

Policies and Certificates

Policies are FogDB's way of scoping access to a database, and make use of fog-pack Identities to do so. An Identity is a public-private keypair, which can be used to sign documents and entries, and generally establish a unique identity.

Nodes can identify themselves on the network using these long-term signing keys. A full Node Address (NodeAddr) consists of a long-term key like this, and an ephemeral key pair that is regenerated by each network protocol every time a group is created. Not all nodes will have these Identities, but they're required when joining any group with a policy in place.

Identities can be used to sign a special document called a Certificate. Certificates are identified by their signer, the subject Identity, a Hash value acting as a context, and a key string. Certificates are immutable, but new ones with the same signer/subject/context/key combination can be made in order to replace previous ones - this also serves as a way to revoke certificates. See the documentation for more info.
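
Replacement by signer/subject/context/key can be modelled as a map keyed on that four-part combination. The sketch below rests on assumptions the text does not confirm: that certificates carry an `issued` counter deciding which one is newer, and that revocation is expressed by a replacement certificate flagged `revoked`. `CertId`, `Cert`, `CertStore`, `offer`, and `is_active` are all hypothetical names.

```rust
use std::collections::HashMap;

/// Stand-in for a fog-pack Identity (really a public key).
type Identity = String;
/// Stand-in for the Hash used as a certificate context.
type Context = u64;

/// The four-part combination that identifies a certificate.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct CertId {
    signer: Identity,
    subject: Identity,
    context: Context,
    key: String,
}

/// Hypothetical certificate body.
#[derive(Debug, Clone, PartialEq)]
struct Cert {
    id: CertId,
    /// Assumption: a counter or timestamp that orders replacements.
    issued: u64,
    /// Assumption: a replacement may mark the certificate as revoked.
    revoked: bool,
}

/// Keeps only the newest certificate per combination.
#[derive(Default)]
struct CertStore {
    latest: HashMap<CertId, Cert>,
}

impl CertStore {
    /// Store `cert` unless an equally new or newer one is already held.
    fn offer(&mut self, cert: Cert) -> bool {
        let stale = self
            .latest
            .get(&cert.id)
            .map_or(false, |held| held.issued >= cert.issued);
        if stale {
            return false;
        }
        self.latest.insert(cert.id.clone(), cert);
        true
    }

    /// A certificate counts only if its newest version is not revoked.
    fn is_active(&self, id: &CertId) -> bool {
        self.latest.get(id).map_or(false, |cert| !cert.revoked)
    }
}
```

Because older certificates are rejected once a newer one is held, replaying a superseded certificate cannot undo a revocation in this model.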

Certificates on their own do nothing, but with a Policy they can delegate access permissions. A policy can be as simple as a list of permitted Identities, but it can also include Policy Chains, which allow certificates to be used to establish permission.
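
To make the delegation idea concrete, here is a deliberately reduced sketch in which a policy either lists an identity directly or trusts certain signers to vouch for it. The text does not spell out how Policy Chains are evaluated, so the single-hop rule below is an assumption, and `ToyCert`, `Policy`, `chain_signers`, and `allows` are hypothetical names.

```rust
use std::collections::HashSet;

/// Stand-in for a fog-pack Identity (really a public key).
type Identity = String;

/// Hypothetical, stripped-down certificate: `signer` vouches for `subject`.
struct ToyCert {
    signer: Identity,
    subject: Identity,
}

/// Hypothetical policy with direct members and a single-hop chain.
struct Policy {
    /// Identities permitted outright.
    permitted: HashSet<Identity>,
    /// Assumption: signers whose certificates grant permission to their subjects.
    chain_signers: HashSet<Identity>,
}

impl Policy {
    /// Permit `who` if listed directly, or if a trusted signer vouches for them.
    fn allows(&self, who: &str, certs: &[ToyCert]) -> bool {
        self.permitted.contains(who)
            || certs.iter().any(|cert| {
                cert.subject == who && self.chain_signers.contains(cert.signer.as_str())
            })
    }
}
```

Under this model, adding a member needs only a new certificate from a trusted signer, not a change to the policy itself.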

Policies and Certificates are not automatically propagated through databases; they must be actively retrieved or exchanged as part of a network protocol. FogDB doesn't specify any particular mechanism for this, leaving it up to applications and network protocols to propagate certificates and set policies. It's assumed that, as part of a FogDB setup, certificates will be stored in the database and be used to check policies.

Dependencies

~13MB
~263K SLoC