#metadata #simulation #back-end #format #neo4j #results

simuldb

Database with backend and format agnostic data storage for simulation results coupled with metadata

29 releases (11 breaking)

0.12.3 Dec 7, 2023
0.11.3 Dec 1, 2023
0.11.0 Nov 30, 2023

#1091 in Database interfaces


Used in simuldb-utils

MIT/Apache

95KB
2.5K SLoC

This library provides backend- and format-agnostic data storage for simulation results, coupled with metadata about the [Software] used and the simulation [Run].

The main use case is the following:

  1. generate data on a cluster and save it with JSON backend
  2. transfer data to Neo4j backend
  3. directly use Neo4j to select data

The main goal is therefore a simple solution for writing data; there are no plans to support advanced search or query features.

The database does not store the data itself, only the metadata associated with it.

Currently two backends are included:

  • Json, which saves everything in JSON files
  • Neo4j, which uses a Neo4j database as backend (write-only)

Custom backends can be implemented via the Database and DatabaseSession traits. Sessions are meant to associate [Dataset]s with a specific [Run] of a [Software]. [Dataset]s are references to data stored in a file of any format.
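As a rough illustration of the split between a database and its sessions, the sketch below stores session and dataset references in memory. All names in it (`Store`, `SessionStore`, `SessionInfo`, `DatasetRef`, `MemoryStore`) are hypothetical stand-ins invented for this example. They are not the simuldb `Database` and `DatabaseSession` traits, whose actual signatures should be taken from the crate documentation.

```rust
// Hypothetical stand-ins for illustration only; NOT the simuldb API.

/// Which software and run produced a group of datasets (assumed shape).
#[derive(Debug, Clone, PartialEq)]
struct SessionInfo {
    software: String,
    run: String,
}

/// A reference to a data file plus its serialized metadata (assumed shape).
#[derive(Debug, Clone, PartialEq)]
struct DatasetRef {
    path: String,
    metadata: String,
}

/// A session collects dataset references for one run.
trait SessionStore {
    fn add_dataset(&mut self, dataset: DatasetRef) -> Result<(), String>;
}

/// A backend opens sessions; only references are stored, never the data.
trait Store {
    fn add_session(&mut self, info: SessionInfo) -> Result<Box<dyn SessionStore + '_>, String>;
}

/// Toy backend that keeps everything in memory.
#[derive(Default)]
struct MemoryStore {
    sessions: Vec<(SessionInfo, Vec<DatasetRef>)>,
}

/// Session handle borrowing the dataset list of one stored session.
struct MemorySession<'a> {
    datasets: &'a mut Vec<DatasetRef>,
}

impl SessionStore for MemorySession<'_> {
    fn add_dataset(&mut self, dataset: DatasetRef) -> Result<(), String> {
        self.datasets.push(dataset);
        Ok(())
    }
}

impl Store for MemoryStore {
    fn add_session(&mut self, info: SessionInfo) -> Result<Box<dyn SessionStore + '_>, String> {
        // Register the session, then hand out a handle to its dataset list.
        self.sessions.push((info, Vec::new()));
        let (_, datasets) = self.sessions.last_mut().expect("session was just pushed");
        Ok(Box::new(MemorySession { datasets }))
    }
}
```

In this sketch the session handle borrows the backend mutably, so it has to be dropped before the backend is inspected or another session is opened. Whether simuldb makes the same choice is not stated here.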

Features

  • json: enables the Json backend
  • neo4j: enables the Neo4j backend
  • sha: enables [sha2] support for automatic hash calculations
  • arbitrary: enables support for [arbitrary] (required for tests)

Example

This creates a Json-based Database and writes some arbitrary data to it. Note that the [vergen_session] macro will usually suffice to create a session.

use std::io::Write;
use serde::Serialize;
use simuldb::prelude::*;

// Define a metadata type
#[derive(Debug, Serialize)]
struct Metadata {
    a: usize,
    b: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Change to the top-level directory so the relative output paths resolve
    std::env::set_current_dir(format!("{}/..", env!("CARGO_MANIFEST_DIR")))?;
    // Create or open database
    let mut json = Json::new("output/json");

    // Start new session which will contain references to the datasets
    let software = Software::new("example", "1.0", "now");
    let run = Run::new("now");
    let session = Session::new(software, run);
    let mut json_session = json.add_session(session)?;

    // Create a directory for the result data
    std::fs::create_dir_all("output/data")?;

    // Generate some data and add it to the database
    for a in 0_usize..10 {
        // A DatasetWriter can be used to automatically calculate
        // the hash of a file and create a Dataset from it
        let mut writer = DatasetWriter::new("output/data")?;

        // Write some data to the output file
        writeln!(writer, "a^2 = {}", a.pow(2))?;

        // Generate metadata to associate with it
        let metadata = Metadata {
            a,
            b: "squaring".to_string(),
        };

        // Add the corresponding dataset to the database
        let dataset = writer.finalize(metadata)?;
        json_session.add_dataset(&dataset)?;
    }

    Ok(())
}
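To make the idea of a writer that hashes a file while it is written more concrete, here is a minimal, self-contained sketch. It is not how `DatasetWriter` is implemented. `HashingWriter` is a hypothetical type, and it uses the non-cryptographic FNV-1a hash purely to stay dependency-free, whereas the `sha` feature is described as relying on [sha2].

```rust
use std::io::{self, Write};

// FNV-1a 64-bit parameters (a simple, non-cryptographic hash).
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Hypothetical writer that forwards bytes to `inner` and hashes
/// exactly the bytes that were accepted by it.
struct HashingWriter<W: Write> {
    inner: W,
    hash: u64,
}

impl<W: Write> HashingWriter<W> {
    fn new(inner: W) -> Self {
        Self { inner, hash: FNV_OFFSET }
    }

    /// Flush the inner writer and return it together with the final hash.
    fn finalize(mut self) -> io::Result<(W, u64)> {
        self.inner.flush()?;
        Ok((self.inner, self.hash))
    }
}

impl<W: Write> Write for HashingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let written = self.inner.write(buf)?;
        // Only hash what was actually written, so partial writes stay consistent.
        for &byte in &buf[..written] {
            self.hash ^= u64::from(byte);
            self.hash = self.hash.wrapping_mul(FNV_PRIME);
        }
        Ok(written)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```

Because the wrapper implements `Write`, it can be used with `writeln!` just like the writer in the example above. Calling `finalize` hands back the underlying writer and the hash, which could then be stored alongside the metadata.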

Dependencies

~2–15MB
~196K SLoC