4 releases (breaking)

new 0.4.0 Mar 27, 2023
0.3.0 Mar 25, 2023
0.2.0 Feb 6, 2023
0.1.0 Jan 13, 2023

Used in stam-tools



Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

STAM Library

STAM is a data model for stand-off text annotation and is described in detail here. This is a software library, written in Rust, for working with that model.

This is the primary software library for working with the data model. It is currently in a preliminary stage. We aim to implement the full model and most extensions.


Add stam to your project's Cargo.toml:

$ cargo add stam


Import the library:

use stam;

Loading a STAM JSON file containing an annotation store:

fn your_function() -> Result<(),stam::StamError> {
    let store = stam::AnnotationStore::from_file("example.stam.json", stam::Config::default())?;
    Ok(())
}

We assume some kind of function returning Result<_,stam::StamError> for all examples in this section.

The annotation store is your workspace. It holds all resources, annotation sets (i.e. keys and annotation data) and of course the actual annotations. It is a memory-based store and you can load as much as you like into it (as long as it fits in memory).

Retrieving anything by ID:

let annotation: &stam::Annotation = store.get_by_id("my-annotation")?;
let resource: &stam::TextResource = store.get_by_id("my-resource")?;
let annotationset: &stam::AnnotationDataSet = store.get_by_id("my-annotationset")?;
let key: &stam::DataKey = annotationset.get_by_id("my-key")?;
let data: &stam::AnnotationData = annotationset.get_by_id("my-data")?;

Note that it is important to specify the return type, as that is how the compiler infers what you want to get. The methods are provided by the ForStore<T> trait.
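
The role of the return type can be illustrated with a minimal, standard-library-only sketch. The types and the Lookup<T> trait below are hypothetical stand-ins, not the definitions used by this library; they only show how one generic trait, implemented once per item type, lets the annotated type on the left-hand side select the implementation:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for illustration; not the library's real types.
#[derive(Debug, PartialEq)]
struct Annotation {
    id: String,
}

#[derive(Debug, PartialEq)]
struct TextResource {
    id: String,
    text: String,
}

#[derive(Default)]
struct Store {
    annotations: HashMap<String, Annotation>,
    resources: HashMap<String, TextResource>,
}

// One generic trait (hypothetical name), implemented once per item type.
trait Lookup<T> {
    fn get_by_id(&self, id: &str) -> Result<&T, String>;
}

impl Lookup<Annotation> for Store {
    fn get_by_id(&self, id: &str) -> Result<&Annotation, String> {
        self.annotations
            .get(id)
            .ok_or_else(|| format!("no annotation with id {}", id))
    }
}

impl Lookup<TextResource> for Store {
    fn get_by_id(&self, id: &str) -> Result<&TextResource, String> {
        self.resources
            .get(id)
            .ok_or_else(|| format!("no resource with id {}", id))
    }
}

// Builds a small store for demonstration purposes.
fn example_store() -> Store {
    let mut store = Store::default();
    store.annotations.insert(
        "my-annotation".to_string(),
        Annotation { id: "my-annotation".to_string() },
    );
    store.resources.insert(
        "my-resource".to_string(),
        TextResource { id: "my-resource".to_string(), text: "Hello world".to_string() },
    );
    store
}
```

With this sketch, `let a: &Annotation = store.get_by_id("my-annotation")?;` resolves to the first implementation and `let r: &TextResource = store.get_by_id("my-resource")?;` to the second. Without the type annotation the call is ambiguous and does not compile.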

Iterating through all annotations in the store and printing them in a simple tab-separated format:

for annotation in store.annotations() {
    let id = annotation.id().unwrap_or("");
    for (key, data, dataset) in store.data_by_annotation(annotation) {
        // get the text to which this annotation refers (if any)
        let text: Vec<&str> = store.text_by_annotation(annotation).collect();
        println!("{}\t{}\t{}\t{}", id, key.id().unwrap(), data.value(), text.join(" "));
    }
}

Add resources:

let resource_handle = store.insert( stam::TextResource::from_file("my-text.txt", store.config()) )?;

Many methods return a so-called handle instead of a reference. You can use this handle to obtain a reference, as shown in the next example, in which we obtain a reference to the resource we just inserted:

let resource: &stam::TextResource = store.get(resource_handle)?;

Retrieving items by handle is much faster than retrieval by public ID, as handles encapsulate an internal numeric ID. Passing around handles is also cheap and sometimes easier than passing around references, as it avoids borrowing issues.
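
The trade-off between handles and public IDs can be sketched as follows. This is a simplified illustration using only the standard library; ResourceHandle, Store and their methods are hypothetical here and do not reproduce the library's internals. It assumes items are kept in a vector and that a handle wraps the vector index:

```rust
use std::collections::HashMap;

// Hypothetical handle: a cheap, copyable wrapper around a vector index.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ResourceHandle(usize);

struct TextResource {
    id: String,
    text: String,
}

#[derive(Default)]
struct Store {
    // Items live here and are addressed by index.
    resources: Vec<TextResource>,
    // Maps a public ID to the handle of the item carrying it.
    id_index: HashMap<String, ResourceHandle>,
}

impl Store {
    // Returns a handle rather than a reference, so the store is not left borrowed.
    fn insert(&mut self, resource: TextResource) -> ResourceHandle {
        let handle = ResourceHandle(self.resources.len());
        self.id_index.insert(resource.id.clone(), handle);
        self.resources.push(resource);
        handle
    }

    // Lookup by handle is a bounds-checked index into the vector.
    fn get(&self, handle: ResourceHandle) -> Option<&TextResource> {
        self.resources.get(handle.0)
    }

    // Lookup by public ID hashes the string first, then follows the handle.
    fn get_by_id(&self, id: &str) -> Option<&TextResource> {
        self.id_index.get(id).and_then(|handle| self.get(*handle))
    }
}
```

Because a handle is `Copy` and holds no borrow, it can be kept around while the store is mutated again, which a `&TextResource` would not allow.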

Add annotations:

let annotation_handle = store.annotate( stam::Annotation::builder()
           .target_text( "testres".into(), stam::Offset::simple(6,11))
           .with_data("testdataset".into(), "pos".into(), stam::DataValue::String("noun".to_string()))
 )?;

Here we see some Builder types that use a builder pattern to construct instances of their respective types. The actual instances are built by the underlying store. Note the heavy use of into() to coerce the parameters to the right type. Rather than passing string parameters that refer to public IDs, you may just as well pass and coerce (again with into()) references like &Annotation, &AnnotationDataSet, &DataKey, or handles. We call the type of these parameters AnyId<T>, and you will encounter them in more places.
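
The into() coercion described above relies on Rust's standard From/Into traits. The following is a minimal sketch of the idea, not the library's actual AnyId<T> definition: the AnyId enum, the ItemHandle type and the describe function are all hypothetical and simplified (no type parameter, no reference variants):

```rust
// Hypothetical handle type for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ItemHandle(usize);

// Hypothetical, simplified "either a public ID or a handle" parameter type.
#[derive(Clone, Debug, PartialEq, Eq)]
enum AnyId {
    Id(String),
    Handle(ItemHandle),
}

impl From<&str> for AnyId {
    fn from(id: &str) -> Self {
        AnyId::Id(id.to_string())
    }
}

impl From<String> for AnyId {
    fn from(id: String) -> Self {
        AnyId::Id(id)
    }
}

impl From<ItemHandle> for AnyId {
    fn from(handle: ItemHandle) -> Self {
        AnyId::Handle(handle)
    }
}

// A function taking AnyId accepts anything the caller can convert with into().
fn describe(target: AnyId) -> String {
    match target {
        AnyId::Id(id) => format!("public ID {}", id),
        AnyId::Handle(ItemHandle(index)) => format!("handle {}", index),
    }
}
```

A caller can then write `describe("testres".into())` or `describe(ItemHandle(3).into())`. Each `From` implementation automatically provides the matching `into()` conversion.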

Create a store and annotations from scratch, with an explicitly filled AnnotationDataSet:

let store = stam::AnnotationStore::new().with_id("test".into())
    .add( stam::TextResource::from_string("testres".into(), "Hello world".into()))?
    .add( stam::AnnotationDataSet::new().with_id("testdataset".into())
           .add( stam::DataKey::new("pos".into()))?
           .with_data("D1".into(), "pos".into(), "noun".into())?
    )?
    .with_annotation( stam::Annotation::builder()
            .target_text( "testres".into(), stam::Offset::simple(6,11))
            .with_data_by_id("testdataset".into(), "D1".into()) )?;

And here is the very same thing, but with the AnnotationDataSet filled implicitly:

let store = stam::AnnotationStore::new().with_id("test".into())
    .add( stam::TextResource::from_string("testres".to_string(),"Hello world".into()))?
    .add( stam::AnnotationDataSet::new().with_id("testdataset".into()))?
    .with_annotation( stam::Annotation::builder()
            .target_text( "testres".into(), stam::Offset::simple(6,11))
            .with_data("testdataset".into(), "pos".into(), stam::DataValue::String("noun".to_string()))
    )?;

The implementation reuses any already existing AnnotationData where possible, as not duplicating data is one of the core characteristics of the STAM model.
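
How such reuse can work is sketched below under simplifying assumptions: keys and values are plain strings, and a reverse index maps each (key, value) pair to the handle of the stored entry. DataSet, DataHandle and get_or_insert are hypothetical names for this illustration and are not the library's implementation:

```rust
use std::collections::HashMap;

// Hypothetical handle pointing at one stored (key, value) pair.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DataHandle(usize);

#[derive(Default)]
struct DataSet {
    // Stored (key, value) pairs, addressed by handle.
    data: Vec<(String, String)>,
    // Reverse index used to detect pairs that already exist.
    index: HashMap<(String, String), DataHandle>,
}

impl DataSet {
    // Returns the handle of an identical existing pair, or stores a new one.
    fn get_or_insert(&mut self, key: &str, value: &str) -> DataHandle {
        let pair = (key.to_string(), value.to_string());
        if let Some(handle) = self.index.get(&pair) {
            return *handle;
        }
        let handle = DataHandle(self.data.len());
        self.data.push(pair.clone());
        self.index.insert(pair, handle);
        handle
    }
}
```

Two annotations that both carry the pair ("pos", "noun") would then share a single stored entry and differ only in their targets.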

You can serialize the entire annotation store (including all sets and annotations) to a STAM JSON file:


API Reference Documentation

See here

Python binding

This library comes with a binding for Python; see here.


This work is conducted at the KNAW Humanities Cluster's Digital Infrastructure department, and funded by the CLARIAH project (CLARIAH-PLUS, NWO grant 184.034.023) as part of the FAIR Annotations track.

