rds2rust
A pure Rust library for reading and writing R's RDS (R Data Serialization) files without requiring an R runtime. Inspired by rds2cpp, which provides similar functionality with a C++ implementation.
Features
- Pure Rust implementation - No R runtime required
- Broad RDS format support - Reads and writes core R object types
- Memory efficient - Optimized with string interning, compact attributes, and object deduplication
- Automatic compression - Transparent gzip compression/decompression
- Type safe - Strong Rust types for all R objects
- Zero-copy where possible - Efficient parsing and serialization
- Thread-aware - Use into_concrete_deep() before sharing parsed objects across threads
Supported R Types
- Primitive types: NULL, integers, doubles, logicals, characters, raw bytes, complex numbers
- Collections: vectors, lists, pairlists, expression vectors
- Data structures: data frames, matrices, factors (ordered and unordered)
- Object-oriented: S3 objects, S4 objects with slots
- Language objects: formulas, unevaluated expressions, function calls
- Functions: closures, environments, promises, special/builtin functions
- Advanced: reference tracking (REFSXP), ALTREP compact sequences
Installation
Add this to your Cargo.toml:
```toml
[dependencies]
rds2rust = "0.1"
```
Quick Start
Reading an RDS file
```rust
use rds2rust::{read_rds, RObject};
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read RDS file (automatically decompresses if gzipped)
    let data = fs::read("data.rds")?;
    let result = read_rds(&data)?;
    let obj = result.object;

    // Pattern match on R object type
    match obj {
        RObject::DataFrame(df) => {
            println!("Data frame with {} columns", df.columns.len());
            // Access a specific column
            if let Some(RObject::Real(values)) = df.columns.get("temperature") {
                println!("Temperature values: {:?}", values);
            }
        }
        RObject::Integer(vec) => {
            println!("Integer vector: {:?}", vec);
        }
        _ => println!("Other R object type"),
    }

    for warning in result.warnings {
        eprintln!("Warning: {}", warning);
    }
    Ok(())
}
```
Writing an RDS file
```rust
use rds2rust::{write_rds, RObject, VectorData};
use std::fs;
use std::sync::Arc;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create an R object (e.g., a character vector)
    let obj = RObject::Character(VectorData::Owned(vec![
        Arc::from("hello"),
        Arc::from("world"),
    ]));

    // Serialize to RDS format (automatically gzip compressed)
    let rds_data = write_rds(&obj)?;

    // Write to file
    fs::write("output.rds", rds_data)?;
    Ok(())
}
```
Streaming RDS writes (native)
For large outputs, stream directly to a Write sink to avoid buffering the whole file in memory.
```rust
use rds2rust::{write_rds_streaming, write_rds_atomic, RObject, VectorData};
use std::fs::File;
use std::io::BufWriter;
use std::sync::Arc;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let obj = RObject::Character(VectorData::Owned(vec![
        Arc::from("hello"),
        Arc::from("streaming"),
    ]));

    // Stream to a file (gzip compressed)
    let file = File::create("output.rds")?;
    write_rds_streaming(&obj, BufWriter::new(file))?;

    // Or write atomically (safe replace on success)
    write_rds_atomic(&obj, "output.rds")?;
    Ok(())
}
```
Working with Data Frames
```rust
use rds2rust::{read_rds, RObject};

// Read a data frame
let data = std::fs::read("iris.rds")?;
let result = read_rds(&data)?;
let obj = result.object;

if let RObject::DataFrame(df) = obj {
    // Access columns by name
    let sepal_length = df.columns.get("Sepal.Length");
    let species = df.columns.get("Species");

    // Access row names
    println!("First row name: {}", df.row_names[0]);

    // Iterate over columns
    for (name, values) in &df.columns {
        println!("Column: {}", name);
    }
}
```
Large-File Extraction (Streaming-Oriented)
For very large files, you can extract vectors without materializing the whole object in memory.
The rds-extract CLI writes one file per vector plus an optional JSON manifest.
WASM Support (Large Files)
The WASM path uses async input, a Blob-backed chunk source, and worker-friendly helpers. Decompression uses a size-based strategy:
- <500MB: in-memory buffer
- 500MB–10GB: Blob-backed chunked reads
- >10GB: streaming mode (sequential)
See docs/wasm_decompression.md for the JS helper, worker wrapper, and validation targets.
Gzip-compressed .rds.gz files are auto-detected in the WASM helper (browser support required:
Chrome/Edge 89+, Firefox 102+, Safari 16.4+). Unsupported formats (bzip2/xz) return helpful
errors.
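The size thresholds above can be sketched as a simple selection function. This is an illustrative stand-in, not the crate's actual WASM helper; the enum and function names here are hypothetical:

```rust
/// Decompression strategies mirroring the size thresholds above.
#[derive(Debug, PartialEq)]
enum DecompressStrategy {
    InMemory,    // < 500 MB: buffer everything in memory
    ChunkedBlob, // 500 MB - 10 GB: Blob-backed chunked reads
    Streaming,   // > 10 GB: sequential streaming mode
}

const MB: u64 = 1024 * 1024;
const GB: u64 = 1024 * MB;

/// Pick a strategy from the input size in bytes.
fn pick_strategy(size_bytes: u64) -> DecompressStrategy {
    if size_bytes < 500 * MB {
        DecompressStrategy::InMemory
    } else if size_bytes <= 10 * GB {
        DecompressStrategy::ChunkedBlob
    } else {
        DecompressStrategy::Streaming
    }
}
```

The same dispatch happens transparently inside the WASM helper; you normally never call it yourself.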
WASM Streaming Decompression (Rust API)
For memory-efficient parsing of compressed RDS files in wasm32, use the Rust streaming API
that automatically detects compression format and chooses the optimal parsing strategy:
```rust
use rds2rust::{
    check_streaming_decompression_support, traverse_rds_blob_streaming, ParseConfig, RdsVisitor,
};
use wasm_bindgen::JsValue;
use web_sys::Blob;

async fn parse_blob<V: RdsVisitor>(blob: Blob, visitor: &mut V) -> Result<(), JsValue> {
    check_streaming_decompression_support()
        .map_err(|msg| JsValue::from_str(&msg))?;
    traverse_rds_blob_streaming(blob, ParseConfig::default(), visitor)
        .await
        .map_err(|err| JsValue::from_str(&format!("{:?}", err)))
}
```
Memory Efficiency:
- Gzip files: Uses the DecompressionStream API with a bounded buffer (64-128MB)
- Uncompressed files: Uses cached random-access reads
- Unsupported formats (xz/bzip2): Clear error with fallback instructions
Browser Requirements:
- DecompressionStream API (Chrome 89+, Firefox 102+, Safari 16.4+)
- For older browsers, use decompressBlobIfNeeded() to pre-decompress
Progress Reporting:
```rust
use rds2rust::{
    traverse_rds_blob_streaming_with_progress, ParseConfig, RdsVisitor, StreamingProgress,
};
use wasm_bindgen::JsValue;
use web_sys::Blob;

async fn parse_with_progress<V: RdsVisitor>(
    blob: Blob,
    visitor: &mut V,
) -> Result<(), JsValue> {
    let mut on_progress = |progress: StreamingProgress| {
        if let Some(total) = progress.total_bytes {
            let pct = 100.0 * progress.bytes_read as f64 / total as f64;
            web_sys::console::log_1(
                &format!("Progress: {} bytes ({:.1}%)", progress.bytes_read, pct).into(),
            );
        } else {
            web_sys::console::log_1(&format!("Progress: {} bytes", progress.bytes_read).into());
        }
    };

    traverse_rds_blob_streaming_with_progress(
        blob,
        ParseConfig::default(),
        visitor,
        &mut on_progress,
    )
    .await
    .map_err(|err| JsValue::from_str(&format!("{:?}", err)))
}
```
WASM Streaming Writer (Rust API)
WASM exposes chunked writer helpers that avoid large allocations in Rust. These emit
Uint8Array chunks to a JS callback.
```rust
use js_sys::{Function, Uint8Array};
use rds2rust::{recommended_chunk_size_mb, write_rds_with_callback, RObject};
use wasm_bindgen::prelude::*;
use wasm_bindgen::JsCast;

fn write_with_callback(obj: &RObject) -> Result<(), JsValue> {
    let chunk_size_mb = Some(recommended_chunk_size_mb());
    let callback = Closure::wrap(Box::new(move |chunk: Uint8Array| {
        // Handle each chunk (e.g. push into a JS array)
        let _ = chunk;
    }) as Box<dyn FnMut(Uint8Array)>);
    let callback_fn: Function = callback.as_ref().unchecked_ref::<Function>().clone();

    write_rds_with_callback(obj, callback_fn, chunk_size_mb)
        .map_err(|err| JsValue::from_str(&format!("{:?}", err)))?;

    callback.forget();
    Ok(())
}
```
Progress callback reports bytes written (not percent):
```rust
// Use write_rds_with_progress(...) for byte count updates.
```
WASM Gzip Support
| Format | Extension | Status |
|---|---|---|
| gzip | .rds.gz, .rds.gzip | Supported |
| uncompressed | .rds | Supported |
| bzip2 | .rds.bz2 | Unsupported |
| xz | .rds.xz | Unsupported |
CLI
```sh
rds-extract data.rds out/ data.matrix meta.data --budget-mb 512 --manifest manifest.json
rds-extract data.rds out/ --object-path data --manifest manifest.json
rds-extract data.rds out/ --object-kind dataframe --object-path data
rds-extract convert data.rds out/ --object-kind dataframe --object-path data
rds-extract convert data.rds out/ --object-kind dataframe --chunked
rds-extract convert data.rds out/ --object-kind sparse-matrix --object-path data.matrix --chunked --chunk-size-mb 4
```
If no paths are provided, the root object is extracted. Use --object-path to expand higher-level
objects (data.frames, dense matrices, sparse matrices, lists) into their component vectors.
Use --object-kind to enforce the expected object type and emit a clearer error on mismatch.
Use --chunked to avoid mapping the full decompressed stream in memory; it trades some
performance for a lower steady-state memory footprint on huge files.
When field names contain dots (e.g., slot.value), use quoted segments:
data["slot.value"].
Streaming is the default and avoids materializing large lazy vectors; it streams spans directly
from the backing store. Use --no-streaming to force materialization if needed. Use
--chunk-size-mb to cap per-read buffer size when streaming. Streaming is best paired with
--chunked to avoid mmap'ing large decompressed streams.
Streaming Traversal API
Use traverse_rds_streaming (sync) or traverse_rds_streaming_with_progress to walk an RDS
stream without materializing large vectors. Implement RdsVisitor to receive events:
- on_object_start/on_object_end for object boundaries
- on_vector_metadata for vector length/kind
- on_vector_chunk_available for lazy vector spans
- on_shared_reference for REFSXP references (target path may be None)
Notes:
- ALTREP metadata is best-effort: compact sequences and wrapped vectors emit estimated length; other forms only report attributes.
- Singleton environment markers (global/base/empty/unbound) are treated as leaf nodes in streaming.
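For illustration, the visitor-driven shape of this API can be sketched with a simplified stand-in trait. The trait and driver below are hypothetical simplifications (only the event names come from the list above, and the real RdsVisitor has more methods and different signatures):

```rust
/// Simplified stand-in for the streaming visitor interface.
trait Visitor {
    fn on_object_start(&mut self, kind: &str);
    fn on_object_end(&mut self);
    fn on_vector_metadata(&mut self, kind: &str, len: u64);
}

/// A visitor that just counts events, e.g. to size a progress bar
/// without materializing any vector data.
#[derive(Default)]
struct CountingVisitor {
    objects: u32,
    vector_elems: u64,
}

impl Visitor for CountingVisitor {
    fn on_object_start(&mut self, _kind: &str) {
        self.objects += 1;
    }
    fn on_object_end(&mut self) {}
    fn on_vector_metadata(&mut self, _kind: &str, len: u64) {
        self.vector_elems += len;
    }
}

/// Drive the visitor over a tiny synthetic event stream, the way the
/// traversal functions drive a real one over an RDS stream.
fn demo(v: &mut impl Visitor) {
    v.on_object_start("list");
    v.on_vector_metadata("integer", 3);
    v.on_vector_metadata("real", 5);
    v.on_object_end();
}
```

The key point is that the traversal pushes events to you; your visitor accumulates whatever state it needs and never holds the full object tree.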
WASM Extraction APIs (Rust API)
WASM builds expose Rust helpers that return JsValue (typed arrays) or call a callback per chunk:
- extract_vector_to_js(obj, source, path) -> JsValue
- extract_vector_chunked(obj, source, path, chunk_size, callback)
Raw Dump Format
Each output file contains:
- Header: RDS2VEC1 magic + version + kind + endian + reserved + length (u64) + elem_size (u32)
- Payload:
  - Numeric/logical/complex/raw: element bytes (big-endian for numeric types)
  - Character: repeated records of i32 length + UTF-8 bytes
The manifest JSON lists each extracted vector with its path, file, kind, length,
elem_size, and endian, plus a top-level object_kind. This allows a reader to map files
back to R object paths.
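As an illustration of the layout above, here is a minimal sketch that encodes and decodes the fixed header fields in plain Rust. The field widths and byte order chosen here are assumptions for the sketch, not the authoritative spec; consult the crate source for the real layout:

```rust
/// Assumed header layout: 8-byte magic "RDS2VEC1", then version/kind/endian/
/// reserved as single bytes, length as little-endian u64, elem_size as
/// little-endian u32 (24 bytes total). Widths are illustrative assumptions.
const MAGIC: &[u8; 8] = b"RDS2VEC1";

fn encode_header(version: u8, kind: u8, endian: u8, length: u64, elem_size: u32) -> Vec<u8> {
    let mut out = Vec::with_capacity(24);
    out.extend_from_slice(MAGIC);
    out.push(version);
    out.push(kind);
    out.push(endian);
    out.push(0); // reserved
    out.extend_from_slice(&length.to_le_bytes());
    out.extend_from_slice(&elem_size.to_le_bytes());
    out
}

/// Returns (length, elem_size) if the buffer is long enough and the
/// magic matches; None otherwise.
fn decode_header(buf: &[u8]) -> Option<(u64, u32)> {
    if buf.len() < 24 || &buf[..8] != MAGIC {
        return None;
    }
    let length = u64::from_le_bytes(buf[12..20].try_into().ok()?);
    let elem_size = u32::from_le_bytes(buf[20..24].try_into().ok()?);
    Some((length, elem_size))
}
```

Validating the magic before anything else lets a reader reject non-dump files with a clear error instead of misreading payload bytes.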
Manifest Versioning
The manifest includes a top-level version field. Version 1 is the initial schema:
{ "version": 1, "object_kind": "...", "vectors": [...], "missing": [...] }. Future schema
changes will increment this number and preserve backward compatibility where possible.
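A version-1 manifest might look like the following (field values here are illustrative, not output from a real run):

```json
{
  "version": 1,
  "object_kind": "dataframe",
  "vectors": [
    {
      "path": "data.temperature",
      "file": "data.temperature.rdsvec",
      "kind": "real",
      "length": 1000,
      "elem_size": 8,
      "endian": "big"
    }
  ],
  "missing": []
}
```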
Reader Guidance
Recommended reader flow:
- Load the manifest JSON.
- For each entry, open the referenced .rdsvec file.
- Validate the header (RDS2VEC1, version, kind, endian, length, elem_size).
- Read payload:
  - Numeric/logical/complex/raw: fixed-size element bytes.
  - Character: repeated i32 length + UTF-8 bytes records.
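The character-payload step can be sketched in plain Rust, independent of the crate. Two details are assumptions for this sketch: lengths are read big-endian (matching the "big-endian for numeric types" rule above), and a length of -1 marks an NA string:

```rust
/// Decode repeated `i32 length + UTF-8 bytes` records into strings.
/// Assumptions: big-endian lengths, and -1 as the NA marker.
fn decode_character_payload(mut buf: &[u8]) -> Option<Vec<Option<String>>> {
    let mut out = Vec::new();
    while !buf.is_empty() {
        // Read the 4-byte length prefix; fail on a truncated prefix.
        let (len_bytes, rest) = buf.split_at(buf.len().min(4));
        let len = i32::from_be_bytes(len_bytes.try_into().ok()?);
        if len < 0 {
            out.push(None); // NA string (assumed encoding)
            buf = rest;
            continue;
        }
        let len = len as usize;
        if rest.len() < len {
            return None; // truncated record
        }
        let (s, tail) = rest.split_at(len);
        out.push(Some(String::from_utf8(s.to_vec()).ok()?));
        buf = tail;
    }
    Some(out)
}
```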
Example validation helper:
```rust
use rds2rust::{read_extraction_manifest, validate_vector_file_header};

let manifest = read_extraction_manifest("out/manifest.json")?;
for entry in &manifest.vectors {
    let path = format!("out/{}", entry.file);
    validate_vector_file_header(&path, entry)?;
}
```
High-Level Conversion Helpers
Library callers can use the higher-level conversion helpers to expand objects and emit raw dumps plus manifests without manually enumerating paths:
```rust
use rds2rust::{
    extract_object_to_raw_files_with_input_streaming,
    extract_object_to_raw_files_with_kind_and_input_streaming,
    ChunkedRdsSource, ObjectKind, ParseConfig,
};

let source = ChunkedRdsSource::from_path("data.rds")?;
let obj = rds2rust::read_rds_with_input(&source, ParseConfig::for_trusted_large_file())?.object;

let output = extract_object_to_raw_files_with_input_streaming(
    &obj,
    &source,
    "data",
    Some(4 * 1024 * 1024),
    std::path::Path::new("out"),
    Some("manifest.json"),
)?;

let output = extract_object_to_raw_files_with_kind_and_input_streaming(
    &obj,
    &source,
    "data",
    ObjectKind::DataFrame,
    Some(4 * 1024 * 1024),
    std::path::Path::new("out"),
    Some("manifest.json"),
)?;
```
Chunked Read APIs
If you want chunked reads in library code, use the chunked path helpers:
```rust
use rds2rust::{read_rds_from_path_chunked, ParseConfig};

let obj = read_rds_from_path_chunked("data.rds")?.object;

let obj = rds2rust::read_rds_from_path_chunked_with_config(
    "data.rds",
    ParseConfig::for_trusted_large_file(),
)?.object;
```
Lazy metadata parsing with chunked reads:
```rust
use rds2rust::read_rds_lazy_from_path_chunked;

let obj = read_rds_lazy_from_path_chunked("data.rds")?.object;
assert!(!obj.is_fully_loaded());
```
Working with Factors
```rust
use rds2rust::{read_rds, RObject};

let data = std::fs::read("factor.rds")?;
let obj = read_rds(&data)?.object;

if let RObject::Factor(factor) = obj {
    // Check if it's an ordered factor
    if factor.ordered {
        println!("Ordered factor with {} levels", factor.levels.len());
    }

    // Get level labels
    for level in &factor.levels {
        println!("Level: {}", level);
    }

    // Get values (1-based indices into levels)
    for &index in &factor.values {
        if index > 0 && index <= factor.levels.len() as i32 {
            let level = &factor.levels[(index - 1) as usize];
            println!("Value: {}", level);
        }
    }
}
```
Working with S3/S4 Objects
```rust
use rds2rust::{read_rds, RObject};

let data = std::fs::read("model.rds")?;
let obj = read_rds(&data)?.object;

// S3 objects
if let RObject::S3Object(s3) = obj {
    println!("S3 class: {:?}", s3.class);

    // Access base object
    match s3.base.as_ref() {
        RObject::List(elements) => {
            println!("S3 object is a list with {} elements", elements.len());
        }
        _ => {}
    }

    // Access additional attributes
    if let Some(desc) = s3.attributes.get("description") {
        println!("Description: {:?}", desc);
    }
}

// S4 objects
if let RObject::S4Object(s4) = obj {
    println!("S4 class: {:?}", s4.class);

    // Access slots
    if let Some(slot_value) = s4.slots.get("data") {
        println!("Data slot: {:?}", slot_value);
    }
}
```
Roundtrip: Read and Write
```rust
use rds2rust::{read_rds, write_rds};
use std::fs;

// Read an RDS file
let input_data = fs::read("input.rds")?;
let obj = read_rds(&input_data)?.object;

// Process the data...
// (modify the object as needed)

// Write back to RDS format
let output_data = write_rds(&obj)?;
fs::write("output.rds", output_data)?;

// Verify roundtrip
let obj2 = read_rds(&output_data)?.object;
assert_eq!(obj, obj2);
```
Type System
The RObject enum represents all possible R object types:
```rust
pub enum RObject {
    Null,
    Integer(VectorData<i32>),
    Real(VectorData<f64>),
    Logical(VectorData<Logical>),
    Character(VectorData<Arc<str>>),
    Symbol(Arc<str>),
    Raw(VectorData<u8>),
    Complex(VectorData<Complex>),
    List(Vec<RObject>),
    Pairlist(Vec<PairlistElement>),
    Language { function: Box<RObject>, args: Vec<PairlistElement> },
    Expression(Vec<RObject>),
    Closure { formals: Box<RObject>, body: Box<RObject>, environment: Box<RObject> },
    Environment { enclosing: Box<RObject>, frame: Box<RObject>, hashtab: Box<RObject> },
    Promise { value: Box<RObject>, expression: Box<RObject>, environment: Box<RObject> },
    Special { name: Arc<str> },
    Builtin { name: Arc<str> },
    Bytecode { code: Box<RObject>, constants: Box<RObject>, expr: Box<RObject> },
    DataFrame(Box<DataFrameData>),
    Factor(Box<FactorData>),
    S3Object(Box<S3ObjectData>),
    S4Object(Box<S4ObjectData>),
    Namespace(Vec<Arc<str>>),
    GlobalEnv,
    BaseEnv,
    EmptyEnv,
    MissingArg,
    UnboundValue,
    Shared(Arc<RwLock<RObject>>),
    WithAttributes { object: Box<RObject>, attributes: Attributes },
}
```
Special Values
R's special values are represented as:
- NA (integers): RObject::NA_INTEGER constant (i32::MIN)
- NA (logicals): Logical::Na enum variant
- NA (real): Check with f64::is_nan()
- Inf/-Inf: f64::INFINITY and f64::NEG_INFINITY
- NaN: f64::NAN
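These conventions can be checked with ordinary Rust code. A small self-contained sketch, using i32::MIN directly in place of the crate's RObject::NA_INTEGER constant:

```rust
/// R's integer NA sentinel: i32::MIN (what RObject::NA_INTEGER wraps).
const NA_INTEGER: i32 = i32::MIN;

fn is_na_int(x: i32) -> bool {
    x == NA_INTEGER
}

/// Real NA and NaN both report true from is_nan(); telling them apart
/// requires bit-level inspection, which this sketch does not attempt.
fn is_na_or_nan_real(x: f64) -> bool {
    x.is_nan()
}

fn is_infinite_real(x: f64) -> bool {
    x == f64::INFINITY || x == f64::NEG_INFINITY
}
```

Note that comparing against NA_INTEGER with == works because it is an ordinary i32 sentinel, unlike real NA, which is a NaN payload and never compares equal to anything.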
Memory Optimizations
rds2rust includes several memory optimizations for efficient data processing:
- String Interning - All strings use Arc<str> for automatic deduplication
- Boxed Large Variants - Large enum variants are boxed to reduce memory overhead
- Compact Attributes - SmallVec stores 0-2 attributes inline without heap allocation
- Object Deduplication - Identical objects are automatically shared during parsing
These optimizations provide 20-50% memory reduction for typical RDS files while maintaining zero API overhead.
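The Arc&lt;str&gt; interning idea can be sketched in a few lines of plain Rust. This is a simplified illustration, not the crate's internal interner:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Deduplicates strings: equal strings share one Arc<str> allocation,
/// so a column of a million repeated labels stores the text once.
#[derive(Default)]
struct Interner {
    pool: HashMap<Arc<str>, ()>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> Arc<str> {
        // Arc<str>: Borrow<str> lets us look up by &str without allocating.
        if let Some((existing, _)) = self.pool.get_key_value(s) {
            return Arc::clone(existing);
        }
        let arc: Arc<str> = Arc::from(s);
        self.pool.insert(Arc::clone(&arc), ());
        arc
    }
}
```

Interning pays off most for factor levels and repeated character values, where the same string appears many times in one file.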
Performance Tips
Reading Large Files
```rust
use rds2rust::read_rds;
use std::fs::File;
use std::io::Read;

// Read the whole file into one buffer; for lower peak memory on very
// large files, prefer the chunked path helpers shown above.
let mut file = File::open("large.rds")?;
let mut buffer = Vec::new();
file.read_to_end(&mut buffer)?;
let obj = read_rds(&buffer)?.object;
```
Reusing Parsed Objects
```rust
use rds2rust::{read_rds, RObject};
use std::sync::Arc;

// Read the raw bytes first
let data = std::fs::read("data.rds")?;

// Wrap in Arc for cheap cloning
let obj: Arc<RObject> = Arc::new(read_rds(&data)?.object);

// Clone is cheap (just increments reference count)
let obj2 = Arc::clone(&obj);
```
Limitations
- Write support: All R types can be written except for some complex environment configurations
- Compression formats: Currently supports gzip; bzip2/xz support planned
- ALTREP: Reads ALTREP objects but writes them as regular vectors
- External pointers: Not supported (rarely used in serialized data)
Development Status
Current version: 0.1.40
Test coverage: extensive test suite covering core R object types and roundtrips
Completed phases:
- ✅ All basic R types (NULL, vectors, matrices, data frames)
- ✅ All object-oriented types (S3, S4, factors)
- ✅ All language types (expressions, formulas, closures, environments)
- ✅ All special types (promises, special functions, builtin functions)
- ✅ Reference tracking and ALTREP optimization
- ✅ Complete read/write roundtrip support
- ✅ Memory optimizations (string interning, compact attributes, deduplication)
License
Licensed under:
- MIT license (LICENSE or http://opensource.org/licenses/MIT)