1 unstable release
new 0.1.0-alpha.1 | Apr 12, 2025 |
---|
#546 in Encoding
29 downloads per month
1MB
20K
SLoC
SciRS2 IO
Input/Output module for the SciRS2 scientific computing library. This module provides functionality for reading and writing various scientific and numerical data formats.
Features
- MATLAB Support: Read and write MATLAB
.mat
files - WAV File Support: Read and write WAV audio files
- ARFF Support: Read and write Weka ARFF files
- CSV Support: Read and write CSV files with flexible configuration options
- Image Support: Read and write common image formats (PNG, JPEG, BMP, TIFF)
- Data Serialization: Serialize and deserialize arrays, structs, and sparse matrices
- Data Compression: Compress and decompress data using multiple algorithms (GZIP, ZSTD, LZ4, BZIP2)
- Data Validation: Verify data integrity through checksums and format validation
- Error Handling: Robust error handling with detailed error information
Usage
Add the following to your Cargo.toml
:
[dependencies]
scirs2-io = { workspace = true }
Basic usage examples:
use scirs2_io::{matlab, wavfile, arff, csv, compression, validation};
use scirs2_core::error::CoreResult;
use ndarray::Array2;
// Read and write MATLAB .mat files
fn matlab_example() -> CoreResult<()> {
// Read a MATLAB file
let data = matlab::loadmat("data.mat")?;
// Access a variable from the file
let x = data.get_array::<f64>("x")?;
println!("Variable x has shape {:?}", x.shape());
// Create a new MATLAB file
let mut mat_file = matlab::MatFile::new();
// Add variables
let array = Array2::<f64>::zeros((3, 3));
mat_file.add_array("zeros", &array)?;
// Save to file
mat_file.save("output.mat")?;
Ok(())
}
// Read and write WAV audio files
fn wav_example() -> CoreResult<()> {
// Read a WAV file
let (rate, data) = wavfile::read("audio.wav")?;
println!("Audio file: {} Hz, {} samples", rate, data.len());
// Create a simple sine wave
let rate = 44100;
let freq = 440.0; // 440 Hz (A4)
let duration = 2.0; // 2 seconds
let samples = (rate as f64 * duration) as usize;
let mut sine_wave = Vec::with_capacity(samples);
for i in 0..samples {
let t = i as f64 / rate as f64;
let sample = (2.0 * std::f64::consts::PI * freq * t).sin();
// Scale to i16 range (-32768 to 32767)
let sample_i16 = (sample * 32767.0) as i16;
sine_wave.push(sample_i16);
}
// Write to WAV file
wavfile::write("sine_440hz.wav", rate, &sine_wave)?;
Ok(())
}
// Read and write ARFF files
fn arff_example() -> CoreResult<()> {
// Read an ARFF file
let dataset = arff::load_arff("data.arff")?;
println!("Dataset name: {}", dataset.relation);
println!("Attributes: {:?}", dataset.attributes.iter().map(|attr| &attr.name).collect::<Vec<_>>());
println!("Number of instances: {}", dataset.data.len());
// Create a new ARFF dataset
let mut new_dataset = arff::ArffDataset::new("my_dataset");
// Add attributes
new_dataset.add_numeric_attribute("attr1")?;
new_dataset.add_numeric_attribute("attr2")?;
new_dataset.add_nominal_attribute("class", &["class1", "class2"])?;
// Add instances
new_dataset.add_instance(&[arff::ArffValue::Numeric(1.0),
arff::ArffValue::Numeric(2.0),
arff::ArffValue::Nominal("class1".to_string())])?;
// Save to file
arff::save_arff(&new_dataset, "output.arff")?;
Ok(())
}
// Read and write CSV files
fn csv_example() -> CoreResult<()> {
// Read a CSV file with default settings
let (headers, data) = csv::read_csv("data.csv", None)?;
println!("Headers: {:?}", headers);
println!("Data shape: {:?}", data.shape());
println!("First row: {:?}", data.row(0));
// Read CSV with custom configuration
let config = csv::CsvReaderConfig {
delimiter: ';',
has_header: true,
comment_char: Some('#'),
..Default::default()
};
let (headers, data) = csv::read_csv("data.csv", Some(config))?;
// Read as numeric data
let (headers, numeric_data) = csv::read_csv_numeric("data.csv", None)?;
println!("Numeric data shape: {:?}", numeric_data.shape());
// Write data to CSV
let data_to_write = Array2::<f64>::zeros((3, 4));
let headers = vec!["A".to_string(), "B".to_string(), "C".to_string(), "D".to_string()];
csv::write_csv("output.csv", &data_to_write, Some(&headers), None)?;
// Write with custom configuration
let write_config = csv::CsvWriterConfig {
delimiter: ';',
always_quote: true,
..Default::default()
};
csv::write_csv("custom_output.csv", &data_to_write, Some(&headers), Some(write_config))?;
Ok(())
}
Components
MATLAB File Support
Functions for working with MATLAB .mat
files:
use scirs2_io::matlab::{
loadmat, // Load MATLAB .mat file
savemat, // Save variables to MATLAB .mat file
MatFile, // MATLAB file representation
MatVar, // MATLAB variable representation
};
WAV File Support
Functions for working with WAV audio files:
use scirs2_io::wavfile::{
read, // Read WAV file
write, // Write WAV file
WavInfo, // WAV file information
};
ARFF File Support
Functions for working with Weka ARFF files:
use scirs2_io::arff::{
load_arff, // Load ARFF file
save_arff, // Save ARFF file
ArffDataset, // ARFF dataset representation
ArffAttribute, // ARFF attribute representation
ArffValue, // ARFF value type (Numeric, Nominal, etc.)
};
CSV File Support
Functions for working with CSV files:
use scirs2_io::csv::{
read_csv, // Read CSV file as string data
read_csv_numeric, // Read CSV file as numeric data
read_csv_typed, // Read CSV file with type conversion
read_csv_chunked, // Read large CSV files in chunks
write_csv, // Write 2D array to CSV file
write_csv_columns, // Write columns to CSV file
write_csv_typed, // Write typed data to CSV file
CsvReaderConfig, // Configuration for CSV reading
CsvWriterConfig, // Configuration for CSV writing
DataValue, // Mixed type value representation
ColumnType, // Column data type specification
MissingValueOptions, // Missing value handling options
};
Image File Support
Functions for working with image files:
use scirs2_io::image::{
read_image, // Read image file
write_image, // Write image file
read_image_metadata, // Read image metadata
convert_image, // Convert between image formats
get_grayscale, // Convert image to grayscale
ImageFormat, // Image format (PNG, JPEG, BMP, TIFF)
ColorMode, // Image color mode (Grayscale, RGB, RGBA)
ImageMetadata, // Image metadata
};
Data Serialization
Functions for data serialization:
use scirs2_io::serialize::{
serialize_array, // Serialize ndarray
deserialize_array, // Deserialize ndarray
serialize_array_with_metadata, // Serialize ndarray with metadata
deserialize_array_with_metadata, // Deserialize ndarray with metadata
serialize_struct, // Serialize struct
deserialize_struct, // Deserialize struct
serialize_sparse_matrix, // Serialize sparse matrix
deserialize_sparse_matrix, // Deserialize sparse matrix
SerializationFormat, // Format (Binary, JSON, MessagePack)
SparseMatrixCOO, // Sparse matrix in COO format
};
Data Compression
Functions for data compression:
use scirs2_io::compression::{
compress_data, // Compress raw bytes
decompress_data, // Decompress raw bytes
compress_file, // Compress a file
decompress_file, // Decompress a file
compression_ratio, // Calculate compression ratio
algorithm_info, // Get information about compression algorithm
CompressionAlgorithm, // Compression algorithm (Gzip, Zstd, Lz4, Bzip2)
CompressionInfo, // Information about a compression algorithm
};
// For ndarray-specific compression
use scirs2_io::compression::ndarray::{
compress_array, // Compress ndarray
decompress_array, // Decompress ndarray
compress_array_chunked, // Compress large ndarray in chunks
decompress_array_chunked, // Decompress chunked ndarray
compare_compression_algorithms, // Compare compression algorithms on a dataset
CompressedArray, // Container for compressed array data
CompressedArrayMetadata, // Metadata for compressed array
};
// For data validation
use scirs2_io::validation::{
calculate_checksum, // Calculate checksum for data
verify_checksum, // Verify data against checksum
calculate_file_checksum, // Calculate checksum for a file
verify_file_checksum, // Verify file against checksum
generate_file_integrity_metadata, // Generate integrity metadata
validate_file_integrity, // Validate file against metadata
create_directory_manifest, // Create manifest for a directory
ChecksumAlgorithm, // Checksum algorithm types
IntegrityMetadata, // File integrity metadata
ValidationReport, // Validation result report
};
// For format validation
use scirs2_io::validation::formats::{
validate_format, // Validate specific file format
detect_file_format, // Detect file format automatically
validate_file_format, // Detailed format validation
DataFormat, // Common scientific data formats
};
Format Details
MATLAB File Format
The module supports:
- MATLAB level 5 MAT-File Format
- Both compressed and uncompressed files
- Basic MATLAB types: double, single, int8/16/32/64, uint8/16/32/64, logical, char
- Multidimensional arrays
- Structure arrays
- Cell arrays
WAV File Format
Supported WAV features:
- PCM format with various bit depths (8, 16, 24, 32 bit)
- Float format (32 and 64 bit)
- Mono and stereo files
- Sample rates from 8kHz to 192kHz
- Reading extended WAV format information
ARFF File Format
The ARFF implementation supports:
- Attribute types: numeric, nominal, string, date
- Sparse ARFF format
- Comment handling
- Missing values
- Relation metadata
CSV File Format
The CSV implementation supports:
- Standard comma-separated values with configurable delimiters
- Reading and writing with various options:
- Custom delimiters (comma, semicolon, tab, etc.)
- Quoted fields handling
- Comment lines
- Header row handling
- Whitespace trimming
- Line ending control (LF, CRLF)
- Conversion to and from numeric arrays
- Column-based I/O operations
- Type conversion and automatic type inference
- Missing value handling with customizable missing value markers
- Memory-efficient processing of large files using chunked reading
- Support for mixed data types (string, integer, float, boolean)
Image File Format
The image module supports:
- Common image formats:
- PNG (Portable Network Graphics)
- JPEG (Joint Photographic Experts Group)
- BMP (Bitmap)
- TIFF (Tagged Image File Format)
- Image operations:
- Reading and writing images with ndarray integration
- Converting between different image formats
- Extracting image metadata (dimensions, color type, etc.)
- Converting color images to grayscale
- Image representation:
- Images are represented as 3D arrays (height × width × channels)
- Support for different color modes (Grayscale, RGB, RGBA)
- Integration with ndarray for efficient image manipulation
Data Serialization Format
The serialize module supports:
- Multiple serialization formats:
- Binary format (compact, efficient using bincode)
- JSON format (human-readable, widely compatible)
- MessagePack format (compact binary, cross-language)
- Data types:
- Ndarray arrays with automatic shape preservation
- Arrays with additional metadata (units, descriptions, etc.)
- Structured data (any Rust struct implementing Serialize/Deserialize)
- Sparse matrices in COO (Coordinate) format
- Features:
- Automatic type detection and conversion
- Metadata preservation
- Efficient storage for both dense and sparse data
- Cross-platform compatibility
Data Compression Format
The compression module supports:
- Multiple compression algorithms:
- GZIP (good balance of speed and compression ratio)
- Zstandard (excellent compression ratio, fast decompression)
- LZ4 (extremely fast, moderate compression ratio)
- BZIP2 (high compression ratio, slower speed)
- Compression features:
- Raw data compression and decompression
- File-based compression operations
- Configurable compression levels
- Algorithm comparison and benchmarking
- Ndarray-specific compression:
- Memory-efficient compression of large arrays
- Chunked processing for arrays that don't fit in memory
- Metadata preservation during compression
- Optimized for scientific data patterns
Data Validation Features
The validation module provides:
- Checksum and integrity verification:
- Multiple checksum algorithms (CRC32, SHA-256, BLAKE3)
- File integrity metadata generation and validation
- Array integrity verification
- Directory manifests with checksums
- Format validation:
- Automatic format detection
- Format-specific validators for scientific data formats
- Detailed validation reports
- Structure validation for formats like CSV, JSON, and ARFF
- Integration features:
- Validation reports with machine-readable output
- Checksum file generation and verification
- Directory traversal and recursive validation
Contributing
See the CONTRIBUTING.md file for contribution guidelines.
License
This project is licensed under the Apache License, Version 2.0 - see the LICENSE file for details.
Dependencies
~22–34MB
~531K SLoC