1 unstable release: 0.1.0 (Aug 17, 2025)
crabrl 🦀
Lightning-fast XBRL parser built for speed and accuracy when processing SEC EDGAR filings. In the benchmarks below it parses 47x to 150x faster than Arelle, a Python-based XBRL processor.
Technical Architecture
crabrl is built on Rust's zero-cost abstractions and streaming, zero-copy XML parsing (quick-xml). Established processors such as Arelle provide comprehensive XBRL specification support and extensive validation; crabrl instead focuses on high-performance parsing for scenarios where speed is critical.
Implementation Details
| Optimization | Impact | Technology |
|---|---|---|
| Zero-copy parsing | -90% memory allocs | quick-xml with string slicing |
| No garbage collection | Predictable latency | Rust's ownership model |
| Faster hashmaps | 2x lookup speed | ahash instead of default hasher |
| Compact strings | -50% memory for small strings | compact_str |
| Parallelization | 4-8x on multicore | rayon work-stealing |
| Memory mapping | Zero-copy file I/O | memmap2 |
| Better allocator | -25% allocation time | mimalloc |
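The *Zero-copy parsing* row means parsed values are borrowed slices of the input buffer rather than freshly allocated `String`s. The std-only sketch below illustrates that idea with a hypothetical `FactRef` type and a naive substring scanner. It is not crabrl's implementation (which uses quick-xml) and it ignores namespaces, entities, CDATA, comments, and nested elements.

```rust
/// Hypothetical borrowed view of one fact (not crabrl's type): every field is
/// a slice of the input text, so no string data is copied.
#[derive(Debug, PartialEq)]
pub struct FactRef<'a> {
    pub name: &'a str,
    pub context_ref: &'a str,
    pub value: &'a str,
}

/// Naively scans `<name ... contextRef="...">value</name>` elements.
/// Illustrative only: no namespace, entity, CDATA, comment, or nesting support.
pub fn scan_facts(xml: &str) -> Vec<FactRef<'_>> {
    const MARKER: &str = "contextRef=\"";
    let mut facts = Vec::new();
    let mut rest = xml;
    while let Some(open) = rest.find('<') {
        rest = &rest[open + 1..];
        let Some(tag_end) = rest.find('>') else { break };
        let tag = &rest[..tag_end]; // e.g. `us-gaap:Revenues contextRef="c1"`
        let body = &rest[tag_end + 1..];
        rest = body;
        // The element name ends at the first whitespace; the remainder is attributes.
        let name_end = tag.find(char::is_whitespace).unwrap_or(tag.len());
        let (name, attrs) = tag.split_at(name_end);
        // Skip self-closing tags and elements without a contextRef attribute.
        if attrs.ends_with('/') {
            continue;
        }
        let Some(pos) = attrs.find(MARKER) else { continue };
        let start = pos + MARKER.len();
        let Some(len) = attrs[start..].find('"') else { continue };
        // Assume the value runs up to the next closing tag (no nested elements).
        if let Some(end) = body.find("</") {
            facts.push(FactRef {
                name,
                context_ref: &attrs[start..start + len],
                value: body[..end].trim(),
            });
        }
    }
    facts
}
```

Only the `Vec` of references is allocated; each `FactRef` points back into the caller's buffer, which is also why such a design ties the parsed data's lifetime to the input.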
Benchmark results: 100,000 XBRL facts parsed in 57 ms (crabrl) vs 2,672 ms (Arelle v2.17.4) on identical hardware.
XBRL Support Status
| Feature | Description | Status |
|---|---|---|
| XBRL 2.1 Instance | Parse facts, contexts, units from .xml files | ✅ Stable |
| SEC Validation | EDGAR-specific rules and checks | ✅ Stable |
| Calculation Linkbase | Validate arithmetic relationships | ✅ Stable |
| Presentation Linkbase | Extract display hierarchy | 🚧 Beta |
| Label Linkbase | Human-readable concept names | 🚧 Beta |
| Definition Linkbase | Dimensional relationships | 📋 Planned |
| Formula Linkbase | Business rules validation | 📋 Planned |
| Inline XBRL (iXBRL) | HTML-embedded XBRL | 📋 Planned |
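The *Calculation Linkbase* row refers to XBRL summation-item relationships: within one context, a parent concept should equal the weighted sum of its children, with weights usually +1 or -1. The sketch below checks one such relationship. It is a simplified illustration, not crabrl's validator: `Summation`, `CalcResult`, and `check_summation` are hypothetical names, it uses an absolute tolerance instead of the XBRL 2.1 decimals-based rounding rules, and it skips children that have no reported value.

```rust
use std::collections::HashMap;

/// Hypothetical summation-item relationship: parent = sum of (weight * child).
pub struct Summation<'a> {
    pub parent: &'a str,
    /// (child concept, weight); weights are typically +1.0 or -1.0.
    pub children: Vec<(&'a str, f64)>,
}

#[derive(Debug, PartialEq)]
pub enum CalcResult {
    /// The parent fact is absent, so there is nothing to check.
    NotApplicable,
    Consistent,
    Inconsistent { reported: f64, computed: f64 },
}

/// Checks one relationship against the fact values of a single context.
/// Simplified: absolute tolerance instead of decimals-based rounding,
/// and children without a reported value are skipped.
pub fn check_summation(rel: &Summation<'_>, values: &HashMap<&str, f64>, tolerance: f64) -> CalcResult {
    let Some(&reported) = values.get(rel.parent) else {
        return CalcResult::NotApplicable;
    };
    let computed: f64 = rel
        .children
        .iter()
        .filter_map(|(concept, weight)| values.get(concept).map(|v| weight * v))
        .sum();
    if (reported - computed).abs() <= tolerance {
        CalcResult::Consistent
    } else {
        CalcResult::Inconsistent { reported, computed }
    }
}
```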
Installation
From crates.io
```bash
cargo install crabrl
```
From Source
```bash
git clone https://github.com/stefanoamorelli/crabrl
cd crabrl
cargo build --release --features cli
```
As Library Dependency
```toml
[dependencies]
crabrl = "0.1.0"
```
Usage
CLI
```bash
# Parse and display summary
crabrl parse filing.xml
# Parse with statistics (timing and throughput)
crabrl parse filing.xml --stats
# Validate with generic rules
crabrl validate filing.xml
# Validate with SEC EDGAR rules
crabrl validate filing.xml --profile sec-edgar
# Validate with strict mode (warnings as errors)
crabrl validate filing.xml --strict
# Benchmark performance
crabrl bench filing.xml --iterations 100
```
Library
Basic Usage
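```rust
use crabrl::Parser;

// Assumes crabrl's error type implements std::error::Error so `?` can box it.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse XBRL document
    let parser = Parser::new();
    let doc = parser.parse_file("filing.xml")?;

    // Access parsed data
    println!("Facts: {}", doc.facts.len());
    println!("Contexts: {}", doc.contexts.len());
    println!("Units: {}", doc.units.len());
    Ok(())
}
```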
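In XBRL 2.1 a fact does not carry its own period or entity: it points at a context through `contextRef`, and numeric facts point at a unit through `unitRef`. The sketch below shows that join using hypothetical, simplified `Fact`, `Context`, and `Unit` structs. They are illustrative assumptions, not the actual shape of crabrl's `doc.facts`, `doc.contexts`, or `doc.units`.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified XBRL 2.1 shapes; not crabrl's actual types.
pub struct Context {
    pub id: String,
    pub entity: String, // entity identifier, e.g. a CIK
    pub period: String, // an instant date or a start/end duration
}

pub struct Unit {
    pub id: String,
    pub measure: String, // e.g. "iso4217:USD"
}

pub struct Fact {
    pub concept: String,
    pub context_ref: String,      // refers to Context::id
    pub unit_ref: Option<String>, // numeric facts refer to Unit::id
    pub value: String,
}

/// Joins each fact to its context and unit by id; facts whose context
/// cannot be found are skipped, and facts without a unit show "-".
pub fn describe(facts: &[Fact], contexts: &[Context], units: &[Unit]) -> Vec<String> {
    let ctx: HashMap<&str, &Context> = contexts.iter().map(|c| (c.id.as_str(), c)).collect();
    let unit: HashMap<&str, &Unit> = units.iter().map(|u| (u.id.as_str(), u)).collect();
    facts
        .iter()
        .filter_map(|f| {
            let c = ctx.get(f.context_ref.as_str())?;
            let measure = f
                .unit_ref
                .as_deref()
                .and_then(|id| unit.get(id))
                .map_or("-", |u| u.measure.as_str());
            Some(format!("{} = {} {} ({}, {})", f.concept, f.value, measure, c.entity, c.period))
        })
        .collect()
}
```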
Parse from Different Sources
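```rust
use crabrl::Parser;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let parser = Parser::new();
    let from_path = parser.parse_file("filing.xml")?; // from a file path
    // From bytes; bind the buffer so it outlives the parsed document
    let xml_bytes = std::fs::read("filing.xml")?;
    let from_bytes = parser.parse_bytes(&xml_bytes)?;
    println!("{} vs {} facts", from_path.facts.len(), from_bytes.facts.len());
    Ok(())
}
```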
Validation
```rust
use crabrl::{Parser, Validator};

// Assumes crabrl's error type implements std::error::Error so `?` can box it.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let parser = Parser::new();
    let doc = parser.parse_file("filing.xml")?;

    // Generic validation
    let validator = Validator::new();
    let result = validator.validate(&doc)?;
    if result.is_valid {
        println!("Document is valid!");
    } else {
        for error in &result.errors {
            eprintln!("Error: {}", error);
        }
    }

    // SEC EDGAR validation (stricter rules)
    let sec_validator = Validator::sec_edgar();
    let sec_result = sec_validator.validate(&doc)?;
    println!("SEC EDGAR valid: {}", sec_result.is_valid);
    Ok(())
}
```
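The CLI's `--strict` flag is described as treating warnings as errors. A minimal sketch of that pass/fail policy follows, using hypothetical `Severity` and `Finding` types; the library's `result.errors` may be structured differently, and this is not crabrl's code.

```rust
/// Hypothetical severity of a validation finding (not crabrl's API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Warning,
    Error,
}

/// Hypothetical validation finding.
#[derive(Debug)]
pub struct Finding {
    pub severity: Severity,
    pub message: String,
}

/// A document passes if it has no errors; in strict mode warnings also fail it.
pub fn passes(findings: &[Finding], strict: bool) -> bool {
    findings.iter().all(|f| match f.severity {
        Severity::Error => false,
        Severity::Warning => !strict,
    })
}
```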
Performance Measurements
Performance comparison with Arelle v2.17.4 (Python-based XBRL processor with full specification support):
Synthetic Dataset Benchmarks
| Dataset | Facts | crabrl | Arelle | Speedup |
|---|---|---|---|---|
| Tiny | 10 | 1.1 ms | 164 ms | 150x |
| Small | 100 | 1.4 ms | 168 ms | 119x |
| Medium | 1K | 1.7 ms | 184 ms | 108x |
| Large | 10K | 6.1 ms | 351 ms | 58x |
| Huge | 100K | 57 ms | 2,672 ms | 47x |
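The last column is the speedup ratio, Arelle's parse time divided by crabrl's. Recomputing two rows from the table as a check:

```math
\text{speedup} = \frac{t_{\text{Arelle}}}{t_{\text{crabrl}}}, \qquad
\frac{164\ \text{ms}}{1.1\ \text{ms}} \approx 149 \approx 150\times, \qquad
\frac{2672\ \text{ms}}{57\ \text{ms}} \approx 46.9 \approx 47\times
```

Arelle's time stays between 164 and 184 ms up to 1K facts, which suggests a roughly fixed startup cost dominates small inputs; that would explain why the ratio shrinks as the files grow.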
SEC Filing Parse Times
| Company | Filing Type | File Size | Facts | Parse Time | Throughput |
|---|---|---|---|---|---|
| Apple | 10-K 2023 | 1.4 MB | 1,075 | 2.1 ms | 516K facts/sec |
| Microsoft | 10-Q 2023 | 2.8 MB | 2,341 | 4.3 ms | 544K facts/sec |
| Tesla | 10-K 2023 | 3.1 MB | 3,122 | 5.8 ms | 538K facts/sec |
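Throughput is the number of facts divided by the parse time. For the Microsoft row, for example:

```math
\text{throughput} = \frac{N_{\text{facts}}}{t_{\text{parse}}} = \frac{2341}{4.3 \times 10^{-3}\ \text{s}} \approx 5.44 \times 10^{5}\ \text{facts/s}
```

Parse times are shown rounded to 0.1 ms, so recomputing other rows from the table can differ slightly from the listed throughput.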
Run Your Own Benchmarks
```bash
# Quick benchmark with Criterion
cargo bench
# Compare against Arelle
cd benchmarks && python compare_performance.py
# Test on real SEC filings
python scripts/download_fixtures.py # Download Apple, MSFT, Tesla, etc.
cargo run --release --bin crabrl -- bench fixtures/apple/aapl-20230930_htm.xml
```
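Criterion (`cargo bench`) takes care of warm-up, outlier detection, and statistics. For a quick ad-hoc measurement inside your own program, a std-only loop in the spirit of `crabrl bench --iterations` might look like the sketch below. `bench` and `BenchStats` are hypothetical names, and how the crabrl CLI actually measures is not documented here.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Summary of repeated timed runs (hypothetical helper type).
#[derive(Debug)]
pub struct BenchStats {
    pub mean: Duration,
    pub min: Duration,
    pub max: Duration,
}

/// Runs `f` once to warm up, then `iterations` timed runs.
/// A rough stand-in for Criterion, without outlier detection or confidence intervals.
pub fn bench<T, F: FnMut() -> T>(iterations: u32, mut f: F) -> BenchStats {
    assert!(iterations > 0, "need at least one iteration");
    black_box(f()); // warm-up: page cache, allocator, branch predictors
    let mut total = Duration::ZERO;
    let (mut min, mut max) = (Duration::MAX, Duration::ZERO);
    for _ in 0..iterations {
        let start = Instant::now();
        black_box(f()); // keep the optimizer from discarding the work
        let elapsed = start.elapsed();
        total += elapsed;
        min = min.min(elapsed);
        max = max.max(elapsed);
    }
    BenchStats { mean: total / iterations, min, max }
}
```

For example, `bench(100, || parser.parse_file("filing.xml"))`, with `parser` created as in the Library section, would time 100 parses of the same file.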
Resources & Links
XBRL Standards
- XBRL International - Official XBRL specifications
- XBRL 2.1 Specification - Core standard we implement
- SEC EDGAR - Search real company filings
- EDGAR Filer Manual - SEC filing requirements
Dependencies We Use
| Crate | Purpose | Why We Chose It |
|---|---|---|
| quick-xml | XML parsing | Zero-copy, fastest XML parser in Rust |
| ahash | HashMap hashing | 2x faster than default hasher |
| compact_str | String storage | Small string optimization |
| rayon | Parallelization | Work-stealing for automatic load balancing |
| mimalloc | Memory allocator | Microsoft's high-performance allocator |
| criterion | Benchmarking | Statistical benchmarking with graphs |
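ahash speeds up lookups by replacing the `BuildHasher` behind `HashMap`; the standard library default (`RandomState`, currently SipHash 1-3) trades speed for HashDoS resistance. The std-only sketch below shows the same plug-in mechanism with FNV-1a as a stand-in for ahash. `Fnv1a` and `FnvHashMap` are hypothetical names, and the sketch says nothing about how crabrl itself configures its maps.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// FNV-1a (64-bit): a tiny, fast, non-cryptographic hash used here as a
/// stand-in for ahash. Unlike SipHash it offers no HashDoS resistance.
pub struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A HashMap whose hasher is FNV-1a instead of the default RandomState.
pub type FnvHashMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;
```

With the real crate, `HashMap::with_hasher(ahash::RandomState::new())` plugs in the same way.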
Alternative XBRL Parsers
- Arelle - Complete XBRL processor with validation, formulas, and rendering (Python)
- python-xbrl - Lightweight Python parser
- xbrl-parser - JavaScript/Node.js
- XBRL4j - Java implementation
License ⚖️
This open-source project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). This means:
- You can use, modify, and distribute this software
- If you modify and distribute it, you must release your changes under AGPL-3.0
- If you run a modified version on a server, you must provide the source code to users
- See the LICENSE file for full details
For commercial licensing options or other licensing inquiries, please contact stefano@amorelli.tech.
© 2025 Stefano Amorelli – Released under the GNU Affero General Public License v3.0. Enjoy! 🎉