Release 0.1.2 (unstable), May 11, 2025
fimo-csv (file-mongo-csv)
fimo-csv is a fast and flexible CLI tool written in Rust that imports CSV files into MongoDB as documents, using YAML-based field mappings and Jinja2-style templating. It is suited to bulk inserts, updates, and upserts with full control over document structure.
Features
- RFC 4180-compliant CSV parsing (including headers, quoting, escaped quotes)
- Field mapping via YAML configuration
- Custom transformation logic using MiniJinja
- Complex templated pipelines for update and upsert operations, enabling aggregation logic and fine-grained control over MongoDB document modifications
- MongoDB insert, update, and upsert support
- Validate-only and dry-run modes
- Batch processing support for large files
- Extended JSON and BSON type support
- Configurable CSV delimiter and quote characters
- Debug and verbose output for development and testing
- NEW: Flexible date parsing with multiple format support (e.g. ISO, MSSQL, Oracle, Go)
Installation
cargo install fimo-csv
Or clone and build:
git clone https://github.com/fimo-org/fimo-csv.git
cd fimo-csv
cargo build --release
Usage
fimo-csv \
--input tests/data/extended.csv \
--mapping tests/mapping/extended.yaml \
--template-dir tests/templates \
--mongo-uri mongodb://localhost:27017 \
--db testdb \
--collection testcol \
--operation upsert \
--extended-json \
--debug
Example: With Templates and Extended JSON
data.csv
_id,price,created_at,name,active
507f1f77bcf86cd799439011,12.34,2024-01-01T10:00:00Z,Alice,yes
mapping.yaml
_id:
  type: objectId
price:
  type: decimal
created_at:
  type: date
name:
  type: string
active:
  type: bool
  truthy: ["yes", "true", "1", "Y"]
  falsy: ["no", "false", "0", "N"]
templates/upsert.j2
{
  "filter": { "_id": {{ row._id }} },
  "update": {
    "$set": {
      "price": {{ row.price }},
      "name": "{{ row.name }}",
      "active": {{ row.active }}
    },
    "$setOnInsert": {
      "created_at": {{ row.created_at }}
    }
  }
}
Run the Import
fimo-csv \
--input data.csv \
--mapping mapping.yaml \
--template-dir templates \
--mongo-uri mongodb://localhost:27017 \
--db testdb \
--collection customers \
--operation upsert \
--extended-json
Example: Raw Insert (No Templates)
simple.csv
name,age,active
Bob,25,true
simple.yaml
name:
  type: string
age:
  type: int
active:
  type: bool
Raw Insert Command
fimo-csv \
--input simple.csv \
--mapping simple.yaml \
--mongo-uri mongodb://localhost:27017 \
--db demo \
--collection people \
--operation insert \
--raw-insert
CLI Options
Option | Description |
---|---|
--input | Path to the CSV file |
--mapping | Path to the YAML mapping file |
--mongo-uri | MongoDB connection URI |
--db | MongoDB database name |
--collection | MongoDB collection name |
--operation | insert, update, or upsert |
--batch-size | Number of documents to write in bulk (default: 0) |
--no-header | Use autogenerated headers col_0, col_1, ... |
--delimiter | CSV delimiter (default: ,) |
--quote | CSV quote character (default: ") |
--template-dir | Directory with Jinja templates |
--extended-json | Enable support for non-JSON BSON values |
--validate-only | Validate rows without writing to MongoDB |
--dry-run | Print documents instead of inserting |
--debug | Enable verbose output |
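To illustrate what the --no-header, --delimiter, and --quote options imply for the parsed rows, here is a minimal Python sketch. It is not fimo-csv's implementation; the function name read_headerless is hypothetical, and only the col_0, col_1, ... naming scheme is taken from the table above.

```python
import csv
import io


def read_headerless(text, delimiter=",", quotechar='"'):
    """Parse CSV text that has no header row.

    Each row becomes a dict keyed col_0, col_1, ... (the naming scheme
    listed for --no-header). Hypothetical helper for illustration only.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        quotechar=quotechar,
    )
    return [
        {f"col_{index}": value for index, value in enumerate(row)}
        for row in reader
    ]


# A semicolon-delimited file without a header line.
records = read_headerless("Bob;25;true\n", delimiter=";")
print(records)
```

A mapping file for such input would then refer to the generated names (col_0, col_1, ...) instead of header names.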
Truthy/Falsy Mapping for Booleans
In mapping.yaml, you can define per-field truthy/falsy values:
active:
  type: bool
  truthy: ["yes", "1", "true"]
  falsy: ["no", "0", "false"]
This lets strings such as "yes"/"no" or "Y"/"N" map naturally to true/false, as long as each accepted value is listed under truthy or falsy.
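The lookup described above can be sketched in a few lines of Python. This is an illustration of the idea, not the tool's code: the helper name coerce_bool is hypothetical, and both the case-sensitive matching and the error on unlisted values are assumptions about behavior that the README does not specify.

```python
from typing import Iterable


def coerce_bool(raw: str, truthy: Iterable[str], falsy: Iterable[str]) -> bool:
    """Map a CSV string to a bool using per-field truthy/falsy lists.

    Assumptions: surrounding whitespace is ignored, matching is exact and
    case-sensitive, and a value in neither list is rejected.
    """
    value = raw.strip()
    if value in truthy:
        return True
    if value in falsy:
        return False
    raise ValueError(f"{raw!r} is neither truthy nor falsy")


# The lists from the mapping snippet above.
TRUTHY = ["yes", "1", "true"]
FALSY = ["no", "0", "false"]

print(coerce_bool("yes", TRUTHY, FALSY))
print(coerce_bool("0", TRUTHY, FALSY))
```

Under these assumptions, a value such as "Y" is accepted only if it is added to the truthy list, as in the earlier mapping.yaml example.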
Flexible Date Parsing with Custom Formats
Fimo supports parsing date strings using custom formats, giving you the flexibility to import dates from a wide range of sources such as Oracle, MSSQL, or ISO standards.
You can define multiple formats for a date field in your mapping file:
created_at:
  type: date
  formats:
    - "%Y-%m-%dT%H:%M:%S%.fZ"   # ISO 8601
    - "%Y-%m-%d %H:%M:%S"       # MSSQL style
    - "%Y-%m-%d %H:%M:%S%.f"    # Go-style (chrono-compatible)
    - "%Y/%m/%d %H:%M"          # Custom
Fimo will try each format in order until one matches. This makes importing data from diverse systems much easier.
Example CSV
name,created_at
Alice,2024-01-01T10:00:00Z
Bob,2024-01-01 10:00:00
Corresponding Mapping
name:
  type: string
created_at:
  type: date
  formats:
    - "%Y-%m-%dT%H:%M:%S%.fZ"
    - "%Y-%m-%d %H:%M:%S"
This feature leverages the chrono crate for robust and standards-compliant date parsing.
Note: You can define multiple formats for a date field in the formats array. If omitted, Fimo defaults to parsing using RFC 3339 (e.g. 2024-01-01T10:00:00Z).
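The "try each format in order" behavior can be sketched in Python as follows. This is only an illustration of the fallback logic, not fimo-csv's chrono-based code. The name parse_date is hypothetical, and the format strings are Python strptime approximations: chrono's %.f (optional fractional seconds) has no direct strptime counterpart, so the sketch uses whole-second variants.

```python
from datetime import datetime

# strptime approximations of the two formats in the mapping above.
# Assumption: whole seconds only, since chrono's "%.f" is not available here.
FORMATS = [
    "%Y-%m-%dT%H:%M:%SZ",  # ISO 8601 with a literal Z suffix
    "%Y-%m-%d %H:%M:%S",   # MSSQL style
]


def parse_date(raw: str, formats=FORMATS) -> datetime:
    """Return the result of the first format that matches, trying in order."""
    for fmt in formats:
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"no configured format matches {raw!r}")


# The two rows from the example CSV above.
print(parse_date("2024-01-01T10:00:00Z"))
print(parse_date("2024-01-01 10:00:00"))
```

Because formats are tried in order, listing the most common or most specific format first keeps matching predictable when a value could satisfy more than one pattern.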
Project Structure
.
├── src/
│   ├── main.rs         # CLI entry point
│   ├── cli.rs          # Command-line argument parsing
│   ├── mongo.rs        # MongoDB connection
│   ├── transform.rs    # Mapping, templating, BSON conversion
│   ├── mapping.rs      # YAML field type parsing
│   └── template.rs     # Jinja environment loader
├── mappings/           # Sample mapping YAML files
├── templates/          # Sample Jinja templates
├── tests/              # Sample CSV input for testing
└── Cargo.toml          # Project manifest
RFC 4180 Compatibility
Fimo is fully compatible with RFC 4180:
- Comma-separated fields (configurable)
- Quoted fields with escape support
- Optional headers
- Uniform field count (recommended but not enforced)
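As an illustration of the quoting rules listed above, the following Python sketch parses a record containing an embedded comma and an escaped (doubled) quote. It uses Python's standard csv module rather than fimo-csv itself, and the sample data is made up for the example.

```python
import csv
import io

# RFC 4180: fields containing commas or quotes are wrapped in double quotes,
# and a literal double quote inside a quoted field is written as two quotes.
data = 'name,comment\r\n"Smith, Alice","She said ""hi"""\r\n'

rows = list(csv.reader(io.StringIO(data, newline="")))
print(rows)
# [['name', 'comment'], ['Smith, Alice', 'She said "hi"']]
```

The second field of the data row comes back with a single pair of quotes around hi, which is the value an importer would store in the document.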
License
MIT ©