2 releases
| Version | Released |
|---|---|
| 0.5.1 (new) | Oct 31, 2024 |
| 0.5.0 | Oct 31, 2024 |
#758 in Parser implementations
200 downloads per month
140KB
776 lines
marc-record
A Rust library for parsing MARC records in the MARC21 format, with either UTF-8 or MARC-8 encoding. It has been tested on records from a single provider and on various samples found in the wild. Since MARC is an open standard with many variations, not every file is guaranteed to parse. In particular, MARCXML is not supported at the moment.
Getting started
Add the crate to your Rust project:
cargo add marc-record
Load, parse and inspect a record:
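use std::{fs::File, io::Read};

// Read the whole file into memory, then parse every record it contains.
// The `?` operators assume an enclosing function that returns a Result.
let mut contents = Vec::new();
File::open(path_to_my_file)?.read_to_end(&mut contents)?;
let records = marc_record::parse_records(&contents)?;
println!("File contains {} records", records.len());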
License
marc-record is distributed under the terms of the MIT license.
lib.rs:
This crate provides the means to parse MARC21 records. It supports normal MARC21 records using either MARC-8 (for Latin-script languages) or Unicode, and tries to transform as much as possible into strings. It doesn't interpret the field data much, so callers still need to look up the meaning of each tag number.
Info about the format can be found here: https://www.loc.gov/marc/bibliographic/
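As a rough illustration of how the two encodings are told apart: in MARC21, every record starts with a 24-byte header (the leader), and position 9 of that header declares the character coding scheme (a space for MARC-8, `a` for Unicode). The sketch below reads that byte from a raw record. The names `DeclaredEncoding` and `declared_encoding` are hypothetical and are not part of marc-record's API; how the crate itself makes this decision is not shown here.

```rust
/// Character encoding declared by a MARC21 record in leader position 9.
/// Hypothetical type for illustration; not part of marc-record.
#[derive(Debug, PartialEq, Eq)]
enum DeclaredEncoding {
    /// Leader byte 9 is a space.
    Marc8,
    /// Leader byte 9 is `a`.
    Unicode,
    /// Any other value, which MARC21 does not define.
    Unknown(u8),
}

/// Hypothetical helper: inspects the leader of a raw record.
/// Returns `None` when the input is shorter than a full 24-byte leader.
fn declared_encoding(record: &[u8]) -> Option<DeclaredEncoding> {
    // The leader is always the first 24 bytes of the record.
    let leader = record.get(..24)?;
    Some(match leader[9] {
        b' ' => DeclaredEncoding::Marc8,
        b'a' => DeclaredEncoding::Unicode,
        other => DeclaredEncoding::Unknown(other),
    })
}
```

A check like this can be handy for deciding whether a file is worth inspecting further before handing it to a parser, though real-world records do not always declare their encoding accurately.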
The general structure of a MARC record is as follows:
A file can contain many MARC records. Each record has the following parts:
- a leader: a header that contains info about the structure of the record;
- a directory: an index of the various fields;
- fields, which can be either control fields or data fields.
All the fields have an identifying tag.
Control fields simply contain ASCII data.
Each data field can have a 2-character set of indicators, whose meaning depends on the field's tag.
Data fields also contain a list of subfields, each identified by a single ASCII character.
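The layout described above can be sketched by walking a raw record by hand. This is a minimal illustration based on the MARC21 specification, not the crate's implementation: it assumes the leader is 24 bytes with the base address of the data in positions 12-16, that each directory entry is 12 bytes (3-byte tag, 4-digit field length, 5-digit start offset), and that 0x1E and 0x1F are the field terminator and subfield delimiter. The helpers `ascii_number`, `raw_fields` and `split_data_field` are hypothetical names.

```rust
const LEADER_LEN: usize = 24;
const ENTRY_LEN: usize = 12;
const FIELD_TERMINATOR: u8 = 0x1E;
const SUBFIELD_DELIMITER: u8 = 0x1F;

/// Parses a non-empty run of ASCII digits into a number.
fn ascii_number(bytes: &[u8]) -> Option<usize> {
    if bytes.is_empty() || !bytes.iter().all(u8::is_ascii_digit) {
        return None;
    }
    std::str::from_utf8(bytes).ok()?.parse().ok()
}

/// Returns (tag, field bytes without the terminator) for each directory entry.
/// Hypothetical helper for illustration; not part of marc-record.
fn raw_fields(record: &[u8]) -> Option<Vec<(String, &[u8])>> {
    let leader = record.get(..LEADER_LEN)?;
    // Leader positions 12-16: offset at which the field data starts.
    let base = ascii_number(&leader[12..17])?;
    // The directory sits between the leader and its own terminator byte.
    let directory = record.get(LEADER_LEN..base.checked_sub(1)?)?;
    let mut fields = Vec::new();
    for entry in directory.chunks_exact(ENTRY_LEN) {
        let tag = String::from_utf8(entry[..3].to_vec()).ok()?;
        let len = ascii_number(&entry[3..7])?;
        // Start offsets are relative to the base address.
        let start = base + ascii_number(&entry[7..12])?;
        let data = record.get(start..start + len)?;
        // Drop the trailing field terminator if present.
        let data = data.strip_suffix(&[FIELD_TERMINATOR]).unwrap_or(data);
        fields.push((tag, data));
    }
    Some(fields)
}

/// Splits a data field into its two indicator bytes and (code, value) subfields.
fn split_data_field(data: &[u8]) -> Option<([u8; 2], Vec<(char, &[u8])>)> {
    let indicators = [*data.first()?, *data.get(1)?];
    let subfields = data[2..]
        .split(|&b| b == SUBFIELD_DELIMITER)
        .skip(1) // skip whatever precedes the first delimiter (normally empty)
        .filter_map(|chunk| {
            // The first byte after a delimiter is the subfield code.
            let (&code, value) = chunk.split_first()?;
            Some((code as char, value))
        })
        .collect();
    Some((indicators, subfields))
}
```

Control fields (by MARC21 convention, tags beginning with `00`) would be used as returned by `raw_fields`, while data fields would be passed on to `split_data_field`. Values are left as raw bytes here because decoding them depends on the record's encoding.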
The only entry point to the library is the parse_records function:
use marc_record::parse_records;
let binary_data = include_bytes!("../samples/marc8_multiple.mrc");
let records = parse_records(binary_data).unwrap();
assert_eq!(records.len(), 109);
Dependencies
~1.7–2.4MB
~62K SLoC