2 releases

0.5.1 Oct 31, 2024
0.5.0 Oct 31, 2024

#945 in Parser implementations

MIT license

140KB
776 lines

marc-record

A Rust library for parsing MARC records, specifically in the MARC21 format, with either UTF-8 or MARC-8 encoding. It has been tested on records from a single provider and on various samples found in the wild. Because MARC is an open standard with many variations, some files may not parse. In particular, MARCXML is not supported at the moment.

Getting started

Add the crate to your Rust project:

cargo add marc-record

Load, parse and inspect a record:

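use std::fs::File;
use std::io::Read;

// Read the whole file into memory, then parse every record it contains.
let mut contents = Vec::new();
File::open(path_to_my_file)?.read_to_end(&mut contents)?;
let records = marc_record::parse_records(&contents)?;
println!("File contains {} records", records.len());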

License

marc-record is distributed under the terms of the MIT license.


lib.rs:

This crate provides a means to parse MARC21 records. It supports normal MARC21 records encoded in either MARC-8 (for Latin languages) or Unicode, and tries to convert as much as possible into strings. It interprets very little of the field data, so you will need to look up the meaning of each tag number yourself.

Info about the format can be found here: https://www.loc.gov/marc/bibliographic/
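
Since the crate leaves tag interpretation to the caller, a small lookup table is often enough to get started. The sketch below is only an illustration: `describe_tag` is a hypothetical helper, not part of marc-record, and it covers just a handful of tags from the Library of Congress bibliographic listing linked above.

```rust
/// Hypothetical helper (not part of marc-record): maps a few common
/// MARC21 bibliographic tags to their Library of Congress names.
fn describe_tag(tag: &str) -> Option<&'static str> {
    match tag {
        "001" => Some("Control Number"),
        "008" => Some("Fixed-Length Data Elements"),
        "020" => Some("International Standard Book Number"),
        "100" => Some("Main Entry - Personal Name"),
        "245" => Some("Title Statement"),
        "650" => Some("Subject Added Entry - Topical Term"),
        // Anything else must be looked up in the MARC21 documentation.
        _ => None,
    }
}

fn main() {
    for tag in ["245", "999"] {
        match describe_tag(tag) {
            Some(name) => println!("{tag}: {name}"),
            None => println!("{tag}: (not in this table)"),
        }
    }
}
```

A real application would likely need a much larger table, or one generated from the official tag list.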

The general structure of a MARC record is as follows:

A file can contain many MARC records. Each record has the following parts:

  • a leader: a header that contains info about the structure of the record;
  • a directory: an index of the various fields;
  • fields, which can be either control fields or data fields.

All the fields have an identifying tag.
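
As a rough illustration of how the leader and directory fit together, the sketch below reads them by hand from a tiny hand-built record. It follows the MARC21 / ISO 2709 layout: a 24-byte leader with the base address of data at positions 12-16, followed by 12-byte directory entries holding the tag, field length, and start offset. `DirectoryEntry`, `read_directory`, and `sample_record` are hypothetical names for this example and are not part of marc-record's API; the crate's own parser may work differently.

```rust
const FIELD_TERMINATOR: u8 = 0x1e;

/// One 12-byte directory entry: tag (3), field length (4), start offset (5).
#[derive(Debug, PartialEq)]
struct DirectoryEntry {
    tag: String,
    length: usize,
    start: usize,
}

/// Parses a run of ASCII digits such as b"00049" into a number.
fn ascii_number(bytes: &[u8]) -> Option<usize> {
    std::str::from_utf8(bytes).ok()?.parse().ok()
}

/// Returns the base address of data and the directory entries.
fn read_directory(record: &[u8]) -> Option<(usize, Vec<DirectoryEntry>)> {
    let leader = record.get(..24)?;
    let base_address = ascii_number(&leader[12..17])?;
    // The directory sits between the leader and the field terminator
    // that immediately precedes the base address.
    let dir_end = base_address.checked_sub(1)?;
    if *record.get(dir_end)? != FIELD_TERMINATOR {
        return None;
    }
    let directory = record.get(24..dir_end)?;
    let entries = directory
        .chunks_exact(12)
        .map(|e| {
            Some(DirectoryEntry {
                tag: String::from_utf8(e[0..3].to_vec()).ok()?,
                length: ascii_number(&e[3..7])?,
                start: ascii_number(&e[7..12])?,
            })
        })
        .collect::<Option<Vec<_>>>()?;
    Some((base_address, entries))
}

/// A made-up 66-byte record with one control field and one data field.
fn sample_record() -> Vec<u8> {
    let parts: [&[u8]; 7] = [
        b"00066nam a2200049   4500", // leader: length 66, base address 49
        b"001000600000",             // tag 001, 6 bytes, offset 0
        b"245001000006",             // tag 245, 10 bytes, offset 6
        b"\x1e",                     // end of directory
        b"12345\x1e",                // control field 001
        b"10\x1faTitle\x1e",         // data field 245
        b"\x1d",                     // record terminator
    ];
    parts.concat()
}

fn main() {
    let record = sample_record();
    let (base, entries) = read_directory(&record).expect("malformed record");
    for e in &entries {
        // Offsets in the directory are relative to the base address.
        let field = &record[base + e.start..base + e.start + e.length];
        println!("{} -> {:?}", e.tag, String::from_utf8_lossy(field));
    }
}
```

Offsets in the directory are relative to the base address, which is why the leader has to be read before any field can be located.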

Control fields simply contain ASCII data.

Each data field carries a two-character set of indicators, whose meaning depends on the field's tag.

Data fields also contain a list of subfields, each identified by a single ASCII character.
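
To make the data field layout concrete, here is a minimal sketch that splits the raw bytes of one data field into its two indicators and its subfields. It assumes the standard MARC21 delimiters (0x1F before each subfield code, 0x1E at the end of the field) and UTF-8 content; MARC-8 data would need transcoding first. `DataField` and `split_data_field` are hypothetical names for this example, not types or functions exposed by marc-record.

```rust
const SUBFIELD_DELIMITER: u8 = 0x1f;
const FIELD_TERMINATOR: u8 = 0x1e;

/// Illustrative representation of a data field (not marc-record's type).
#[derive(Debug, PartialEq)]
struct DataField {
    indicators: [char; 2],
    subfields: Vec<(char, String)>,
}

/// Splits raw data field bytes into indicators and (code, value) subfields.
fn split_data_field(raw: &[u8]) -> Option<DataField> {
    // Drop the trailing field terminator if it is still attached.
    let raw = raw.strip_suffix(&[FIELD_TERMINATOR]).unwrap_or(raw);
    if raw.len() < 2 {
        return None;
    }
    let (ind, rest) = raw.split_at(2);
    let subfields = rest
        .split(|&b| b == SUBFIELD_DELIMITER)
        .skip(1) // nothing precedes the first delimiter
        .map(|chunk| {
            // The first byte is the subfield code, the rest is its value.
            let (&code, value) = chunk.split_first()?;
            Some((code as char, String::from_utf8(value.to_vec()).ok()?))
        })
        .collect::<Option<Vec<_>>>()?;
    Some(DataField {
        indicators: [ind[0] as char, ind[1] as char],
        subfields,
    })
}

fn main() {
    // A made-up 245 (Title Statement) field with subfields a and c.
    let raw = b"10\x1faThe title /\x1fcAn author.\x1e";
    let field = split_data_field(raw).expect("malformed field");
    println!("indicators: {:?}", field.indicators);
    for (code, value) in &field.subfields {
        println!("${code} {value}");
    }
}
```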

The only entrypoint to the library is the parse_records function:

use marc_record::parse_records;

let binary_data = include_bytes!("../samples/marc8_multiple.mrc");
let records = parse_records(binary_data).unwrap();
assert_eq!(records.len(), 109);

Dependencies

~2.5MB
~65K SLoC