#marshal #python #parser #pretty-print

bin+lib marshal-parser

Parser for Python's "marshal" serialization format

2 releases

0.1.1 Jun 15, 2024
0.1.0 Jun 13, 2024

#1085 in Parser implementations

MIT license

115KB
902 lines


This is a Rust port of the marshalparser project, which is written in Python.

It provides both a command-line interface and a library interface for parsing data in Python's internal "marshal" serialization format. It can also pretty-print the resulting data structures and perform some basic data manipulation, such as removing unused reference flags to make pyc files more reproducible.
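To illustrate what a "reference flag" is, here is a minimal sketch that uses only the standard library and none of this crate's API. It assumes the layout used by marshal format versions 3 and later, in which the high bit (0x80) of an object's type byte marks the object as an entry in the reference table. The names FLAG_REF, split_type_byte and clear_ref_flag are made up for this example. Actually removing unused flags from a file is more involved, because dropping an entry shifts the indices of later references; the sketch only shows the bit manipulation.

```rust
/// High bit of a marshal type byte; set when the object is stored in the
/// reference table (assumed layout, marshal format version 3 and later).
const FLAG_REF: u8 = 0x80;

/// Splits a type byte into its ASCII type code and its reference flag.
fn split_type_byte(byte: u8) -> (char, bool) {
    ((byte & !FLAG_REF) as char, byte & FLAG_REF != 0)
}

/// Returns the same type byte with the reference flag cleared.
fn clear_ref_flag(byte: u8) -> u8 {
    byte & !FLAG_REF
}

fn main() {
    // A typical dump of the integer 1: type byte 'i' with the reference
    // flag set, followed by a little-endian 32-bit value.
    let dump = [0xe9u8, 0x01, 0x00, 0x00, 0x00];

    let (code, has_ref) = split_type_byte(dump[0]);
    let value = i32::from_le_bytes([dump[1], dump[2], dump[3], dump[4]]);

    println!("type code {code:?}, reference flag set: {has_ref}, value {value}");
    println!("type byte without flag: {:#04x}", clear_ref_flag(dump[0]));
}
```

Two dumps that differ only in such flags decode to the same objects, which is why stripping the unused ones helps byte-for-byte reproducibility.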

The default feature set is intentionally minimal. Dependencies that are only required for building the command-line interface can be enabled with the cli feature. Pretty-printing of byte strings can be enabled with the fancy feature.

This project supports parsing "marshal" data produced by CPython versions between 3.8 and 3.13.


lib.rs:

Parser for the "marshal" binary de/serialization format used by CPython

This crate implements a parser and some utilities for reading files in the "marshal" de/serialization format used internally in CPython. The exact format is not stable and can change between minor versions of CPython.

This crate supports parsing "marshal" dumps and pyc files that were written by CPython versions >= 3.6 and < 3.14.

The crate offers a high-level and a low-level API; which one to use depends on how much access to the underlying data structures is needed. The low-level API is also more flexible, since it does not require files and can operate on plain bytes (Vec<u8>).

Reading a pyc file from disk:

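use marshal_parser::{MarshalFile, Object};

// Read and parse a pyc file from disk; unwrap() panics if that fails.
let pyc = MarshalFile::from_pyc_path("mod.cpython-310.pyc").unwrap();
// Take ownership of the parsed top-level object.
let object: Object = pyc.into_inner();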

Reading a "marshal" dump (i.e. a file without pyc header):

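use marshal_parser::{MarshalFile, Object};

// A dump has no header, so the CPython version (here 3.11) is passed explicitly.
let dump = MarshalFile::from_dump_path("dump.marshal", (3, 11)).unwrap();
// Take ownership of the parsed top-level object.
let object: Object = dump.into_inner();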
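The difference between the two inputs above is only the pyc header. The following sketch shows that relationship using the standard library alone; it is not how this crate is implemented. It assumes the 16-byte header layout described in PEP 552 (CPython 3.7 and later): a 2-byte little-endian magic number followed by "\r\n", a 4-byte flags field, and 8 further bytes holding either a timestamp and source size or a source hash. Older versions use a shorter header and are not handled here. PycParts and split_pyc are hypothetical names.

```rust
/// Header length assumed for pyc files written by CPython 3.7 and later.
const PYC_HEADER_LEN: usize = 16;

/// The pieces of a pyc file that this sketch extracts.
struct PycParts<'a> {
    magic: u16,
    flags: u32,
    /// Everything after the header: a plain "marshal" dump.
    payload: &'a [u8],
}

/// Splits raw pyc bytes into header fields and the marshal payload.
/// Returns None if the input is too short or the magic marker is missing.
fn split_pyc(bytes: &[u8]) -> Option<PycParts<'_>> {
    if bytes.len() < PYC_HEADER_LEN || &bytes[2..4] != b"\r\n" {
        return None;
    }
    let magic = u16::from_le_bytes([bytes[0], bytes[1]]);
    let flags = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    Some(PycParts { magic, flags, payload: &bytes[PYC_HEADER_LEN..] })
}

fn main() {
    // Hand-built example: magic 3439, flags 0, 8 zero bytes, then a 5-byte dump.
    let mut pyc = vec![0x6f, 0x0d, 0x0d, 0x0a];
    pyc.extend_from_slice(&[0u8; 12]);
    pyc.extend_from_slice(&[0xe9, 0x01, 0x00, 0x00, 0x00]);

    match split_pyc(&pyc) {
        Some(parts) => println!(
            "magic {}, flags {}, payload of {} bytes",
            parts.magic,
            parts.flags,
            parts.payload.len()
        ),
        None => println!("not a recognizable pyc file"),
    }
}
```

Because the magic number identifies the CPython version that wrote the file, a pyc file carries its own version information, while a bare dump does not.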

Dependencies

~0.7–1.5MB
~31K SLoC