Version 0.1.1 (1 unstable release), published May 21, 2024
Used in embed_anything
DOCX-PARSER
This crate uses the docx-rs crate to parse DOCX files and then converts the parsed document to Markdown. It can also convert DOCX files to JSON, keeping only the structure that is relevant for generating Markdown.
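The README does not describe the crate's internal document model. As an illustration of the "keep only the structure relevant for Markdown" idea, here is a minimal sketch that assumes a hypothetical, simplified `Block` enum (headings, paragraphs, bullet and numbered items) and renders it to Markdown. `Block`, `render_block`, and `to_markdown` are invented for this sketch and are not docx-parser's actual types or functions.

```rust
/// Hypothetical, simplified document model (NOT docx-parser's real types).
#[derive(Debug)]
enum Block {
    Heading { level: usize, text: String },
    Paragraph(String),
    BulletItem(String),
    NumberedItem { index: usize, text: String },
}

/// Render a single block as one line of Markdown.
fn render_block(block: &Block) -> String {
    match block {
        // Markdown only supports heading levels 1 through 6, so clamp.
        Block::Heading { level, text } => format!("{} {}", "#".repeat((*level).clamp(1, 6)), text),
        Block::Paragraph(text) => text.clone(),
        Block::BulletItem(text) => format!("- {}", text),
        Block::NumberedItem { index, text } => format!("{}. {}", index, text),
    }
}

fn is_list_item(block: &Block) -> bool {
    matches!(block, Block::BulletItem(_) | Block::NumberedItem { .. })
}

/// Join rendered blocks: consecutive list items of the same kind stay on
/// adjacent lines (a "tight" list); everything else gets a blank line between.
fn to_markdown(blocks: &[Block]) -> String {
    let mut out = String::new();
    for (i, block) in blocks.iter().enumerate() {
        if i > 0 {
            let prev = &blocks[i - 1];
            let tight = is_list_item(prev)
                && is_list_item(block)
                && std::mem::discriminant(prev) == std::mem::discriminant(block);
            out.push_str(if tight { "\n" } else { "\n\n" });
        }
        out.push_str(&render_block(block));
    }
    out
}

fn main() {
    let blocks = vec![
        Block::Heading { level: 1, text: "Report".to_string() },
        Block::Paragraph("Intro text.".to_string()),
        Block::BulletItem("First".to_string()),
        Block::BulletItem("Second".to_string()),
        Block::NumberedItem { index: 1, text: "Step one".to_string() },
    ];
    println!("{}", to_markdown(&blocks));
}
```

Keeping list items tight while separating other blocks with blank lines mirrors how hand-written Markdown usually looks; a real converter also has to handle inline formatting, nesting, and tables, which this sketch deliberately omits.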
It can be used as a library, or installed and run from the command line.
CLI application
```shell
$ git clone https://github.com/erikvullings/docx-parser.git
$ cd docx-parser
$ cargo install --path .
$ docx-parser -h
Processes a DOCX file and outputs as Markdown or JSON

Usage: docx-parser [OPTIONS] <FILE>

Arguments:
  <FILE>  The input DOCX file

Options:
  -o, --output <OUTPUT>  Sets the output destination. Default is console
  -f, --format <FORMAT>  Sets the output format. Default is markdown. Options: md, json, pretty_json
  -h, --help             Print help
  -V, --version          Print version

# Example
$ docx-parser ./test/tables.docx -f pretty_json
```
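The help text documents three values for `-f` (`md`, `json`, `pretty_json`, defaulting to Markdown) and a console default for `-o`. As an illustration only, this sketch models that option handling in plain Rust; `OutputFormat`, `resolve_format`, and `open_output` are hypothetical names, and the actual CLI may parse its arguments differently.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::str::FromStr;

/// Hypothetical enum mirroring the values listed for `-f, --format`.
#[derive(Debug, PartialEq)]
enum OutputFormat {
    Markdown,
    Json,
    PrettyJson,
}

impl FromStr for OutputFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "md" => Ok(OutputFormat::Markdown),
            "json" => Ok(OutputFormat::Json),
            "pretty_json" => Ok(OutputFormat::PrettyJson),
            other => Err(format!("unsupported format: {other}")),
        }
    }
}

/// No `-f` flag means Markdown, matching the documented default.
fn resolve_format(arg: Option<&str>) -> Result<OutputFormat, String> {
    arg.map_or(Ok(OutputFormat::Markdown), |s| s.parse())
}

/// No `-o` flag means the console (stdout); otherwise create/truncate the file.
fn open_output(path: Option<&str>) -> io::Result<Box<dyn Write>> {
    match path {
        Some(p) => Ok(Box::new(File::create(p)?)),
        None => Ok(Box::new(io::stdout())),
    }
}

fn main() {
    // Simulates `docx-parser ./test/tables.docx -f pretty_json` without `-o`.
    let format = resolve_format(Some("pretty_json")).expect("valid format");
    let mut out = open_output(None).expect("stdout is available");
    writeln!(out, "selected format: {:?}", format).expect("write succeeds");
}
```

Returning a `Box<dyn Write>` lets the rest of the program write the same way regardless of whether the destination is a file or the console.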
Library
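```rust
use docx_parser::MarkdownDocument;

fn main() {
    // Parse the DOCX file into the crate's intermediate document structure.
    let markdown_doc = MarkdownDocument::from_file("./test/tables.docx");

    // Render the parsed document as Markdown.
    let markdown = markdown_doc.to_markdown(true);
    // Serialize the same structure as JSON.
    let json = markdown_doc.to_json(true);

    println!("\n\n{}", markdown);
    println!("\n\n{}", json);
}
```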
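The library example loads `./test/tables.docx`. The README does not say how tables are rendered; GitHub-flavored Markdown pipe tables are one common target. This hypothetical sketch (not the crate's code) renders rows of cell text as a pipe table, assuming the first row is the header, padding short rows with empty cells, and escaping characters that would break the layout.

```rust
/// Escape pipes and flatten line breaks so cell text cannot break the table.
fn escape(cell: &str) -> String {
    cell.replace('|', "\\|").replace('\n', " ")
}

/// Render one row, padding it with empty cells up to `cols` columns.
fn render_row(row: &[String], cols: usize) -> String {
    let cells: Vec<String> = (0..cols)
        .map(|i| escape(row.get(i).map(String::as_str).unwrap_or("")))
        .collect();
    format!("| {} |", cells.join(" | "))
}

/// Render rows as a pipe table; the first row is treated as the header
/// (an assumption of this sketch).
fn render_table(rows: &[Vec<String>]) -> String {
    let Some(header) = rows.first() else {
        return String::new();
    };
    // Use the widest row so every line has the same number of cells.
    let cols = rows.iter().map(|r| r.len()).max().unwrap_or(0);
    let mut lines = vec![render_row(header, cols), format!("|{}", " --- |".repeat(cols))];
    for row in &rows[1..] {
        lines.push(render_row(row, cols));
    }
    lines.join("\n")
}

fn main() {
    let rows = vec![
        vec!["Name".to_string(), "Qty".to_string()],
        vec!["Apples".to_string(), "3".to_string()],
        vec!["Pears".to_string()], // short row: padded with an empty cell
    ];
    println!("{}", render_table(&rows));
}
```

Pipe tables cannot express merged cells, so a real DOCX converter has to choose how to flatten spans; this sketch ignores them.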
Development commands
```shell
cargo update
cargo test
cargo build --release
```