5 stable releases; the four most recent:

| Version | Released |
|---|---|
| 1.0.4 (latest) | Dec 11, 2024 |
| 1.0.3 | Nov 30, 2024 |
| 1.0.2 | Nov 8, 2024 |
| 1.0.1 | Nov 7, 2024 |
#1843 in Text processing · 356 downloads per month · Used in rsrpp-cli · 64KB, 1K SLoC
RuSt Research Paper Parser (rsrpp)
The rsrpp library provides tools for parsing research papers in PDF form into structured pages and sections.
Quick Start
Prerequisites
- Poppler:
  sudo apt install poppler-utils
- OpenCV (clang and libclang are also needed to build the OpenCV Rust bindings):
  sudo apt install libopencv-dev clang libclang-dev
Installation
To start using the rsrpp library, add it to your project's dependencies in Cargo.toml, for example with:
cargo add rsrpp
Then, import the parser module in your code:
use rsrpp::parser;
Examples
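The following example parses the PDF of an arXiv paper (1706.03762, "Attention Is All You Need"), groups its pages into sections, and serializes the sections to JSON: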
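// Must run inside an async context (e.g. an async fn driven by a runtime such as tokio).
// ParserConfig, parse and Section come from rsrpp; exact import paths are assumed here.
let mut config = ParserConfig::new();
let url = "https://arxiv.org/pdf/1706.03762"; // URL of the PDF to parse
let pages = parse(url, &mut config).await.unwrap(); // Vec<Page>
let sections = Section::from_pages(&pages); // Vec<Section>
let json = serde_json::to_string(&sections).unwrap(); // String (requires serde_json)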
Tests
The library includes a test suite. To run it, use:
cargo test
License: MIT
Releases
1.0.4
- Fixed bugs in get_pdf_info.
- Made minor improvements.
1.0.3
- Added a command-line interface (rsrpp-cli).
1.0.2
- Updated the Section module: content: String was replaced by content: Vec<TextBlock>.
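Code written against versions before 1.0.2 may expect a section's content as a single String. The sketch below shows one way to rebuild that view by joining the text blocks. It uses hypothetical stand-in types: the real TextBlock almost certainly has more fields, and the `text` field name and the helper `content_as_string` are assumptions for illustration, not part of the rsrpp API.

```rust
// Hypothetical stand-ins for rsrpp's types, for illustration only.
// The real TextBlock's fields may differ; `text` is an assumed field name.
#[derive(Debug, Clone)]
struct TextBlock {
    text: String,
}

#[derive(Debug, Clone)]
struct Section {
    // Since 1.0.2 this is Vec<TextBlock>; previously it was String.
    content: Vec<TextBlock>,
}

/// Hypothetical helper: rebuilds the pre-1.0.2 single-string view of a
/// section's content by joining the text of its blocks with `sep`.
fn content_as_string(section: &Section, sep: &str) -> String {
    section
        .content
        .iter()
        .map(|block| block.text.as_str())
        .collect::<Vec<_>>()
        .join(sep)
}

fn main() {
    let section = Section {
        content: vec![
            TextBlock { text: "First block.".to_string() },
            TextBlock { text: "Second block.".to_string() },
        ],
    };
    // Join blocks with newlines to approximate the old String field.
    let text = content_as_string(&section, "\n");
    println!("{}", text);
}
```

Keeping the blocks separate and joining only on demand lets callers choose the separator, or process blocks individually, instead of re-splitting a flattened string.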
Dependencies: ~19–51MB, ~801K SLoC