rsrpp

A Rust library for parsing research paper PDFs

5 stable releases

1.0.4 Dec 11, 2024
1.0.3 Nov 30, 2024
1.0.2 Nov 8, 2024
1.0.1 Nov 7, 2024

#1843 in Text processing

356 downloads per month
Used in rsrpp-cli

MIT license

64KB
1K SLoC

Rust Research Paper Parser (rsrpp)

The rsrpp library provides a set of tools for parsing research paper PDFs into pages and sections that can be serialized to JSON.

Quick Start

Prerequisites

  • Poppler: sudo apt install poppler-utils
  • OpenCV: sudo apt install libopencv-dev clang libclang-dev
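
Before building, it can help to confirm that the Poppler command-line tools are actually on the PATH. The following is a minimal sketch using only the Rust standard library. The helper names `tool_available` and `missing_tools` are hypothetical, and the choice of `pdfinfo` and `pdftotext` is an assumption: both ship with poppler-utils, but the source does not say which tools rsrpp invokes.

```rust
use std::process::{Command, Stdio};

/// Returns true if `tool` can be spawned from the current PATH.
/// Only checks that the executable starts; the exit status is ignored.
fn tool_available(tool: &str) -> bool {
    Command::new(tool)
        .arg("-v") // Poppler tools print their version with -v
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .is_ok()
}

/// Returns the names from `tools` that could not be started.
fn missing_tools<'a>(tools: &[&'a str]) -> Vec<&'a str> {
    tools.iter().copied().filter(|t| !tool_available(t)).collect()
}

fn main() {
    // Assumption: these are the Poppler tools worth checking; the source
    // does not say which ones rsrpp calls.
    let missing = missing_tools(&["pdfinfo", "pdftotext"]);
    if missing.is_empty() {
        println!("Poppler command-line tools found");
    } else {
        eprintln!("missing tools: {}", missing.join(", "));
    }
}
```

This check does not cover OpenCV, which is linked at build time rather than run as a command. A failed `cargo build` is the more direct signal there.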

Installation

To start using the rsrpp library, add it to your project's dependencies (the command below updates Cargo.toml for you):

cargo add rsrpp

Then, import the parser module in your code:

use rsrpp::parser;

Examples

Here is a simple example of how to use the parser module:

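// Run inside an async function; `.await` needs an async runtime.
// `ParserConfig`, `parse`, and `Section` must be in scope.
let mut config = ParserConfig::new();
let url = "https://arxiv.org/pdf/1706.03762";
// Parse the PDF at `url` into pages.
let pages = parse(url, &mut config).await.unwrap(); // Vec<Page>
// Build sections from the parsed pages.
let sections = Section::from_pages(&pages); // Vec<Section>
// Serialize the sections to JSON (requires the serde_json crate).
let json = serde_json::to_string(&sections).unwrap(); // String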

Tests

The library includes a test suite. To run it, use the following command:

cargo test

License: MIT

Releases

1.0.4

  • Fixed bugs in get_pdf_info.
  • Made minor improvements.

1.0.3

1.0.2

  • Updated the Section module. content: String was replaced by content: Vec<TextBlock>.
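
For code that relied on the old `content: String`, one way to get a single string back is to join the text of each block. The sketch below uses hypothetical stand-in types. The field names `text` and `title` are assumptions made for illustration, and the real rsrpp `Section` and `TextBlock` may have different or additional fields. `section_text` is likewise a made-up helper, not part of the library.

```rust
// Hypothetical stand-ins for illustration only; the real rsrpp types
// may have different or additional fields.
#[derive(Debug, Clone)]
struct TextBlock {
    text: String, // assumed field name
}

#[derive(Debug, Clone)]
struct Section {
    title: String,           // assumed field name
    content: Vec<TextBlock>, // was `content: String` before 1.0.2
}

/// Joins the text of every block in a section into one string,
/// skipping blocks that are empty after trimming.
fn section_text(section: &Section) -> String {
    section
        .content
        .iter()
        .map(|b| b.text.trim())
        .filter(|t| !t.is_empty())
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let section = Section {
        title: "Abstract".to_string(),
        content: vec![
            TextBlock { text: "First block. ".to_string() },
            TextBlock { text: "".to_string() },
            TextBlock { text: "Second block.".to_string() },
        ],
    };
    println!("{}:\n{}", section.title, section_text(&section));
}
```

Joining with a newline is one choice among several. A caller that wants flowing prose might join with a space instead.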

Dependencies

~19–51MB
~801K SLoC