20 releases (8 breaking)
0.9.0 | Nov 29, 2024 |
---|---|
0.7.0 | Nov 17, 2024 |
Features
- Paper Metadata Management
  - Support for arXiv, IACR, and DOI sources
  - Automatic source detection from URLs or identifiers
  - Full metadata extraction including authors and abstracts
- Local Database
  - SQLite-based storage with full-text search
  - Configurable document storage
  - Platform-specific defaults
- Interactive Interfaces
  - Terminal User Interface (TUI) with vim-style navigation
  - Command-line interface (CLI) for scripting and automation, with shell completions
  - Search, filter, and preview functionality
  - Document management and viewing
  - Daemon support for background operations
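To make "automatic source detection" concrete, here is a minimal sketch of how an input string could be classified as arXiv, IACR, or DOI. The `Source` enum, the `detect_source` function, and the heuristics themselves are assumptions made for illustration; they are not taken from the learner crate, which may use different rules (for example, configurable regex patterns).

```rust
/// Paper sources named in the feature list (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum Source {
    Arxiv,
    Iacr,
    Doi,
}

/// Guess the source of `input`, which may be a URL or a bare identifier.
/// The heuristics are illustrative assumptions, not learner's actual rules.
pub fn detect_source(input: &str) -> Option<Source> {
    let s = input.trim();

    // URLs: decide by host name.
    if let Some(rest) = s.strip_prefix("https://").or_else(|| s.strip_prefix("http://")) {
        let host = rest.split('/').next().unwrap_or("");
        return match host {
            "arxiv.org" | "www.arxiv.org" => Some(Source::Arxiv),
            "eprint.iacr.org" => Some(Source::Iacr),
            "doi.org" | "dx.doi.org" => Some(Source::Doi),
            _ => None,
        };
    }

    // DOIs start with the "10." prefix and contain a slash.
    if s.starts_with("10.") && s.contains('/') {
        return Some(Source::Doi);
    }

    // Bare arXiv identifiers look like YYMM.NNNNN (version suffixes are not handled here).
    if let Some((left, right)) = s.split_once('.') {
        if left.len() == 4 && all_digits(left) && all_digits(right) {
            return Some(Source::Arxiv);
        }
    }

    // Bare IACR ePrint identifiers look like YYYY/NNN.
    if let Some((year, number)) = s.split_once('/') {
        if year.len() == 4 && all_digits(year) && all_digits(number) {
            return Some(Source::Iacr);
        }
    }

    None
}

/// True when `s` is non-empty and consists only of ASCII digits.
fn all_digits(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}
```

With these rules, `detect_source("2301.07041")` yields `Some(Source::Arxiv)` and `detect_source("10.1145/1327452.1327492")` yields `Some(Source::Doi)`. Checking the DOI prefix before the arXiv shape keeps the two identifier styles from being confused.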
Installation
Library
[dependencies]
learner = { version = "*" } # Uses latest version
CLI Tool
cargo +nightly install learnerd --features tui
This installs both the CLI tool and the TUI, accessible via the learner command.
To generate shell completions for learner:
# Replace fish with your shell (bash, zsh, etc.).
# Then move the completions file somewhere sensible and source it from your shell config.
learner -g fish > learner_completions.fish
source learner_completions.fish
Usage
Library Usage
use learner::{Database, Paper};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open the database at the platform default location
    let db = Database::open(Database::default_path()).await?;
    // Add papers from various sources
    let paper = Paper::new("https://arxiv.org/abs/2301.07041").await?;
    paper.save(&db).await?;
    // Download associated document
    let storage = Database::default_storage_path();
    paper.download_pdf(&storage).await?;
    Ok(())
}
Command Line Interface
# Initialize database
learner init --default-retrievers
# Add papers
learner add 2301.07041
learner add "https://arxiv.org/abs/2301.07041" --pdf
learner add "10.1145/1327452.1327492" --no-pdf
# Search papers
learner search "quantum computing"
learner search "quantum" --author "Feynman" --detailed
learner search "neural" --source arxiv --before 2023
# Remove papers
learner remove "outdated paper"
learner remove "temp" --force --remove-pdf
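The search flags shown above (`--author`, `--source`, `--before`) combine as successive filters on top of the text query. The sketch below illustrates that idea over an in-memory list. The `Paper` and `Filters` types and the `search` function are hypothetical, and the semantics are assumptions: title-only substring matching, and `--before` treated as "published in an earlier year". learner itself stores papers in SQLite with full-text search, so its real behavior may differ.

```rust
/// Hypothetical in-memory paper record for this sketch.
#[derive(Debug, Clone)]
pub struct Paper {
    pub title: String,
    pub authors: Vec<String>,
    pub source: String,
    pub year: u16,
}

/// Optional filters mirroring the CLI flags; `None` means "do not filter".
#[derive(Debug, Default)]
pub struct Filters<'a> {
    pub author: Option<&'a str>,
    pub source: Option<&'a str>,
    pub before: Option<u16>,
}

/// Return the papers whose title contains `query` (case-insensitive)
/// and which pass every filter that is set.
pub fn search<'p>(papers: &'p [Paper], query: &str, filters: &Filters<'_>) -> Vec<&'p Paper> {
    let query = query.to_lowercase();
    papers
        .iter()
        .filter(|p| p.title.to_lowercase().contains(&query))
        .filter(|p| {
            filters
                .author
                .map_or(true, |a| p.authors.iter().any(|name| name.contains(a)))
        })
        .filter(|p| filters.source.map_or(true, |s| p.source == s))
        .filter(|p| filters.before.map_or(true, |year| p.year < year))
        .collect()
}
```

Because each filter is an `Option`, adding a flag only narrows the result set, and a call with `Filters::default()` reduces to a plain text search.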
Terminal User Interface
If you install with
cargo install learnerd --features tui
you get access to a Terminal User Interface (TUI). To launch it, run:
learner
TUI navigation:
- ↑/k, ↓/j: Navigate papers
- ←/h, →/l: Switch panes
- `:`: Enter command mode
- o: Open selected PDF
- q: Quit
TUI commands:
:add # Add a paper
:remove # Remove paper(s)
:search # Search papers
(TODO) Search within the TUI will support all filters:
:search "quantum" --author "Feynman"
:search "neural" --source arxiv --before 2023
System Daemon Management
learnerd can run as a background service for paper monitoring and updates.
Currently, the daemon does not run any distinct processes; progress is tracked in issue #83.
System Service
# Install and start
sudo learnerd daemon install
sudo systemctl enable --now learnerd # Linux
sudo launchctl load /Library/LaunchDaemons/learnerd.daemon.plist # macOS
# Remove
sudo learnerd daemon uninstall
Logs
- Linux: /var/log/learnerd/
- macOS: /Library/Logs/learnerd/
Files: learnerd.log (main, rotated daily), stdout.log, stderr.log
Troubleshooting
- Permission Errors: Check ownership of log directories
- Won't Start: Check system logs and remove stale PID file if present
- Installation: Run commands as root/sudo
Configuration
learner uses a flexible configuration system that lets you customize paper sources, storage paths, and retrieval behavior.
Default Locations
- Config:
  - Linux: ~/.config/learner/config.toml
  - macOS: ~/Library/Application Support/learner/config.toml
  - Windows: %APPDATA%\learner\config.toml
- Database:
  - Linux: ~/.local/share/learner/learner.db
  - macOS: ~/Library/Application Support/learner/learner.db
  - Windows: %APPDATA%\learner\learner.db
- Papers:
  - Linux/macOS: ~/Documents/learner/papers
  - Windows: Documents\learner\papers
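The per-platform config locations listed above can be resolved with the standard library alone. The following is a sketch under stated assumptions: the function names `config_path_for` and `default_config_path` are hypothetical, and reading `HOME` and `APPDATA` directly is a simplification (the crate may well rely on a directories library instead).

```rust
use std::env;
use std::path::PathBuf;

/// Build the default config path from explicit inputs so the logic is easy to test.
/// `os` is expected to be a value of `std::env::consts::OS` ("linux", "macos", "windows", ...).
/// Hypothetical helper; any OS other than macOS or Windows falls back to the Linux layout.
pub fn config_path_for(os: &str, home: &str, appdata: &str) -> PathBuf {
    match os {
        "macos" => PathBuf::from(home)
            .join("Library")
            .join("Application Support")
            .join("learner")
            .join("config.toml"),
        "windows" => PathBuf::from(appdata).join("learner").join("config.toml"),
        _ => PathBuf::from(home)
            .join(".config")
            .join("learner")
            .join("config.toml"),
    }
}

/// Resolve the default config path for the current platform.
/// Missing environment variables become empty strings in this sketch.
pub fn default_config_path() -> PathBuf {
    let home = env::var("HOME").unwrap_or_default();
    let appdata = env::var("APPDATA").unwrap_or_default();
    config_path_for(env::consts::OS, &home, &appdata)
}
```

Splitting the pure path logic (`config_path_for`) from the environment lookup keeps the platform table testable on any machine.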
Configuration File
The configuration file (config.toml) allows you to customize:
# Base configuration
[config]
database_path = "/custom/path/to/db.sqlite" # Where the database itself is stored
storage_path = "/custom/path/to/papers" # Where the documents are stored
retrievers_path = "/custom/path/to/retrievers" # Where the retriever configuration files are stored
Adding Custom Sources
- Create a source configuration in TOML:
[sources.new_source]
name = "New Paper Source"
base_url = "https://api.example.com"
pattern = "^PREFIX-\\d+$" # Regex for identifier validation
endpoint_template = "/api/v1/papers/{identifier}"
headers = { "API-Key" = "your-key" } # Optional headers
# For JSON responses
response_format = { type = "json" }
field_maps.title = { path = "data.title" }
field_maps.abstract = { path = "data.description" }
# Inline tables must stay on a single line in TOML 1.0
field_maps.pdf_url = { path = "data.files.pdf", transform = { type = "url", base = "https://cdn.example.com", suffix = ".pdf" } }
# For XML responses (use instead of the JSON settings above, not alongside them)
response_format = { type = "xml" }
field_maps.title = { path = "paper/title" }
field_maps.authors = { path = "paper/authors/author" }
Put this TOML configuration file in your ~/.learner/retrievers/ (or equivalent) directory.
Examples can be found in crates/learner/config/retrievers/.
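To illustrate how the `pattern`, `endpoint_template`, and `transform` fields from the example source might fit together, here is a small standard-library sketch. All three function names are hypothetical. The identifier check is a hand-written, ASCII-only approximation of the example regex `^PREFIX-\d+$` (a real retriever would presumably compile the configured regex), and the `url` transform semantics (prepend `base`, append `suffix`) are an assumption inferred from the field names.

```rust
/// ASCII-only approximation of the example pattern `^PREFIX-\d+$`.
/// Hypothetical stand-in for compiling the configured regex.
pub fn matches_example_pattern(identifier: &str) -> bool {
    match identifier.strip_prefix("PREFIX-") {
        Some(digits) => !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit()),
        None => false,
    }
}

/// Build the request URL by substituting `{identifier}` into the endpoint template
/// and appending the result to the base URL.
pub fn build_url(base_url: &str, endpoint_template: &str, identifier: &str) -> String {
    let endpoint = endpoint_template.replace("{identifier}", identifier);
    format!("{}{}", base_url.trim_end_matches('/'), endpoint)
}

/// Assumed meaning of the `url` transform: join `base` and the extracted value
/// with a single slash, then append `suffix`.
pub fn url_transform(value: &str, base: &str, suffix: &str) -> String {
    format!(
        "{}/{}{}",
        base.trim_end_matches('/'),
        value.trim_start_matches('/'),
        suffix
    )
}
```

For the example configuration, `build_url("https://api.example.com", "/api/v1/papers/{identifier}", "PREFIX-42")` produces `https://api.example.com/api/v1/papers/PREFIX-42`. Validating the identifier before building the URL avoids sending malformed requests to the source.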
Source Requirements
Custom sources must provide:
- A unique identifier pattern (regex)
- An API endpoint that returns paper metadata
- Field mappings for required metadata:
  - Title
  - Authors
  - Abstract
  - Publication date
  - Optional: PDF URL, DOI
Supported Response Formats
- JSON:
  - Path-based field extraction
  - Value transformations (dates, URLs)
  - Array handling for authors/references
- XML:
  - XPath-style field selection
  - Namespace handling
  - Multiple value aggregation
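Path-based field extraction for JSON responses can be pictured as walking a dotted path such as `data.title` through nested objects. The sketch below uses a tiny hand-rolled `Value` enum so that it stays dependency-free; that type and the `get_path` and `get_strings` helpers are hypothetical, and a real implementation would more likely operate on a JSON library's value type.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a parsed JSON document (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Text(String),
    List(Vec<Value>),
    Object(HashMap<String, Value>),
}

/// Follow a dotted path such as `data.title` through nested objects.
/// Returns `None` if any segment is missing or a non-object is reached early.
pub fn get_path<'a>(root: &'a Value, path: &str) -> Option<&'a Value> {
    path.split('.').try_fold(root, |node, key| match node {
        Value::Object(fields) => fields.get(key),
        _ => None,
    })
}

/// Collect every string found at `path`, so a single value and an array
/// (for example a list of authors) are handled the same way.
pub fn get_strings(root: &Value, path: &str) -> Vec<String> {
    match get_path(root, path) {
        Some(Value::Text(s)) => vec![s.clone()],
        Some(Value::List(items)) => items
            .iter()
            .filter_map(|item| match item {
                Value::Text(s) => Some(s.clone()),
                _ => None,
            })
            .collect(),
        _ => Vec::new(),
    }
}
```

Returning a `Vec<String>` for both scalar and array fields is one way to cover the "array handling for authors/references" case without a separate code path per field.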
Project Structure
- learner: Core library
  - Paper metadata extraction and management
  - Database operations and search
  - PDF handling and source-specific clients
  - Error handling and type safety
- learnerd: CLI application
  - Paper and document management interface
  - System daemon capabilities
  - Logging and diagnostics
Roadmap
- Generic LLM integration (similar to the configurable Retriever abstraction)
- RAG system
- Document version control and annotations
- Paper discovery and streaming
- Configurable daemon process (e.g., watch file system, RSS, automated LLM querying)
- REST API and daemonization so that learner can act as a plugin for other apps (e.g., Raycast, Syncthing)
- Database improvements (more searchable fields, tags, organization)
- TUI improvements (organization, flexibility, in-terminal paper reading)
- Citation analysis and related works
Contributing
Contributions welcome! Please open an issue before making major changes.
CI Workflow
Our automated pipeline ensures:
- Code Quality
  - rustfmt and taplo for consistent formatting
  - clippy for Rust best practices
  - cargo-udeps for dependency management
  - cargo-semver-checks for API compatibility
- Testing
  - Full test suite across workspace and platforms
All checks must pass before merging pull requests.
Development
This project uses just as a command runner.
# Setup
cargo install just
just setup
# Common commands
just test # run tests
just fmt # format code
just ci # run all checks
just build-all # build all targets
[!TIP] Running just setup and just ci locally is a quick way to get up to speed and check that the repo works on your system!
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- arXiv API for paper metadata
- IACR for cryptography papers
- CrossRef for DOI resolution
- SQLite for local database support