
bin+lib bogrep

Full-text search for bookmarks from multiple browsers

15 releases (8 breaking)

0.9.0 Mar 20, 2024
0.8.0 Feb 28, 2024
0.6.1 Dec 1, 2023
0.6.0 Nov 30, 2023

#419 in Text processing

Apache-2.0 and GPL-3.0+

375KB
7.5K SLoC

Bogrep – Grep your bookmarks


Bogrep downloads and caches your bookmarks in plaintext without images or videos. Use the Bogrep CLI to grep through your cached bookmarks in full-text search.

bogrep -i "reed-solomon code"


Install Bogrep

Install Bogrep from crates.io

# Build and install bogrep binary to ~/.cargo/bin
cargo install bogrep

To update Bogrep to a new version, run cargo install bogrep again. Versions 0.x are not backwards compatible, and breaking changes are expected. If you experience an issue when running Bogrep, remove Bogrep's configuration directory (see Supported operating systems).

Install Bogrep from github.com

git clone git@github.com:quambene/bogrep.git
cd bogrep

# Build and install bogrep binary to ~/.cargo/bin
cargo install --path .

Usage

Settings and cache are created in the configuration path the first time Bogrep is run. The configuration path depends on your operating system (see Supported operating systems).

# Import bookmarks from selected sources
bogrep import

# Fetch and cache bookmarks
bogrep fetch

# Search your bookmarks in full-text search
bogrep <pattern>

To simulate the import of bookmarks, use bogrep import --dry-run.
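The three steps above can be chained into one helper. This is only a sketch: the function name bogrep_refresh_and_search is hypothetical and not part of Bogrep, and it assumes each subcommand exits with a non-zero status on failure so that the chain stops early.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bogrep): import, fetch, then search.
bogrep_refresh_and_search() {
    if [ -z "$1" ]; then
        echo "usage: bogrep_refresh_and_search <pattern>" >&2
        return 2
    fi
    # Each step runs only if the previous one succeeded.
    bogrep import &&
        bogrep fetch &&
        bogrep -i "$1"
}
```

Call it as bogrep_refresh_and_search "reed-solomon code" after sourcing the file.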

bogrep [OPTIONS] [PATTERN]
Options:
  -v, --verbose...          
  -m, --mode <MODE>         Search the cached bookmarks in HTML or plaintext format [possible values: html, text]
  -i, --ignore-case         Ignore case distinctions in patterns
  -l, --files-with-matches  Print only URLs of bookmarks with matched lines
  -h, --help                Print help
  -V, --version             Print version
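
The options can be combined. As a sketch, the hypothetical helper below (bogrep_matching_urls is not part of Bogrep) lists each matching bookmark once, assuming that -l prints one URL per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bogrep): unique URLs of matching bookmarks.
bogrep_matching_urls() {
    # -i: ignore case; -l: print only URLs of bookmarks with matched lines.
    # Assumption: one URL per output line, so sort -u removes duplicates.
    bogrep -i -l "$1" | sort -u
}
```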

Getting help

# Check version
bogrep --version

# Print help
bogrep --help

# Print help for subcommands
bogrep config --help
bogrep import --help
bogrep fetch --help

Import bookmarks

Bookmarks can be imported from the following browsers:

  • Firefox (in .json and .jsonlz4 format)
  • Chromium (in .json format)
  • Chrome (in .json format)
  • Edge (in .json format)
  • Safari (in .plist format)

If bookmark files are not detected by bogrep import, you can configure them manually using:

bogrep config --source ~/path/to/bookmarks/file

Filter bookmark folders

Filter which bookmark folders are imported. Multiple folders are separated by commas:

bogrep config --source "my/path/to/bookmarks_file.json" --folders dev,science,articles

Ignore URLs

Ignore specific URLs. The content for these URLs will not be fetched and cached.

It can be useful to ignore URLs for video or music platforms, which usually don't include relevant text to grep.

# Ignore one or more urls
bogrep config --ignore <url1> <url2> ...
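
If the list of ignored URLs grows, it may be easier to keep it in a file. The sketch below is one possible approach, not a Bogrep feature: bogrep_ignore_from_file is a hypothetical helper, and the file format (one URL per line, blank lines and lines starting with # skipped) is an assumption of this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bogrep): pass every URL listed in a file
# to a single `bogrep config --ignore` call.
bogrep_ignore_from_file() {
    file=$1
    set --                      # reuse the positional parameters as the URL list
    while IFS= read -r url || [ -n "$url" ]; do
        case "$url" in
            ''|'#'*) continue ;;  # skip blank lines and comments
        esac
        set -- "$@" "$url"
    done < "$file"
    [ "$#" -gt 0 ] || return 0  # nothing to ignore
    bogrep config --ignore "$@"
}
```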

Fetch underlying URLs

Fetch the underlying URLs of supported websites:

bogrep config --underlying <url1> <url2> ...

For example, if a specific URL like https://news.ycombinator.com/item?id=00000000 is bookmarked, the underlying article is fetched and cached.

Supported domains are:

  • news.ycombinator.com
  • reddit.com

Diff websites

Fetch the difference between the cached and the newly fetched version of a website for one or more URLs, and display the changes:

bogrep fetch --diff <url1> <url2> ...

Manage internal bookmarks

If you need to add specific URLs to the search index, use the bogrep add subcommand.

# Add URLs to search index
bogrep add <url1> <url2> ...

# Remove URLs from search index
bogrep remove <url1> <url2> ...

# Add URLs to search index and fetch content from URLs
bogrep fetch <url1> <url2> ...

Request throttling

Fetching of bookmarks from the same host is conservatively throttled. The throttling can be configured in settings.json, which is usually located at ~/.config/bogrep:

{
    "cache_mode": "text",
    "max_concurrent_requests": 100,
    "request_timeout": 60000,
    "request_throttling": 3000,
    "max_idle_connections_per_host": 10,
    "idle_connections_timeout": 5000
}

where request_throttling is the waiting time between requests for the same host in milliseconds.
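
To get a feeling for the effect of request_throttling, the following sketch estimates a lower bound for fetching several bookmarks from one host. It assumes that requests to the same host are spaced exactly request_throttling milliseconds apart and ignores the time each request takes; min_fetch_seconds is a hypothetical helper, not part of Bogrep.

```shell
#!/bin/sh
# Hypothetical helper: lower bound in seconds for fetching <count> bookmarks
# from one host with <throttle_ms> milliseconds between requests.
min_fetch_seconds() {
    count=$1
    throttle_ms=$2
    if [ "$count" -le 1 ]; then
        echo 0                  # a single request never waits
        return
    fi
    # count requests are separated by (count - 1) waiting periods.
    echo $(( (count - 1) * throttle_ms / 1000 ))
}
```

With the default of 3000 ms shown above, min_fetch_seconds 50 3000 prints 147, so 50 bookmarks on one host need at least about two and a half minutes.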

To speed up fetching, set max_concurrent_requests to e.g. 1000. The maximum number of available sockets depends on your operating system. Run ulimit -n to show the maximum number of open sockets allowed on your system.
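
Before raising max_concurrent_requests, it can help to compare the value against the limit. ulimit -n reports the per-process limit on open file descriptors, and each socket uses one descriptor. The sketch below is a rough check only: fits_open_file_limit is a hypothetical helper, and the default headroom of 64 descriptors is an assumption of this example, not a value used by Bogrep.

```shell
#!/bin/sh
# Hypothetical helper: succeed if <wanted> concurrent requests fit under
# <limit> open file descriptors, keeping <reserve> descriptors free.
fits_open_file_limit() {
    wanted=$1
    limit=$2
    reserve=${3:-64}            # assumed headroom, not a Bogrep value
    [ "$limit" = "unlimited" ] && return 0
    [ $(( wanted + reserve )) -le "$limit" ]
}

# Usage:
#   fits_open_file_limit 1000 "$(ulimit -n)" || echo "raise ulimit -n first"
```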

For the available settings see https://docs.rs/bogrep/latest/bogrep/struct.Settings.html.

Supported operating systems

Bogrep assumes and creates a configuration path at

  • $HOME/.config/bogrep for Linux,
  • $HOME/Library/Application Support/bogrep for macOS,
  • C:\Users\<Username>\AppData\Roaming\bogrep for Windows,

in your home directory for storing the settings.json, bookmarks.json, and cache folder.

You can override the configuration path via the environment variable BOGREP_HOME.
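
The lookup described above can be sketched in shell, for example to locate settings.json from a script. This mirrors the documented paths only and is not Bogrep's actual resolution code; bogrep_config_dir is a hypothetical helper, and Windows is left out because the sketch targets a POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper: print the configuration path as documented above.
bogrep_config_dir() {
    # BOGREP_HOME takes precedence when it is set and non-empty.
    if [ -n "${BOGREP_HOME:-}" ]; then
        printf '%s\n' "$BOGREP_HOME"
        return 0
    fi
    case "$(uname -s)" in
        Linux)  printf '%s\n' "$HOME/.config/bogrep" ;;
        Darwin) printf '%s\n' "$HOME/Library/Application Support/bogrep" ;;
        *)      echo "unsupported system for this sketch" >&2; return 1 ;;
    esac
}
```

For example, cat "$(bogrep_config_dir)/settings.json" would show the current settings once Bogrep has been run.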

Testing

# Run unit tests and integration tests
cargo test

# Run unit tests
cargo test --lib

# Run integration tests
cargo test --test '*'

Dependencies

~19–34MB
~519K SLoC