3 stable releases

| Version | Release date |
|---|---|
| 1.0.4 | Nov 8, 2025 |
| 1.0.3 | Oct 14, 2025 |
| 1.0.2 | Oct 12, 2025 |
71KB, 1.5K SLoC
Markdown Scanner
Overview
markdown-scanner is a Rust command-line tool that scans Markdown files in a directory (e.g., an Obsidian vault) and extracts metadata such as tags and backlinks. It stores this information in a SQLite database for fast querying. The binary processes one file per invocation; the accompanying Bash script (markdown-processor-all-rust.bash) runs it over every .md file in a directory. This makes it suitable for integration with text editors like Neovim and with Markdown-based note-taking workflows such as Obsidian.
Features
- Tag Extraction: Extracts both YAML frontmatter tags and inline #tags from Markdown files, ignoring tags within code blocks.
- Backlink Detection: Identifies [[backlink]] references in Markdown files and links them to the corresponding files in the database.
- SQLite Database: Stores file metadata, folder structure, tags, and backlinks in a relational SQLite database for easy querying.
- File System Integration: Resolves file paths relative to a base directory and handles file system changes, ensuring accurate metadata.
- Error Handling: Robust error handling with custom error types and detailed logging for debugging.
- Editor Integration: Designed to be triggered on file save in editors like Neovim or used in batch processing for Markdown vaults.
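To make the two extraction features concrete, here is a minimal std-only Rust sketch. It is not the crate's implementation: the helper names (strip_code, inline_tags, backlinks), the accepted tag characters, and the handling of [[Note|alias]] and [[Note#Heading]] are assumptions. It blanks out fenced and inline code first, mirroring the "ignoring tags within code blocks" behaviour. Instead of deleting URLs as the tool does, it only accepts a # at the start of a whitespace-separated word, which already skips URL fragments and headings.

```rust
/// Hypothetical helper (not the crate's code): drop fenced code blocks and
/// inline code spans so that `#` and `[[` inside code are ignored.
fn strip_code(text: &str) -> String {
    let mut out = String::new();
    let mut in_fence = false;
    for line in text.lines() {
        // A line that starts with three backticks opens or closes a fence.
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue;
        }
        // Odd-numbered segments between backticks are inline code: replace them.
        for (i, part) in line.split('`').enumerate() {
            out.push_str(if i % 2 == 0 { part } else { " " });
        }
        out.push('\n');
    }
    out
}

/// Hypothetical helper: inline #tags. A tag must start a whitespace-separated
/// word, so URL fragments (page#x) and headings (# Title) are skipped.
fn inline_tags(text: &str) -> Vec<String> {
    let mut tags: Vec<String> = Vec::new();
    for word in text.split_whitespace() {
        let Some(rest) = word.strip_prefix('#') else { continue };
        let tag: String = rest
            .chars()
            .take_while(|&c| c.is_alphanumeric() || matches!(c, '_' | '-' | '/'))
            .collect();
        // Obsidian rejects purely numeric tags such as #1984.
        let numeric = tag.chars().all(|c| c.is_ascii_digit());
        if !tag.is_empty() && !numeric && !tags.contains(&tag) {
            tags.push(tag);
        }
    }
    tags
}

/// Hypothetical helper: targets of [[...]] links; [[Note|alias]] and
/// [[Note#Heading]] both resolve to "Note".
fn backlinks(text: &str) -> Vec<String> {
    let mut links: Vec<String> = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find("[[") {
        let after = &rest[start + 2..];
        let Some(end) = after.find("]]") else { break };
        let inner = &after[..end];
        let target = inner
            .split(|c: char| c == '|' || c == '#')
            .next()
            .unwrap_or("")
            .trim()
            .to_string();
        if !target.is_empty() && !links.contains(&target) {
            links.push(target);
        }
        rest = &after[end + 2..];
    }
    links
}

fn main() {
    let fence = "`".repeat(3);
    let note = format!(
        "# Title\nSee [[Other Note|alias]] and [[Guide#Setup]]. #project #1984\n{fence}\n#not-a-tag [[NotALink]]\n{fence}\nInline `#code` is skipped, #rust/cli is kept.\n"
    );
    let cleaned = strip_code(&note);
    println!("{:?}", inline_tags(&cleaned));
    println!("{:?}", backlinks(&cleaned));
}
```

Running it prints ["project", "rust/cli"] and ["Other Note", "Guide"]: the tag and link inside the fenced block, the inline-code tag, and the numeric #1984 are all left out.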
Usage
The tool is typically executed via the provided Bash script or directly as a command-line utility.
Projects that use markdown-scanner
Bash Script
The markdown-processor-all-rust.bash script processes all .md files in a specified directory (e.g., an Obsidian vault):
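```bash
#!/bin/bash
# Vault root; must be set in the environment before running.
: "${Obsidian_valt_main_path:?Set Obsidian_valt_main_path to your vault root}"
DB="markdown_data.db"
# -print0 with read -d '' keeps file names containing spaces or newlines intact.
find "$Obsidian_valt_main_path" -type f -name "*.md" -print0 |
while IFS= read -r -d '' file; do
    echo "Processing file: $file"
    if markdown-scanner "$file" "$Obsidian_valt_main_path" -d "$DB"; then
        echo "Data inserted for file: $file"
    else
        echo "Failed to process file: $file" >&2
    fi
done
```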
- Environment Variable: Set Obsidian_valt_main_path to the root directory of your Markdown files.
- Database: Set DB in the script to the SQLite database file to use (default: markdown_data.db).
- Execution: Run the script to process all .md files in the specified directory.
Command-Line Usage
The Rust binary can be invoked directly:
```bash
markdown-scanner <file_path> <base_dir> -d <database_path>
```
- <file_path>: Path to the Markdown file to process.
- <base_dir>: Base directory for resolving relative paths.
- -d <database_path>: Path to the SQLite database (default: markdown_data.db).
Example:
```bash
markdown-scanner /path/to/note.md /path/to/vault/dir/ -d markdown_data.db
```
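The <base_dir> argument is what turns an absolute note path into the relative paths stored in the files and folders tables. Below is a minimal sketch of that step, assuming it is done with canonicalize plus strip_prefix; the README says paths are canonicalized, but the helper name relative_paths and its error message are hypothetical.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: the note's path relative to <base_dir> (as stored in
/// files.path) and its folder part (as stored in folders.path).
fn relative_paths(file: &Path, base: &Path) -> io::Result<(PathBuf, PathBuf)> {
    // canonicalize resolves `.`, `..` and symlinks, and fails if the path is missing.
    let file = file.canonicalize()?;
    let base = base.canonicalize()?;
    let rel = file
        .strip_prefix(&base)
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "file is outside base_dir"))?
        .to_path_buf();
    // A note at the vault root gets an empty folder path.
    let folder = rel.parent().map(Path::to_path_buf).unwrap_or_default();
    Ok((rel, folder))
}

fn main() -> io::Result<()> {
    // Build a tiny throwaway vault in the system temp directory.
    let base = std::env::temp_dir().join("markdown-scanner-demo");
    let note = base.join("projects").join("note.md");
    std::fs::create_dir_all(note.parent().unwrap())?;
    std::fs::write(&note, "# Note\n")?;
    let (rel, folder) = relative_paths(&note, &base)?;
    println!("{} (folder: {})", rel.display(), folder.display());
    Ok(())
}
```

Canonicalizing both sides before stripping the prefix is the important design choice: it makes vault/../vault/note.md and a path through a symlinked vault map to the same stored row instead of creating duplicates.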
Integration with Neovim
Well, I have been using it in Neovim for a long time. I plan to turn it into a plugin once I rip the code out of my enormous init.lua.
Database Schema
The SQLite database (markdown_data.db) contains the following tables:
- folders: Stores unique folder paths with their IDs.
  - id: Primary key.
  - path: Relative folder path (unique).
- files: Stores file metadata.
  - id: Primary key.
  - path: Relative file path (unique).
  - file_name: Name of the file.
  - folder_id: References folders(id).
  - metadata: YAML data.
- tags: Stores unique tags.
  - id: Primary key.
  - tag: Tag name (unique).
- file_tags: Maps files to tags.
  - file_id: References files(id).
  - tag_id: References tags(id).
  - Unique constraint on (file_id, tag_id).
- backlinks: Stores backlink relationships.
  - id: Primary key.
  - backlink: Backlink text (e.g., Note Title).
  - backlink_id: References files(id) (nullable).
  - file_id: References files(id).
  - Unique constraint on (backlink_id, file_id, backlink).
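The tables above can be written out as SQLite DDL. The sketch below embeds it in a Rust constant together with one example query; it is reconstructed from the descriptions, not copied from the crate, so the column types, the NOT NULL choices and IF NOT EXISTS are assumptions. With a driver such as rusqlite (also an assumption about this crate) it could be applied with Connection::execute_batch.

```rust
/// Sketch of DDL matching the table descriptions above (types assumed).
const SCHEMA: &str = r#"
CREATE TABLE IF NOT EXISTS folders (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS files (
    id        INTEGER PRIMARY KEY,
    path      TEXT NOT NULL UNIQUE,
    file_name TEXT NOT NULL,
    folder_id INTEGER REFERENCES folders(id),
    metadata  TEXT
);
CREATE TABLE IF NOT EXISTS tags (
    id  INTEGER PRIMARY KEY,
    tag TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS file_tags (
    file_id INTEGER NOT NULL REFERENCES files(id),
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    UNIQUE (file_id, tag_id)
);
CREATE TABLE IF NOT EXISTS backlinks (
    id          INTEGER PRIMARY KEY,
    backlink    TEXT NOT NULL,
    backlink_id INTEGER REFERENCES files(id),
    file_id     INTEGER NOT NULL REFERENCES files(id),
    UNIQUE (backlink_id, file_id, backlink)
);
"#;

/// Hypothetical example query: relative paths of all files carrying a tag.
const FILES_WITH_TAG: &str = r#"
SELECT f.path
FROM files AS f
JOIN file_tags AS ft ON ft.file_id = f.id
JOIN tags AS t ON t.id = ft.tag_id
WHERE t.tag = ?1
ORDER BY f.path;
"#;

fn main() {
    // A real program would hand SCHEMA to the SQLite driver; here it is only printed.
    println!("{}", SCHEMA);
    println!("{}", FILES_WITH_TAG);
}
```

One SQLite detail matters for the backlinks constraint: NULL values are treated as distinct in a UNIQUE constraint, so unresolved rows (backlink_id NULL) with the same file_id and backlink text are not deduplicated by it.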
Installation
- Prerequisites:
  - Rust (stable) and Cargo for building the Rust binary.
  - SQLite library for database operations (optional but recommended).
  - Bash for running the script.
- Build:
  ```bash
  cargo build --release
  ```
  Or, if you are using Linux, follow [this cheat sheet](<cheat sheet.md>).
- Set Up Script:
  - Copy markdown-processor-all-rust.bash to a vault.
  - Ensure it's executable: chmod +x markdown-processor-all-rust.bash
  - Set the Obsidian_valt_main_path environment variable or hardcode the path in the script.
How It Works
- Initialization:
  - The tool creates the SQLite database with the required schema if it doesn't exist.
  - It uses clap for command-line argument parsing and env_logger for detailed logging.
- File Processing:
  - Reads the specified Markdown file.
  - Extracts YAML frontmatter tags and inline #tags.
  - Identifies [[backlink]] references, resolving them to existing files in the database or filesystem.
  - Cleans content by removing code blocks, URLs, and other irrelevant text before processing tags and backlinks.
- Database Operations:
  - Inserts or updates folder and file metadata.
  - Stores tags and associates them with files.
  - Records backlinks, linking to other files when possible.
  - Handles duplicate file names by preferring a match in the same folder, otherwise the one with the shortest path.
- Filesystem Traversal:
  - Uses jwalk for efficient filesystem traversal when resolving backlinks.
  - Canonicalizes paths so that the same file is always stored under the same path.
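For the frontmatter half of tag extraction, here is a std-only sketch that handles the two common shapes of a tags key: an inline list (tags: [a, b]) and a block list (tags: followed by "- a" lines). It is a simplified stand-in, not the crate's parser; the function names are hypothetical, and anything beyond these shapes (nested keys, multi-line strings, anchors) would need a real YAML parser.

```rust
/// Hypothetical helper: lines between an opening `---` on the first line and
/// the next `---`; None if the note has no (closed) frontmatter.
fn frontmatter(text: &str) -> Option<Vec<&str>> {
    let mut lines = text.lines();
    if lines.next()?.trim_end() != "---" {
        return None;
    }
    let mut body = Vec::new();
    for line in lines {
        if line.trim_end() == "---" {
            return Some(body);
        }
        body.push(line);
    }
    None
}

/// Hypothetical helper: strip quotes and a leading '#', then de-duplicate.
fn push_tag(tags: &mut Vec<String>, raw: &str) {
    let tag = raw
        .trim()
        .trim_matches(|c: char| c == '"' || c == '\'')
        .trim_start_matches('#');
    if !tag.is_empty() && !tags.iter().any(|t| t == tag) {
        tags.push(tag.to_string());
    }
}

/// Hypothetical helper: tags from `tags: [a, b]`, `tags: a`, or a block list.
fn frontmatter_tags(text: &str) -> Vec<String> {
    let mut tags = Vec::new();
    let Some(lines) = frontmatter(text) else {
        return tags;
    };
    let mut in_block_list = false;
    for line in lines {
        if let Some(value) = line.strip_prefix("tags:") {
            let value = value.trim();
            // An empty value means the list continues on the following lines.
            in_block_list = value.is_empty();
            let inline = value.trim_start_matches('[').trim_end_matches(']');
            for item in inline.split(',') {
                push_tag(&mut tags, item);
            }
        } else if in_block_list {
            match line.trim_start().strip_prefix("- ") {
                Some(item) => push_tag(&mut tags, item),
                None => in_block_list = false,
            }
        }
    }
    tags
}

fn main() {
    let note = "---\ntitle: Demo\ntags:\n  - rust\n  - \"cli\"\naliases: [x]\n---\nBody text #inline\n";
    println!("{:?}", frontmatter_tags(note));
}
```

The example prints ["rust", "cli"]: the block list ends at aliases, and the inline #inline tag in the body is left to the separate inline-tag pass.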
Limitations
- Obsidian Vault: While designed for Obsidian, the tool assumes a flat or hierarchical Markdown file structure and may not handle all Obsidian-specific features.
- Backlink Resolution: Backlinks are resolved based on file names, which may lead to ambiguities if multiple files have the same name in different folders.
- No Real-Time Updates: The tool processes files on demand (e.g., on save or via the script) and does not monitor the filesystem for changes (though that would be easy to add).
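Name-based resolution is where the duplicate-handling rule from How It Works comes in. Below is a sketch of that rule, assuming the link text is compared with the file name without its .md extension and that "shortest path" means fewest path components, with ties broken by byte length; resolve_backlink and both tie-breakers are hypothetical, not taken from the crate.

```rust
use std::path::Path;

/// Hypothetical helper: choose the target of [[link]] among relative paths.
/// Candidates are files whose stem equals the link text; a file in the linking
/// note's folder wins, otherwise the shortest path (fewest components, then
/// fewest bytes).
fn resolve_backlink<'a>(link: &str, from_folder: &Path, files: &[&'a Path]) -> Option<&'a Path> {
    let matches: Vec<&'a Path> = files
        .iter()
        .copied()
        .filter(|p| p.file_stem().map_or(false, |stem| stem == link))
        .collect();
    matches
        .iter()
        .copied()
        // First preference: same folder as the note containing the link.
        .find(|p| p.parent() == Some(from_folder))
        // Fallback: the shortest candidate; min_by_key keeps the first on ties.
        .or_else(|| {
            matches
                .iter()
                .copied()
                .min_by_key(|p| (p.components().count(), p.as_os_str().len()))
        })
}

fn main() {
    let files = [
        Path::new("projects/rust/Ideas.md"),
        Path::new("Ideas.md"),
        Path::new("journal/Ideas.md"),
    ];
    // A link written in journal/Daily.md prefers the file in the same folder.
    println!("{:?}", resolve_backlink("Ideas", Path::new("journal"), &files));
    // From another folder, the shortest candidate path wins.
    println!("{:?}", resolve_backlink("Ideas", Path::new("projects"), &files));
}
```

The sketch makes the limitation visible: the fallback silently picks one candidate, so a note in projects/ linking to [[Ideas]] gets the root Ideas.md even if projects/rust/Ideas.md was intended.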
Contributing
Contributions are welcome!
TODO / Future Improvements
- Extract the full YAML data as JSON, like datopian/markdowndb.
- Add --watch to monitor files for changes and update the database accordingly.
Why I Built This
I started using Obsidian for note-taking, but I ran into a major issue that drove me up the wall: it took 20–30 seconds to start Obsidian on my Android phone, and its search functionality was painfully slow. Searching for a specific file required remembering the full path or relying on a content-based search that didn’t prioritize file names. Using a terminal with nano on my Android was significantly faster, which pushed me to find a better solution.
I explored alternatives like Logseq, but they felt restrictive, forcing me to organize notes according to their rigid rules. Then I discovered Neovim’s powerful plugin system, which works seamlessly in a TTY environment, allowing me to edit files directly on my system without the overhead of GUI-based tools. This was a game-changer for my workflow.
My first attempt was a quick Bash script paired with a basic Lua configuration for Neovim. It worked, but it was clunky. I then tried rewriting the tool entirely in Lua, thinking I could leverage Neovim’s init.lua to manage dependencies. Big mistake. Termux, my Android terminal environment, didn’t support Lua libraries well, and the setup broke completely when a package link for Lua libraries changed unexpectedly. The frustration of dealing with broken dependencies pushed me to my limit.
Eventually, I turned to Rust to create a static binary that wouldn’t rely on fickle dependencies or slow plugins. I briefly experimented with epwalsh/obsidian.nvim, which was promising but took an excruciating 14 seconds to follow a backlink on my low-powered device—slower than my rg (ripgrep) searches! While obsidian.nvim is a great tool for more powerful systems, it wasn’t suitable for my "potato calculator." So, I built markdown-scanner to create a lightweight, fast, and reliable solution that integrates with Neovim, processes Markdown files efficiently, and stores metadata in a SQLite database for quick access.
License
This project is licensed under the GNU General Public License v3.0.
Dependencies
~32MB, ~604K SLoC