#markdown #obsidian #sql #json-format #cli

bin+lib krafna

Krafna is a terminal-based alternative to Obsidian's Dataview plugin, allowing you to query your Markdown files using standard SQL syntax

12 releases (4 breaking)

0.5.6 Mar 1, 2025
0.5.5 Mar 1, 2025
0.5.1 Feb 28, 2025
0.4.1 Feb 16, 2025
0.1.2 Feb 2, 2025

MIT license

1MB
5K SLoC

Krafna


Krafna is a CLI tool for querying Markdown frontmatter data with SQL, similar to Obsidian's Dataview plugin.

Features

  • Query Markdown files in a directory using SQL-like syntax
  • Support for frontmatter data extraction
  • Flexible output formats (TSV and JSON)
  • Compatible with Neovim plugin Perec

Performance

Benchmarking on a base Mac mini M4 shows that Krafna can query:

  • ~5000 files (without cache)
  • ~100 000 files (with cache)

within ~100ms.

Caching is done with bincode. Cache files are not tiny, but not huge either (currently ~250KB for ~100 files); I might consider compression in the future. Only files that were modified since the last cache update are parsed and re-cached. Cache files are stored at:

  • LINUX: $XDG_CACHE_HOME or $HOME/.cache/
  • WINDOWS: {FOLDERID_LocalAppData}
  • MAC: $HOME/Library/Caches

at com/7sedam7/krafna
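
The "only re-parse modified files" idea can be sketched roughly as follows. This is a minimal Python illustration, not Krafna's implementation: it uses pickle in place of bincode, compares modification times to decide what is stale, and the names load_rows, cache_file, and parse are hypothetical.

```python
import os
import pickle


def load_rows(paths, cache_file, parse):
    """Return ({path: parsed}, [paths that had to be re-parsed]).

    Assumption: a file is stale when its mtime differs from the cached one.
    `parse` is any callable that turns a path into a picklable value.
    """
    try:
        with open(cache_file, "rb") as fh:
            cache = pickle.load(fh)  # {path: (mtime_ns, parsed)}
    except (OSError, pickle.UnpicklingError, EOFError):
        cache = {}  # missing or unreadable cache: start from scratch

    fresh, rows, reparsed = {}, {}, []
    for path in paths:
        mtime = os.stat(path).st_mtime_ns
        entry = cache.get(path)
        if entry is not None and entry[0] == mtime:
            rows[path] = entry[1]  # unchanged file: reuse the cached value
        else:
            rows[path] = parse(path)  # new or modified file: parse again
            reparsed.append(path)
        fresh[path] = (mtime, rows[path])

    # Rewrite the cache only when something changed or files disappeared.
    if reparsed or set(fresh) != set(cache):
        with open(cache_file, "wb") as fh:
            pickle.dump(fresh, fh)
    return rows, reparsed
```

Calling load_rows twice on an unchanged directory should return an empty re-parsed list the second time, which is where the cached numbers above would come from.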

The flamegraph currently points to Pod (internal enum struct) deserialization as the biggest bottleneck.

cargo bench has been giving me weird results recently. I'm not an expert at using it and did not want to spend too much time on it. gtime -v might be less precise when it comes to milliseconds, but it seems to give more realistic results.

Run benchmarks: (you can change the number of files that will be generated in bench/query_benchmark.rs)

cargo bench

Run flamegraph: (For a cleaner flamegraph, consider temporarily disabling rayon’s parallelism by replacing par_iter() with iter().)

cargo install flamegraph
cargo flamegraph --root --bin krafna -- 'select file.name, tags from frontmatter_data("../krafna-bench/bench/") where "exampl" in tags'

Installation

There are binaries available for Linux, macOS, and Windows under Releases.

Cargo

cargo install krafna

Homebrew

brew tap 7sedam7/krafna
brew install krafna

Usage

Usage: krafna [OPTIONS] [QUERY]

Arguments:
  [QUERY]  The query to execute

Options:
      --select <SELECT>
          OVERRIDES SELECT fields from the query with "field1,field2"
      --from <FROM>
          From option in case you are implementing querying for specific FROM that you don't want to specify every time. This OVERRIDES the FROM part of the query!
      --include-fields <INCLUDE_FIELDS>
          include SELECT fields with "field1,field2" (prepends them to th front of the SELECT fields in the query)
      --find <FIND>
          Find option to find all krafna snippets within a dir
      --json
          Output results in JSON format
  -h, --help
          Print help
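
For scripting, the options above can be assembled programmatically. The sketch below is a hedged Python example, not part of Krafna: build_krafna_args and run_krafna are hypothetical helper names, it assumes krafna is on PATH, and it assumes that --json prints a single JSON document on stdout (the exact shape of that document is not specified here).

```python
import json
import subprocess


def build_krafna_args(query, *, select=None, include_fields=None, as_json=True):
    """Build an argv list following 'krafna [OPTIONS] [QUERY]'."""
    args = ["krafna"]
    if select:
        # --select overrides the SELECT fields of the query.
        args += ["--select", ",".join(select)]
    if include_fields:
        # --include-fields prepends fields to the query's SELECT fields.
        args += ["--include-fields", ",".join(include_fields)]
    if as_json:
        args.append("--json")
    args.append(query)
    return args


def run_krafna(query, **options):
    """Run krafna and decode its stdout as JSON (assumed output format)."""
    completed = subprocess.run(
        build_krafna_args(query, **options),
        capture_output=True,
        text=True,
        check=True,  # raise if krafna exits with a non-zero status
    )
    return json.loads(completed.stdout)
```

For example, run_krafna("SELECT title, tags FROM FRONTMATTER_DATA('~/.notes')", include_fields=["file.name"]) would execute the basic query with file.name prepended to the selected fields.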

SELECT

  • Currently, you can only specify field names.
  • There are extra fields for the file data itself, accessible with the file. prefix (options: name, path, created, accessed, modified).
  • No support for *, functions, or expressions yet.
  • No support for AS yet.

FROM

FRONTMATTER_DATA

  • FROM FRONTMATTER_DATA("<path>")
  • This will find all markdown files in the specified <path> and use their frontmatter data as rows.
  • FIELDS:
    • file.name - name of the file
    • file.path - path to the file
    • file.created - date when the file was created
    • file.accessed - date when the file was last accessed
    • file.modified - date when the file was last modified
    • All other fields are from frontmatter data

MD_LINKS

  • FROM MD_LINKS("<path>")
  • This will find all the links in markdown files in the specified <path>. Each link is a separate row.
  • FIELDS:
    • file.* - file data same as above
    • type - type of the link (inline, wiki)
    • external - true if the link is external (not a local file)
    • url - original url text from markdown file
    • path - resolved path to the local file when the link is not external (relies on that path being within the <path> passed as the argument; otherwise it will be empty)
    • text - text of the link
    • ord - order of the link in the file

MD_TASKS

  • FROM MD_TASKS("<path>")

  • This will find all the tasks in markdown files in the specified <path>. Each task is a separate row.

  • Tasks in markdown are defined as lines starting with - [ ] or - [x]

  • FIELDS:

    • file.* - file data same as above
    • checked - true if the task is checked (- [x])
    • text - text of the task
    • ord - order of the task in the file. If the task is a subtask, there is a '.' and then a number for ordering within a parent task. Nesting is supported.
    • parent - parent ord of the task in the file. If the task is not a subtask, this will be empty
  • More functions will come.

  • No support for AS yet.
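
The ord and parent fields are easiest to see on an example. The Python sketch below shows one way such values could be derived from indentation. It is not Krafna's code: numbering is assumed to start at 1, nesting is assumed to be determined by leading whitespace, and parse_tasks is a hypothetical name.

```python
import re

# A task line: optional indentation, then "- [ ] text" or "- [x] text".
TASK_RE = re.compile(r"^(?P<indent>\s*)- \[(?P<mark>[ x])\] (?P<text>.*)$")


def parse_tasks(markdown):
    """Return one dict per task with checked, text, ord, and parent."""
    rows = []
    stack = []  # open ancestor tasks, outermost first
    top_level_count = 0
    for line in markdown.splitlines():
        match = TASK_RE.match(line)
        if match is None:
            continue
        indent = len(match.group("indent").expandtabs(4))
        # Drop tasks that are not ancestors of the current line.
        while stack and stack[-1]["indent"] >= indent:
            stack.pop()
        if stack:
            parent = stack[-1]
            parent["children"] += 1
            ord_ = f"{parent['ord']}.{parent['children']}"
            parent_ord = parent["ord"]
        else:
            top_level_count += 1
            ord_ = str(top_level_count)
            parent_ord = ""  # top-level tasks have an empty parent
        rows.append(
            {
                "checked": match.group("mark") == "x",
                "text": match.group("text").strip(),
                "ord": ord_,
                "parent": parent_ord,
            }
        )
        stack.append({"indent": indent, "ord": ord_, "children": 0})
    return rows
```

Under these assumptions, a task nested two levels below the first top-level task would get an ord such as 1.2.1 and a parent of 1.2.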

WHERE

  • Brackets are supported
  • Operators AND, OR, IN, <, <=, >, >=, ==, !=, LIKE, NOT LIKE, +, -, *, /, **, // are supported
  • Functions DATE (two arguments) and DATEADD (four arguments) are supported
  • Arguments to functions can be hardcoded values or field names
  • Nested functions, or expressions as arguments are NOT supported yet
  • file. fields can be used in WHERE clause as well
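
As a rough mental model of what a condition such as "exampl" in tags combined with LIKE does to the rows, here is a small Python sketch. It assumes that LIKE follows the standard SQL wildcards (% for any run of characters, _ for a single character) and is case-sensitive, which this README does not confirm for Krafna; sql_like and filter_rows are hypothetical names.

```python
import re


def sql_like(value, pattern):
    """Match value against a standard SQL LIKE pattern (assumed semantics)."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def filter_rows(rows, tag, title_pattern):
    """Keep rows where `tag` is IN tags AND title LIKE `title_pattern`."""
    return [
        row
        for row in rows
        if tag in row.get("tags", [])
        and sql_like(row.get("title", ""), title_pattern)
    ]
```

Rows that lack a tags or title field simply fail the condition in this sketch; how Krafna treats missing fields is not covered here.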

ORDER BY

  • You can only specify field names followed by ASC or DESC
  • Functions and expressions are NOT supported yet
  • file. fields can be used in ORDER BY clause as well
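
Ordering by several fields with mixed ASC and DESC can be pictured with the following Python sketch. It only models the usual SQL semantics and is not taken from Krafna; order_by is a hypothetical helper, and every row is assumed to contain all of the ordering fields.

```python
def order_by(rows, spec):
    """Sort rows by a list of (field, direction) pairs, e.g. [("p", "DESC")].

    Python's sort is stable, so sorting by the last key first and the
    first key last yields the same result as a multi-key ORDER BY.
    """
    result = list(rows)
    for field, direction in reversed(spec):
        result.sort(
            key=lambda row: row[field],
            reverse=(direction.upper() == "DESC"),
        )
    return result
```

For instance, order_by(rows, [("priority", "DESC"), ("title", "ASC")]) corresponds in spirit to ORDER BY priority DESC, title ASC.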

Other

  • LIMIT, OFFSET, JOIN, HAVING, GROUP BY, DISTINCT, etc. are not supported yet.
  • UPDATE and DELETE are not supported yet.

Examples

Basic Query

krafna "SELECT title, tags FROM FRONTMATTER_DATA('~/.notes')"

Find Files

krafna --find ~/.notes

Output as JSON

krafna "SELECT * FROM FRONTMATTER_DATA('~/.notes')" --json

Include Specific Fields

krafna "SELECT * FROM FRONTMATTER_DATA('~/.notes')" --include-fields title,tags

Neovim Integration

Use with the Perec Neovim plugin for seamless integration.

Roadmap

(not in priority order)

  • add . support for accessing sub-fields (file.name)
    • migrate file_name, etc. under file (name, path, created, accessed, modified)
  • add default variables (today)
    • change it so that it does not need to be on every row (can have a general_values hash that can be passed around, and value getters would first check there and then from the source)
  • Implement pruning of AND and OR operators (mostly for better error messages, performance there is more than good enough)
  • TODOs
  • Add tests for execution
  • add support for functions in SELECT
  • add functions
    • think about which functions to add
    • DATE("some-date", ) -> new type date
    • DATEADD()
  • implement val -> val operators
  • UPDATE
  • DELETE
  • add AS to SELECT
  • add querying of TODOs (think of a format similar to todoist)
    • maybe abstract to query by regex
  • add querying of links between notes
  • think about which other sources would be cool to add
  • add group by

Acknowledgements

  • grey-matter-rs for parsing frontmatter data
  • rayon for parallelizing execution
  • bincode for binary serialization
  • CodeRabbit for code reviews
  • Various AI tools for help with answering questions faster than me searching on Google/StackOverflow

Author

7sedam7
