12 releases (4 breaking)

0.5.6	Mar 1, 2025
0.5.5	Mar 1, 2025
0.5.1	Feb 28, 2025
0.4.1	Feb 16, 2025
0.1.2	Feb 2, 2025

MIT license

Krafna

Krafna is a CLI tool for querying Markdown frontmatter data with SQL-like syntax, similar to Obsidian's Dataview plugin.

Features

Query Markdown files in a directory using SQL-like syntax
Support for frontmatter data extraction
Flexible output formats (TSV and JSON)
Compatible with Neovim plugin Perec

Performance

Benchmarking on a base Mac mini M4 shows that Krafna can query:

~5000 files (without cache)
~100 000 files (with cache)

within ~100ms.

Caching is done with bincode. Cache files are neither tiny nor huge (currently ~250KB for ~100 files); I might consider compression in the future. Only files that were modified since they were last cached are parsed and re-cached. Cache files are stored at:

LINUX: $XDG_CACHE_HOME or $HOME/.cache/
WINDOWS: {FOLDERID_LocalAppData}
MAC: $HOME/Library/Caches

in the subdirectory com/7sedam7/krafna
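
The "only modified files are re-parsed" behaviour can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not Krafna's actual implementation: Krafna is written in Rust and serializes with bincode, while this sketch uses pickle. The names load_rows and parse_file, and the cache layout (a mapping from path to modification time plus parsed row), are hypothetical.

```python
import pickle
from pathlib import Path


def parse_file(path: Path) -> dict:
    """Hypothetical stand-in for the expensive per-file parsing step."""
    return {"file.name": path.name, "size": len(path.read_text(encoding="utf-8"))}


def load_rows(notes_dir: Path, cache_file: Path):
    """Return (rows, reparsed_paths), re-parsing only files changed since the last run."""
    # Assumed cache layout: {path string: (mtime_ns, parsed row)}.
    cache = {}
    if cache_file.exists():
        with cache_file.open("rb") as fh:
            cache = pickle.load(fh)

    fresh, reparsed = {}, []
    for path in sorted(notes_dir.rglob("*.md")):
        key = str(path)
        mtime = path.stat().st_mtime_ns
        cached = cache.get(key)
        if cached is not None and cached[0] == mtime:
            fresh[key] = cached  # unchanged: reuse the cached row
        else:
            fresh[key] = (mtime, parse_file(path))  # new or modified: parse again
            reparsed.append(key)

    # Deleted files drop out because only paths seen in this scan are kept.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as fh:
        pickle.dump(fresh, fh)
    return [row for _, row in fresh.values()], reparsed
```

Calling load_rows twice in a row on an unchanged directory should report no re-parsed files on the second call, which is the effect the cached benchmark numbers above rely on.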

The flamegraph currently points to deserialization of Pod (internal enum struct) as the biggest bottleneck.

cargo bench has been giving me weird results recently. I'm not an expert at using it and did not want to spend too much time on it. gtime -v might be less precise when it comes to milliseconds, but it seems to give more realistic results.

Run benchmarks: (you can change the number of files that will be generated in bench/query_benchmark.rs)

cargo bench

Run flamegraph: (For a cleaner flamegraph, consider temporarily disabling rayon’s parallelism by replacing par_iter() with iter().)

cargo install flamegraph
cargo flamegraph --root --bin krafna -- 'select file.name, tags from frontmatter_data("../krafna-bench/bench/") where "exampl" in tags'

Installation

There are binaries available for Linux, macOS, and Windows under Releases.

Cargo

cargo install krafna

Homebrew

brew tap 7sedam7/krafna
brew install krafna

Usage

Usage: krafna [OPTIONS] [QUERY]

Arguments:
  [QUERY]  The query to execute

Options:
      --select <SELECT>
          OVERRIDES SELECT fields from the query with "field1,field2"
      --from <FROM>
          From option in case you are implementing querying for specific FROM that you don't want to specify every time. This OVERRIDES the FROM part of the query!
      --include-fields <INCLUDE_FIELDS>
          include SELECT fields with "field1,field2" (prepends them to the front of the SELECT fields in the query)
      --find <FIND>
          Find option to find all krafna snippets within a dir
      --json
          Output results in JSON format
  -h, --help
          Print help
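
How --select and --include-fields shape the final field list can be illustrated with a small Python sketch. It only mirrors the help text above (override versus prepend). The function name effective_select is hypothetical, and the behaviour when both options are given together is an assumption, not something the help text states.

```python
def effective_select(query_fields, select=None, include_fields=None):
    """Combine the SELECT list from the query with the two CLI options."""
    # --select replaces the query's SELECT list entirely.
    fields = select.split(",") if select else list(query_fields)
    # --include-fields is prepended to whatever SELECT list is in effect
    # (assumption: this also applies when --select is given).
    if include_fields:
        fields = include_fields.split(",") + fields
    return fields
```

For example, a query selecting title with include_fields set to "file.name,tags" would yield file.name, tags, title in that order.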

SELECT

Currently, you can only specify field names.
There are extra fields for the file data itself, accessible with the file. prefix (options: name, path, created, accessed, modified).
No support for *, functions, nor expressions yet.
No support for AS yet.

FROM

FRONTMATTER_DATA

FROM FRONTMATTER_DATA("<path>")
This will find all markdown files in the specified <path> and use their frontmatter data as rows.
FIELDS:
- file.name - name of the file
- file.path - path to the file
- file.created - date when the file was created
- file.accessed - date when the file was last accessed
- file.modified - date when the file was last modified
- All other fields are from frontmatter data
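
To make the row shape concrete, here is a rough Python sketch of turning one Markdown file into a row. It is an illustration only: Krafna itself uses grey-matter-rs for parsing, whereas this sketch handles just a tiny YAML subset (plain key: value pairs and inline lists). The helper names parse_scalar and frontmatter_row are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def parse_scalar(text: str):
    """Parse a tiny YAML subset: inline lists like [a, b] and plain strings."""
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        return [item.strip().strip("\"'") for item in text[1:-1].split(",") if item.strip()]
    return text.strip("\"'")


def frontmatter_row(path: Path) -> dict:
    """Build one row from a Markdown file: file.* fields plus frontmatter keys."""
    stat = path.stat()
    row = {
        "file.name": path.name,
        "file.path": str(path),
        # Creation time is platform dependent; fall back to st_ctime where
        # st_birthtime is unavailable (only an approximation on Linux).
        "file.created": datetime.fromtimestamp(getattr(stat, "st_birthtime", stat.st_ctime)),
        "file.accessed": datetime.fromtimestamp(stat.st_atime),
        "file.modified": datetime.fromtimestamp(stat.st_mtime),
    }
    lines = path.read_text(encoding="utf-8").splitlines()
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":  # closing delimiter ends the frontmatter
                break
            key, sep, value = line.partition(":")
            if sep and key.strip():
                row[key.strip()] = parse_scalar(value)
    return row
```

A file whose frontmatter contains title: Hello and tags: [a, example] would produce a row with those two keys alongside the file.* fields.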

MD_LINKS

FROM MD_LINKS("<path>")
This will find all the links in markdown files in the specified <path>. Each link is a separate row.
FIELDS:
- file.* - file data same as above
- type - type of the link (inline, wiki)
- external - true if the link is external (not a local file)
- url - original url text from markdown file
- path - interpreted path to the local file when the link is not external (the target must be inside the <path> given as the argument, otherwise this field is empty)
- text - text of the link
- ord - order of the link in the file
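
The link rows can be pictured with the following Python sketch. It is a simplified illustration, not Krafna's parser: it recognizes only [text](url) and [[target]] / [[target|text]] forms via a regular expression, treats any URL with a scheme:// prefix as external, numbers links from 1, and leaves out the path field because resolving it needs the directory context. The name extract_links and these choices are assumptions.

```python
import re

LINK_RE = re.compile(
    r"\[\[(?P<wiki_target>[^\[\]|]+)(?:\|(?P<wiki_text>[^\[\]]+))?\]\]"  # [[target|text]]
    r"|\[(?P<text>[^\[\]]*)\]\((?P<url>[^)\s]+)\)"  # [text](url)
)
SCHEME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*://")


def extract_links(markdown: str) -> list[dict]:
    """Return one row per link, in document order."""
    rows = []
    for ord_, m in enumerate(LINK_RE.finditer(markdown), start=1):
        if m.group("wiki_target") is not None:
            url = m.group("wiki_target")
            kind, text = "wiki", m.group("wiki_text") or url
        else:
            url = m.group("url")
            kind, text = "inline", m.group("text")
        rows.append({
            "type": kind,
            "external": SCHEME_RE.match(url) is not None,
            "url": url,
            "text": text,
            "ord": ord_,
        })
    return rows
```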

MD_TASKS

FROM MD_TASKS("<path>")
This will find all the tasks in markdown files in the specified <path>. Each task is a separate row.
Tasks in markdown are defined as lines starting with - [ ] or - [x].
FIELDS:
- file.* - file data same as above
- checked - true if the task is checked (- [x])
- text - text of the task
- ord - order of the task in the file. If the task is a subtask, there is a '.' and then a number for ordering within a parent task. Nesting is supported.
- parent - parent ord of the task in the file. If the task is not a subtask, this will be empty
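
The ord and parent fields for nested tasks can be sketched in Python as follows. This is an illustration under assumptions rather than Krafna's own logic: nesting is inferred from indentation, numbering starts at 1, and a subtask's ord is the parent's ord plus '.' plus its position under that parent. The name extract_tasks is hypothetical.

```python
import re

TASK_RE = re.compile(r"^(?P<indent>\s*)- \[(?P<mark>[ x])\] (?P<text>.*)$")


def extract_tasks(markdown: str) -> list[dict]:
    """Return one row per task line with hierarchical ord and parent values."""
    rows = []
    stack = []  # open ancestors as [indent, ord, number_of_children]
    top_level = 0
    for line in markdown.splitlines():
        m = TASK_RE.match(line)
        if m is None:
            continue
        indent = len(m.group("indent").expandtabs(4))
        # Close every task that is not an ancestor of the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            parent = stack[-1]
            parent[2] += 1
            ord_, parent_ord = f"{parent[1]}.{parent[2]}", parent[1]
        else:
            top_level += 1
            ord_, parent_ord = str(top_level), ""
        stack.append([indent, ord_, 0])
        rows.append({
            "checked": m.group("mark") == "x",
            "text": m.group("text").strip(),
            "ord": ord_,
            "parent": parent_ord,
        })
    return rows
```

With this scheme, the second subtask of the first task gets ord 1.2 and parent 1, and a task nested under it gets ord 1.2.1 and parent 1.2.
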
More functions will come.
No support for AS yet.

WHERE

Brackets are supported
Operators AND, OR, IN, <, <=, >, >=, ==, !=, LIKE, NOT LIKE, +, -, *, /, **, // are supported
Functions DATE(, ), DATEADD(, , , ) are supported
Arguments to functions can be hardcoded values or field names
Nested functions, or expressions as arguments are NOT supported yet
file. fields can be used in WHERE clause as well

ORDER BY

You can only specify field names followed by ASC or DESC
Functions and expressions are NOT supported yet
file. fields can be used in ORDER BY clause as well
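
As a rough picture of what ordering by plain field names means, here is a small Python sketch. It assumes that several field/direction pairs may be given and that every row has a value for each ordered field; neither point is confirmed above. The function name order_by is hypothetical.

```python
def order_by(rows, clauses):
    """Sort rows by (field, direction) pairs, e.g. [("priority", "DESC"), ("file.name", "ASC")]."""
    ordered = list(rows)
    # Python's sort is stable, so sorting by the last clause first and the
    # first clause last yields the usual multi-column ORDER BY result.
    for field, direction in reversed(clauses):
        ordered.sort(key=lambda row: row[field], reverse=(direction.upper() == "DESC"))
    return ordered
```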

Other

LIMIT, OFFSET, JOIN, HAVING, GROUP BY, DISTINCT, etc. are not supported yet.
UPDATE and DELETE are not supported yet.

Examples

Basic Query

krafna "SELECT title, tags FROM FRONTMATTER_DATA('~/.notes')"

Find Files

krafna --find ~/.notes

Output as JSON

krafna "SELECT * FROM FRONTMATTER_DATA('~/.notes')" --json

Include Specific Fields

krafna "SELECT * FROM FRONTMATTER_DATA('~/.notes')" --include-fields title,tags

Neovim Integration

Use with the Perec Neovim plugin for seamless integration.

Roadmap

(not in priority order)

add . support for accessing sub-fields (file.name)
* migrate file_name, etc under file (name, path, created, accessed, modified)
add default variables (today)
* change it so that it does not need to be on every row (can have a general_values hash that can be passed around, and value getters would first check there and then from the source)
Implement pruning of AND and OR operators (mostly for better error messages, performance there is more than good enough)
TODOs
Add tests for execution
add support for functions in SELECT
add functions
* think about which functions to add
* DATE("some-date", ) -> new type date
* DATEADD()
implement val -> val operators
UPDATE
DELETE
add AS to SELECT
add querying of TODOs (think of a format similar to todoist)
* maybe abstract to query by regex
add querying of links between notes
think about which other sources would be cool to add
add group by

Acknowledgements

grey-matter-rs for parsing frontmatter data
rayon for parallelizing execution
bincode for binary serialization
CodeRabbit for code reviews
Various AI tools for help with answering questions faster than I could by searching Google/StackOverflow

Author

7sedam7
