12 releases (4 breaking)
Version | Released |
---|---|
0.5.6 | Mar 1, 2025 |
0.5.5 | Mar 1, 2025 |
0.5.1 | Feb 28, 2025 |
0.4.1 | Feb 16, 2025 |
0.1.2 | Feb 2, 2025 |
Krafna
Features
- Query Markdown files in a directory using SQL-like syntax
- Support for frontmatter data extraction
- Flexible output formats (TSV and JSON)
- Compatible with Neovim plugin Perec
Performance
Benchmarking on a base Mac mini M4 shows that Krafna can query:
- ~5000 files (without cache)
- ~100 000 files (with cache)
within ~100ms.
Caching is done with bincode. Cache files are not tiny, but not too big either (currently ~250KB for ~100 files); I might consider compression in the future. Only files that were modified since the last cache are parsed and re-cached. Cache files are stored at:
- LINUX: $XDG_CACHE_HOME or $HOME/.cache/
- WINDOWS: {FOLDERID_LocalAppData}
- MAC: $HOME/Library/Caches
at com/7sedam7/krafna
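The lookup rules above can be sketched roughly as follows. This is an illustration rather than Krafna's actual code: the function name `cache_dir` is hypothetical, the environment lookup is passed in as a closure so the logic can be tried without touching the real environment, and `%LOCALAPPDATA%` is assumed to stand in for `{FOLDERID_LocalAppData}`.

```rust
use std::path::PathBuf;

/// Resolve the cache directory for a given OS name.
/// `os` uses the same spelling as `std::env::consts::OS`
/// ("linux", "macos", "windows"); `env` looks up an environment variable.
fn cache_dir(os: &str, env: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    let base = match os {
        // $XDG_CACHE_HOME if set and non-empty, otherwise $HOME/.cache
        "linux" => env("XDG_CACHE_HOME")
            .filter(|v| !v.is_empty())
            .map(PathBuf::from)
            .or_else(|| env("HOME").map(|h| PathBuf::from(h).join(".cache")))?,
        "macos" => PathBuf::from(env("HOME")?).join("Library").join("Caches"),
        // Assumption: %LOCALAPPDATA% approximates {FOLDERID_LocalAppData}.
        "windows" => PathBuf::from(env("LOCALAPPDATA")?),
        _ => return None,
    };
    // Sub-directory taken from the text above.
    Some(base.join("com").join("7sedam7").join("krafna"))
}
```

With the real environment this could be called as `cache_dir(std::env::consts::OS, |k| std::env::var(k).ok())`.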
Flamegraph is currently pointing to Pod (internal enum struct) deserialization as the biggest bottleneck.
cargo bench has been giving me weird results recently; I'm not an expert at using it and did not want to spend too much time on it. gtime -v might be less precise when it comes to milliseconds, but it seems to give more realistic results.
Run benchmarks: (you can change the number of files that will be generated in bench/query_benchmark.rs)
cargo bench
Run flamegraph: (For a cleaner flamegraph, consider temporarily disabling rayon's parallelism by replacing par_iter() with iter().)
cargo install flamegraph
cargo flamegraph --root --bin krafna -- 'select file.name, tags from frontmatter_data("../krafna-bench/bench/") where "exampl" in tags'
Installation
There are binaries available for Linux, macOS, and Windows under Releases.
Cargo
cargo install krafna
Homebrew
brew tap 7sedam7/krafna
brew install krafna
Usage
Usage: krafna [OPTIONS] [QUERY]
Arguments:
[QUERY] The query to execute
Options:
--select <SELECT>
OVERRIDES SELECT fields from the query with "field1,field2"
--from <FROM>
From option in case you are implementing querying for specific FROM that you don't want to specify every time. This OVERRIDES the FROM part of the query!
--include-fields <INCLUDE_FIELDS>
include SELECT fields with "field1,field2" (prepends them to the front of the SELECT fields in the query)
--find <FIND>
Find option to find all krafna snippets within a dir
--json
Output results in JSON format
-h, --help
Print help
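How --select and --include-fields reshape the SELECT list can be sketched like this. The function `effective_select` is hypothetical and not part of Krafna; in particular, applying --include-fields on top of --select when both are given is an assumption, since the help text does not say how they interact.

```rust
/// Combine the SELECT fields parsed from a query with the CLI options.
/// `select_override` models `--select`, `include_fields` models
/// `--include-fields`; both are comma-separated strings like "field1,field2".
fn effective_select(
    query_fields: &[&str],
    select_override: Option<&str>,
    include_fields: Option<&str>,
) -> Vec<String> {
    let split = |s: &str| -> Vec<String> {
        s.split(',')
            .map(str::trim)
            .filter(|f| !f.is_empty())
            .map(String::from)
            .collect()
    };
    // --select replaces the fields written in the query.
    let mut fields: Vec<String> = match select_override {
        Some(s) => split(s),
        None => query_fields.iter().map(|f| f.to_string()).collect(),
    };
    // --include-fields is prepended to whichever SELECT list is in effect
    // (assumption when combined with --select).
    if let Some(extra) = include_fields {
        let mut combined = split(extra);
        combined.append(&mut fields);
        fields = combined;
    }
    fields
}
```

For example, a query selecting title and tags with `--include-fields file.name` would end up selecting file.name, title, tags in that order.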
SELECT
- Currently, you can only specify field names.
- There are extra fields for the file itself, accessible with the file. prefix (options: name, path, created, accessed, modified), e.g. file.name.
- No support for *, functions, nor expressions yet.
- No support for AS yet.
FROM
FRONTMATTER_DATA
FROM FRONTMATTER_DATA("<path>")
- This will find all markdown files in the specified <path> and use their frontmatter data as rows.
- FIELDS:
  - file.name - name of the file
  - file.path - path to the file
  - file.created - date when the file was created
  - file.accessed - date when the file was last accessed
  - file.modified - date when the file was last modified
  - All other fields are from frontmatter data
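To make "frontmatter data as rows" concrete, here is a deliberately simplified sketch of pulling key/value pairs out of a leading `---` block. Krafna itself relies on grey-matter-rs for this; the hypothetical `frontmatter` function below only handles flat `key: value` lines and is a stand-in, not the real parser.

```rust
use std::collections::BTreeMap;

/// Extract flat `key: value` pairs from a leading `---` frontmatter block.
/// Returns None when the document has no closed frontmatter block.
fn frontmatter(markdown: &str) -> Option<BTreeMap<String, String>> {
    let mut lines = markdown.lines();
    // The block must open on the very first line.
    if lines.next()?.trim_end() != "---" {
        return None;
    }
    let mut fields = BTreeMap::new();
    for line in lines {
        if line.trim_end() == "---" {
            return Some(fields); // closing delimiter found
        }
        // Only top-level scalars are handled; indented (nested) lines
        // and YAML lists are skipped in this sketch.
        if let Some((key, value)) = line.split_once(':') {
            let key = key.trim();
            if !key.is_empty() && !line.starts_with(char::is_whitespace) {
                fields.insert(key.to_string(), value.trim().to_string());
            }
        }
    }
    None // the block was never closed
}
```

Each markdown file that yields a map would contribute one row, with the file.* fields added from filesystem metadata.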
MD_LINKS
FROM MD_LINKS("<path>")
- This will find all the links in markdown files in the specified <path>. Each link is a separate row.
- FIELDS:
  - file.* - file data same as above
  - type - type of the link (inline, wiki)
  - external - true if the link is external (not a local file)
  - url - original url text from the markdown file
  - path - interpreted path to the local file in case the link is not external (relies on that path being within the <path> given as the argument; otherwise it will be empty)
  - text - text of the link
  - ord - order of the link in the file
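A rough sketch of how rows like these could be produced from one file's text is shown below. It is not Krafna's implementation: `Link` and `extract_links` are hypothetical names, "external" is approximated as "the url contains ://", ord is assumed to start at 1, the path field is left out, and image links are not treated specially.

```rust
#[derive(Debug, PartialEq)]
struct Link {
    kind: &'static str, // "inline" or "wiki"
    external: bool,
    url: String,
    text: String,
    ord: usize,
}

/// Scan markdown text for `[text](url)` and `[[target]]` links.
fn extract_links(markdown: &str) -> Vec<Link> {
    let mut links = Vec::new();
    let mut rest = markdown;
    while let Some(start) = rest.find('[') {
        let after = &rest[start + 1..];
        // Each branch yields (kind, url, text, bytes consumed from `rest`).
        let parsed = if let Some(inner) = after.strip_prefix('[') {
            // Wiki link: [[target]]; the target doubles as the text.
            inner.find("]]").map(|end| {
                let target = &inner[..end];
                ("wiki", target, target, start + 2 + end + 2)
            })
        } else {
            // Inline link: [text](url), with '(' directly after ']'.
            after.find(']').and_then(|mid| {
                let url_part = after[mid + 1..].strip_prefix('(')?;
                let end = url_part.find(')')?;
                Some(("inline", &url_part[..end], &after[..mid], start + 1 + mid + 2 + end + 1))
            })
        };
        match parsed {
            Some((kind, url, text, consumed)) => {
                let ord = links.len() + 1; // assumption: ord is 1-based
                links.push(Link {
                    kind,
                    external: url.contains("://"), // assumption, see lead-in
                    url: url.to_string(),
                    text: text.to_string(),
                    ord,
                });
                rest = &rest[consumed..];
            }
            None => rest = after, // not a link; keep scanning after this '['
        }
    }
    links
}
```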
MD_TASKS
FROM MD_TASKS("<path>")
- This will find all the tasks in markdown files in the specified <path>. Each task is a separate row.
- Tasks in markdown are defined as lines starting with - [ ] or - [x]
- FIELDS:
  - file.* - file data same as above
  - checked - true if the task is checked (- [x])
  - text - text of the task
  - ord - order of the task in the file. If the task is a subtask, there is a '.' and then a number for ordering within a parent task. Nesting is supported.
  - parent - ord of the parent task. If the task is not a subtask, this will be empty
- More functions will come.
- No support for AS yet.
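The ord and parent fields of MD_TASKS are easiest to see with a small sketch. The code below is an illustration under stated assumptions, not Krafna's parser: `Task` and `extract_tasks` are hypothetical, nesting is inferred from leading whitespace, and numbering is assumed to start at 1.

```rust
#[derive(Debug, PartialEq)]
struct Task {
    checked: bool,
    text: String,
    ord: String,
    parent: Option<String>,
}

/// Collect `- [ ]` / `- [x]` lines and number them hierarchically.
fn extract_tasks(markdown: &str) -> Vec<Task> {
    let mut tasks = Vec::new();
    // Open ancestors: (indent width, ord, number of children so far).
    let mut stack: Vec<(usize, String, usize)> = Vec::new();
    let mut top_level = 0usize;
    for line in markdown.lines() {
        let body = line.trim_start();
        let indent = line.len() - body.len();
        let (checked, text) = if let Some(t) = body.strip_prefix("- [ ]") {
            (false, t)
        } else if let Some(t) = body.strip_prefix("- [x]") {
            (true, t)
        } else {
            continue; // not a task line
        };
        // Drop ancestors that are not strictly less indented than this task.
        while stack.last().map_or(false, |(i, _, _)| *i >= indent) {
            stack.pop();
        }
        let (ord, parent) = match stack.last_mut() {
            Some((_, parent_ord, children)) => {
                *children += 1;
                (format!("{}.{}", parent_ord, children), Some(parent_ord.clone()))
            }
            None => {
                top_level += 1;
                (top_level.to_string(), None)
            }
        };
        stack.push((indent, ord.clone(), 0));
        tasks.push(Task { checked, text: text.trim().to_string(), ord, parent });
    }
    tasks
}
```

Under these assumptions, a second-level task under the second subtask of the first task gets ord 1.2.1 and parent 1.2.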
WHERE
- Brackets are supported
- Operators AND, OR, IN, <, <=, >, >=, ==, !=, LIKE, NOT LIKE, +, -, *, /, **, // are supported
- Functions DATE(, ), DATEADD(, , , ) are supported
- Arguments to functions can be hardcoded values or field names
- Nested functions, or expressions as arguments are NOT supported yet
- file. fields can be used in WHERE clause as well
ORDER BY
- You can only specify field names followed by ASC or DESC
- Functions and expressions are NOT supported yet
- file. fields can be used in ORDER BY clause as well
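Ordering by several fields, each with its own ASC or DESC, amounts to a lexicographic comparison over the listed keys. The sketch below shows that idea only; `Row` and `order_by` are hypothetical, values are compared as plain strings, and rows missing a field sort first in ascending order, none of which is confirmed to match Krafna's typed values.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

type Row = HashMap<String, String>;

/// Sort rows by a list of (field name, descending?) keys.
fn order_by(rows: &mut [Row], keys: &[(&str, bool)]) {
    rows.sort_by(|a, b| {
        for (field, descending) in keys {
            // Option ordering puts a missing field (None) before any value.
            let ord = a.get(*field).cmp(&b.get(*field));
            let ord = if *descending { ord.reverse() } else { ord };
            if ord != Ordering::Equal {
                return ord; // first key that differs decides
            }
        }
        Ordering::Equal
    });
}
```

A query ending in ORDER BY tags ASC, file.name DESC would correspond to the keys `[("tags", false), ("file.name", true)]`.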
Other
- LIMIT, OFFSET, JOIN, HAVING, GROUP BY, DISTINCT, etc. are not supported yet.
- UPDATE and DELETE are not supported yet.
Examples
Basic Query
krafna "SELECT title, tags FROM FRONTMATTER_DATA('~/.notes')"
Find Files
krafna --find ~/.notes
Output as JSON
krafna "SELECT * FROM FRONTMATTER_DATA('~/.notes')" --json
Include Specific Fields
krafna "SELECT * FROM FRONTMATTER_DATA('~/.notes')" --include-fields title,tags
Neovim Integration
Use with the Perec Neovim plugin for seamless integration.
Roadmap
(not in priority order)
- add . support for accessing sub-fields (file.name)
  - migrate file_name, etc. under file (name, path, created, accessed, modified)
- add default variables (today)
  - change it so that it does not need to be on every row (can have a general_values hash that can be passed around, and value getters would first check there and then from the source)
- Implement pruning of AND and OR operators (mostly for better error messages, performance there is more than good enough)
- TODOs
- Add tests for execution
- add support for functions in SELECT
- add functions
  - think about which functions to add
  - DATE("some-date", ) -> new type date
  - DATEADD()
- implement val -> val operators
- UPDATE
- DELETE
- add AS to SELECT
- add querying of TODOs (think of a format similar to todoist)
  - maybe abstract to query by regex
- add querying of links between notes
- think about which other sources would be cool to add
- add group by
Acknowledgements
- grey-matter-rs for parsing frontmatter data
- rayon for parallelizing execution
- bincode for binary serialization
- CodeRabbit for code reviews
- Various AI tools for help with answering questions faster than me searching on Google/StackOverflow
Author