3 unstable releases
Uses new Rust 2024
Version | Released |
---|---|
0.3.0 (new) | Apr 11, 2025 |
0.2.1 | Apr 11, 2025 |
0.2.0 | Apr 10, 2025 |
tkrar
A fast, feature-rich CLI tool written in Rust that counts word frequencies in a file or a directory.
Name origin:
The name tkrar (تكرار) comes from the Arabic word for repetition or frequency.
It's pronounced like: tek-raar (with a rolled 'r').
Features
- Count word frequencies in a file, or recursively in a directory
- Process input from stdin
- Optional case-sensitive counting
- Ignore stopwords
- Ignore words shorter than a minimum character count
- Ignore words that match a regex pattern
- Ignore non-alphanumeric characters
- Ignore specified file paths
- Output results in JSON or CSV format
- Pretty-print the results (but not when outputting to TTY)
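To make the counting features above concrete, here is a minimal sketch of word-frequency counting using only the Rust standard library. It is not tkrar's actual implementation: the function `count_words` and its parameters are hypothetical names chosen for illustration, and splitting on non-alphanumeric characters is an assumption about how words are tokenized.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of tkrar): count words in `text`.
/// Words are assumed to be runs of alphanumeric characters.
fn count_words(text: &str, case_sensitive: bool, min_char: usize) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    // Split on anything that is not alphanumeric, which drops punctuation.
    for raw in text.split(|c: char| !c.is_alphanumeric()) {
        // Skip empty fragments and words shorter than the minimum length.
        if raw.is_empty() || raw.chars().count() < min_char {
            continue;
        }
        // Fold case unless case-sensitive counting was requested.
        let word = if case_sensitive {
            raw.to_string()
        } else {
            raw.to_lowercase()
        };
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = count_words("Hello, world! Hello again.", false, 3);
    // Sort the entries so the printed order is deterministic.
    let mut entries: Vec<_> = counts.iter().collect();
    entries.sort();
    for (word, n) in entries {
        println!("{word}: {n}");
    }
}
```

A `HashMap` keeps each update at roughly constant cost, which is why it is a common choice for this kind of tally.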
Installation
cargo install tkrar
Usage
tkrar [OPTIONS] [TARGET]...
Flags
- -c, --case-sensitive: case sensitivity when counting words
- --no-stopwords: ignore stopwords when counting words
- --alphabetic-only: ignore non-alphanumeric characters
- -h, --help: print help
- -V, --version: print version
Options
- -t, --top <N>: show the N most frequent words
- -m, --min-char <N>: ignore words with fewer than N characters
- -s, --sort <SORT>: sort order, asc or desc (default: desc)
- -i, --ignore-words <REGEX>: ignore words that match the provided regex pattern
- -I, --ignore-files <FILE>: ignore the provided file paths
- -o, --output-format <FORMAT>: output in the specified format, one of text, json, csv (default: text)
- -C, --config <FILE>: use the specified config file
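The interaction between the sort order and the top-N limit can be sketched as follows. This is an illustration under stated assumptions, not tkrar's code: `top_words` and `SortOrder` are hypothetical names, ties are broken alphabetically, and the N most frequent words are assumed to be selected first and only then arranged in the requested order.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the asc/desc choice.
enum SortOrder {
    Asc,
    Desc,
}

/// Hypothetical helper (not part of tkrar): select the most frequent words,
/// then arrange them in the requested order.
fn top_words(
    counts: &HashMap<String, usize>,
    top: Option<usize>,
    order: SortOrder,
) -> Vec<(String, usize)> {
    let mut entries: Vec<(String, usize)> =
        counts.iter().map(|(w, n)| (w.clone(), *n)).collect();
    // Most frequent first; ties are broken alphabetically (an assumption)
    // so that the result is deterministic.
    entries.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    // Keep only the N most frequent words when a limit is given.
    if let Some(n) = top {
        entries.truncate(n);
    }
    // Flip the selection for ascending output.
    if let SortOrder::Asc = order {
        entries.reverse();
    }
    entries
}

fn main() {
    let mut counts = HashMap::new();
    for word in "the cat and the hat and the bat".split_whitespace() {
        *counts.entry(word.to_string()).or_insert(0usize) += 1;
    }
    for (word, n) in top_words(&counts, Some(2), SortOrder::Desc) {
        println!("desc {word}: {n}");
    }
    for (word, n) in top_words(&counts, Some(2), SortOrder::Asc) {
        println!("asc {word}: {n}");
    }
}
```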
Arguments
- [TARGET]...: paths to one or more target files or directories (default: stdin)
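Resolving targets that mix files and directories can be sketched with the standard library alone. The function `collect_files` below is a hypothetical helper, not tkrar's implementation, and matching ignored files by exact path equality is an assumption about how ignored paths are compared.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of tkrar): gather every regular file under
/// `target`, descending into directories and skipping ignored paths.
fn collect_files(target: &Path, ignored: &[PathBuf], out: &mut Vec<PathBuf>) -> io::Result<()> {
    // Assumption: an ignored path must match the visited path exactly.
    if ignored.iter().any(|p| p.as_path() == target) {
        return Ok(());
    }
    if target.is_dir() {
        // Recurse into each directory entry.
        for entry in fs::read_dir(target)? {
            collect_files(&entry?.path(), ignored, out)?;
        }
    } else if target.is_file() {
        out.push(target.to_path_buf());
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut files = Vec::new();
    collect_files(Path::new("."), &[], &mut files)?;
    // read_dir order is platform-dependent, so sort before printing.
    files.sort();
    for file in &files {
        println!("{}", file.display());
    }
    Ok(())
}
```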
Configuration file
You can create a configuration file (default: config.toml) with the following format:
# config.toml
top = 10
min_char = 3
ignore_words = "ignored|hi|hidden"
ignore_files = ["src/ignored.txt", "dummy.txt"]
Only these options are supported in the config file; all others must be specified on the command line.
Examples
# Count frequency of words in a file
tkrar ./path/to/target
# Count frequency of words from stdin
echo "Hello, world!" | tkrar
# Count frequency of words from multiple files and directories
tkrar ./path/to/file1.txt ./path/to/directory ./path/to/another/directory
# Ignore stopwords
tkrar --no-stopwords ./path/to/target
# Ignore words that match the provided regex pattern
tkrar --ignore-words "the|and|is|in|to" ./path/to/target
# Ignore the provided file paths
tkrar --ignore-files "./path/to/file1.txt,./path/to/file2.txt" ./path/to/target
# Ignore non-alphanumeric characters
tkrar --alphabetic-only ./path/to/target
# Output results in JSON or CSV format
tkrar --output-format json ./path/to/target
# Sort order (asc or desc)
tkrar --sort asc ./path/to/target
# Show the N most frequent words
tkrar --top 10 ./path/to/target