3 unstable releases

Uses new Rust 2024

0.3.0	Apr 11, 2025
0.2.1	Apr 11, 2025
0.2.0	Apr 10, 2025

#358 in Text processing

30 downloads per month

MIT license

18KB
353 lines

tkrar

A fast, feature-rich CLI tool written in Rust that counts word frequency in a file or a directory.

Name origin:
The name tkrar (تكرار) comes from the Arabic word for repetition or frequency.
It's pronounced like: tek-raar (with a rolled 'r').

Features

Count word frequency in a file or, recursively, in a directory
Process input from stdin
Optional case-sensitive counting
Ignore stopwords
Ignore words shorter than a minimum character count
Ignore words matching a regex pattern
Ignore non-alphanumeric characters
Ignore specified file paths
Output results in JSON or CSV format
Pretty-print the results (but not when outputting to TTY)
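
To make the counting features concrete, here is a minimal sketch of case-insensitive counting with a minimum word length. It is not tkrar's actual implementation: the function name count_words, its parameters, and the tokenization rule (split on whitespace, trim surrounding punctuation) are assumptions made for illustration only.

```rust
use std::collections::HashMap;

/// Count how often each word occurs in `text`.
///
/// Hypothetical helper, not part of tkrar. Words are split on whitespace
/// and leading/trailing non-alphanumeric characters are trimmed.
fn count_words(text: &str, case_sensitive: bool, min_char: usize) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for raw in text.split_whitespace() {
        // Strip surrounding punctuation, such as the comma in "Hello,".
        let word = raw.trim_matches(|c: char| !c.is_alphanumeric());
        // Skip empty tokens and words shorter than the minimum length.
        if word.is_empty() || word.chars().count() < min_char {
            continue;
        }
        // Fold case unless case-sensitive counting was requested.
        let key = if case_sensitive {
            word.to_string()
        } else {
            word.to_lowercase()
        };
        *counts.entry(key).or_insert(0) += 1;
    }
    counts
}
```

Under these assumptions, calling count_words("Hello, world! hello", false, 1) counts "hello" twice and "world" once, while passing true for case_sensitive keeps "Hello" and "hello" as separate entries.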

Installation

cargo install tkrar

Usage

tkrar [OPTIONS] [TARGET]...

Flags

-c, --case-sensitive: count words case-sensitively
--no-stopwords: ignore stopwords when counting words
--alphabetic-only: ignore non-alphanumeric characters
-h, --help: print help
-V, --version: print version

Options

-t, --top <N>: show the N most frequent words
-m, --min-char <N>: ignore words with fewer than N characters
-s, --sort <SORT>: sort order, asc or desc (default: desc)
-i, --ignore-words <REGEX>: ignore words that match the provided regex pattern
-I, --ignore-files <FILE>: ignore the provided file paths
-o, --output-format <FORMAT>: output format, one of text, json, or csv (default: text)
-C, --config <FILE>: use the specified config file
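
As a rough illustration of how --top and --sort might interact, the sketch below ranks a map of word counts. It is not taken from tkrar's source: the helper name rank, its signature, the alphabetical tie-breaking, and the choice to select the N most frequent words before applying ascending order are all assumptions.

```rust
use std::collections::HashMap;

/// Rank word counts, keeping at most `top` of the most frequent words.
///
/// Hypothetical helper, not part of tkrar. The `descending` flag only
/// controls the order in which the selected words are returned.
fn rank(counts: &HashMap<String, usize>, descending: bool, top: Option<usize>) -> Vec<(String, usize)> {
    let mut entries: Vec<(String, usize)> =
        counts.iter().map(|(word, n)| (word.clone(), *n)).collect();
    // Most frequent first; ties are broken alphabetically for determinism.
    entries.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    // Keep only the N most frequent words, if a limit was given.
    if let Some(n) = top {
        entries.truncate(n);
    }
    // Ascending order is the reverse of the selection above.
    if !descending {
        entries.reverse();
    }
    entries
}
```

With counts of a = 3, b = 1, and c = 2, a limit of 2 in descending order yields a then c; the same limit in ascending order yields c then a. Whether tkrar itself picks the most or the least frequent words when --sort asc is combined with --top is not stated in this README.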

Arguments

[TARGET]...: paths to one or more target files or directories (default: stdin)

Configuration file

You can create a configuration file (default: config.toml) with the following format:

# config.toml
top = 10

min_char = 3

ignore_words = "ignored|hi|hidden"

ignore_files = ["src/ignored.txt", "dummy.txt"]

Only these options are supported in the config file; all others must be specified on the command line.

Examples

# Count frequency of words in a file
tkrar ./path/to/target

# Count frequency of words from stdin
echo "Hello, world!" | tkrar

# Count frequency of words from multiple files and directories
tkrar ./path/to/file1.txt ./path/to/directory ./path/to/another/directory

# Ignore stopwords
tkrar --no-stopwords ./path/to/target

# Ignore words matching the provided regex pattern
tkrar --ignore-words "the|and|is|in|to" ./path/to/target

# Ignore the provided file paths
tkrar --ignore-files "./path/to/file1.txt,./path/to/file2.txt" ./path/to/target

# Ignore non-alphanumeric characters
tkrar --alphabetic-only ./path/to/target

# Output results in JSON or CSV format
tkrar --output-format json ./path/to/target

# Sort order (asc or desc)
tkrar --sort asc ./path/to/target

# Show the N most frequent words
tkrar --top 10 ./path/to/target

Dependencies

~5–7.5MB
~127K SLoC