3 unstable releases

Uses the Rust 2024 edition

0.3.0 Apr 11, 2025
0.2.1 Apr 11, 2025
0.2.0 Apr 10, 2025


MIT license


tkrar

A fast, feature-rich CLI tool written in Rust that counts word frequencies in a file or a directory.

Name origin:
The name tkrar (تكرار) comes from the Arabic word for repetition or frequency.
It is pronounced tek-raar (with a rolled 'r').

Features

  • Count word frequencies in a file, or recursively in a directory
  • Process input from stdin
  • Optional case-sensitive counting
  • Ignore stopwords
  • Ignore words shorter than a minimum character count
  • Ignore words matching a regex pattern
  • Ignore non-alphanumeric characters
  • Ignore specific file paths
  • Output results in JSON or CSV format
  • Pretty-print the results (but not when outputting to a TTY)
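The core idea behind a word-frequency counter can be sketched in a few lines of Rust. This is a minimal, hypothetical illustration, not tkrar's actual implementation: it assumes words are split on whitespace and, unless case-sensitive counting is requested, lowercased before being tallied in a `HashMap`. The function name `count_words` is made up for this example.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: count how often each word occurs in `text`.
/// Words are split on Unicode whitespace; unless `case_sensitive` is true,
/// each word is lowercased so "The" and "the" share one counter.
fn count_words(text: &str, case_sensitive: bool) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        let key = if case_sensitive {
            word.to_string()
        } else {
            word.to_lowercase()
        };
        // The entry API inserts 0 for a new word, then increments it.
        *counts.entry(key).or_insert(0) += 1;
    }
    counts
}
```

For example, `count_words("The cat the", false)` maps `"the"` to 2 and `"cat"` to 1, while passing `true` keeps `"The"` and `"the"` as separate entries.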

Installation

cargo install tkrar

Usage

tkrar [OPTIONS] [TARGET]...

Flags

  • -c, --case-sensitive: count words case-sensitively
  • --no-stopwords: ignore stopwords when counting words
  • --alphabetic-only: ignore non-alphanumeric characters
  • -h, --help: print help
  • -V, --version: print version
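The filtering flags (together with the `--min-char` option) can be thought of as a per-word check applied before counting. The sketch below is hypothetical: the `Filters` struct, its field names, the order of the checks, and the choice to compare stopwords in lowercase are all assumptions made for illustration, not tkrar's documented behavior.

```rust
use std::collections::HashSet;

/// Hypothetical filter settings loosely mirroring the CLI flags.
struct Filters {
    min_char: usize,            // like --min-char: drop words with fewer chars
    alphabetic_only: bool,      // like --alphabetic-only
    stopwords: HashSet<String>, // like --no-stopwords (lowercase entries)
}

/// Returns the cleaned word, or None if the word should be ignored.
fn keep_word(word: &str, f: &Filters) -> Option<String> {
    // Optionally strip every character that is not a letter or digit.
    let w: String = if f.alphabetic_only {
        word.chars().filter(|c| c.is_alphanumeric()).collect()
    } else {
        word.to_string()
    };
    // Count Unicode scalar values, not bytes, so non-ASCII words are measured fairly.
    if w.is_empty() || w.chars().count() < f.min_char {
        return None;
    }
    // Stopwords are compared case-insensitively in this sketch.
    if f.stopwords.contains(&w.to_lowercase()) {
        return None;
    }
    Some(w)
}
```

With `min_char: 3`, `alphabetic_only: true`, and a stopword set containing `"the"`, the word `"Hello,"` becomes `"Hello"`, while `"The"` and `"hi!"` are dropped.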

Options

  • -t, --top <N>: show the N most frequent words
  • -m, --min-char <N>: ignore words with fewer than N characters
  • -s, --sort <SORT>: sort order, asc or desc (default: desc)
  • -i, --ignore-words <REGEX>: ignore words that match the provided regex pattern
  • -I, --ignore-files <FILE>: ignore the provided file paths
  • -o, --output-format <FORMAT>: output format, one of text, json, or csv (default: text)
  • -C, --config <FILE>: use the specified config file

Arguments

  • [TARGET]...: one or more paths to target files or directories (default: read from stdin)
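Recursively expanding directory targets while honoring an ignore list could look roughly like the following standard-library sketch. It is an illustration under stated assumptions, not tkrar's code: the `collect_files` name is hypothetical, ignored paths are compared exactly as given (no canonicalization, so `./a.txt` and `a.txt` count as different paths), and there is no guard against symlink loops.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical sketch: push every regular file under `target` into `out`,
/// skipping any file or directory whose path appears in `ignored`.
/// Paths are compared as given; symlink loops are not detected.
fn collect_files(target: &Path, ignored: &[PathBuf], out: &mut Vec<PathBuf>) -> io::Result<()> {
    if ignored.iter().any(|p| p.as_path() == target) {
        return Ok(());
    }
    if target.is_dir() {
        // Recurse into each directory entry.
        for entry in fs::read_dir(target)? {
            collect_files(&entry?.path(), ignored, out)?;
        }
    } else if target.is_file() {
        out.push(target.to_path_buf());
    }
    Ok(())
}
```

Calling it once per `TARGET` argument with a shared `out` vector collects files from several files and directories into one list.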

Configuration file

You can create a configuration file (default: config.toml) with the following format:

# config.toml
top = 10
min_char = 3
ignore_words = "ignored|hi|hidden"
ignore_files = ["src/ignored.txt", "dummy.txt"]

Only these options are supported in the config file; specify all others on the command line.
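The four config keys map naturally onto optional settings that can be combined with command-line values. The README does not say which source wins when both set the same option, so the sketch below simply assumes the command line takes precedence. The `Settings` struct and `merge` function are hypothetical, and TOML parsing itself is left out to keep the example dependency-free.

```rust
use std::path::PathBuf;

/// Hypothetical mirror of the keys accepted in config.toml.
#[derive(Debug, Default, Clone, PartialEq)]
struct Settings {
    top: Option<usize>,
    min_char: Option<usize>,
    ignore_words: Option<String>,
    ignore_files: Vec<PathBuf>,
}

/// Assumption: a value given on the command line overrides the config file;
/// otherwise the config file value is used.
fn merge(cli: Settings, file: Settings) -> Settings {
    Settings {
        top: cli.top.or(file.top),
        min_char: cli.min_char.or(file.min_char),
        ignore_words: cli.ignore_words.or(file.ignore_words),
        // Assumption: a non-empty CLI list replaces the config list entirely.
        ignore_files: if cli.ignore_files.is_empty() {
            file.ignore_files
        } else {
            cli.ignore_files
        },
    }
}
```

Using `Option::or` makes "unset" explicit, which a plain default value such as `top = 0` could not express.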

Examples

# Count word frequencies in a file
tkrar ./path/to/target

# Count word frequencies from stdin
echo "Hello, world!" | tkrar

# Count word frequencies across multiple files and directories
tkrar ./path/to/file1.txt ./path/to/directory ./path/to/another/directory

# Ignore stopwords
tkrar --no-stopwords ./path/to/target

# Ignore words matching the provided regex pattern
tkrar --ignore-words "the|and|is|in|to" ./path/to/target

# Ignore the provided file paths
tkrar --ignore-files "./path/to/file1.txt,./path/to/file2.txt" ./path/to/target

# Ignore non-alphanumeric characters
tkrar --alphabetic-only ./path/to/target

# Output results in JSON or CSV format
tkrar --output-format json ./path/to/target

# Sort order (asc or desc)
tkrar --sort asc ./path/to/target

# Show the N most frequent words
tkrar --top 10 ./path/to/target
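For the CSV output format, one reasonable layout is a header row followed by one `word,count` row per word. The exact columns tkrar emits are not documented here, so the header and the `to_csv` function below are hypothetical. The sketch does apply standard RFC 4180 quoting, which matters when words keep punctuation (for example, without `--alphabetic-only`).

```rust
/// Hypothetical sketch: render (word, count) rows as CSV with a
/// `word,count` header. Fields containing a comma, double quote, or line
/// break are wrapped in quotes, and embedded quotes are doubled (RFC 4180).
fn to_csv(rows: &[(String, usize)]) -> String {
    let mut out = String::from("word,count\n");
    for (word, count) in rows {
        let field = if word.contains([',', '"', '\n', '\r']) {
            format!("\"{}\"", word.replace('"', "\"\""))
        } else {
            word.clone()
        };
        out.push_str(&format!("{field},{count}\n"));
    }
    out
}
```

A word such as `a,b` is emitted as `"a,b"`, so spreadsheet tools still see exactly two columns per row.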
