
bin+lib textalyzer

Analyze key metrics like number of words, readability, and complexity of any kind of text

3 unstable releases

new 0.3.0 Mar 11, 2025
0.2.1 Feb 18, 2019
0.2.0 Feb 18, 2019

#338 in Text processing


51 downloads per month

AGPL-3.0-or-later

43KB
955 lines

Textalyzer

Analyze key metrics of any kind of text, such as word count, readability, and complexity.

Usage

# Word frequency histogram
textalyzer histogram <filepath>

# Find duplicated code blocks (default: minimum 3 non-empty lines)
textalyzer duplication <path> [<additional paths...>]

# Find duplications with at least 5 non-empty lines
textalyzer duplication --min-lines=5 <path> [<additional paths...>]

# Include single-line duplications
textalyzer duplication --min-lines=1 <path> [<additional paths...>]
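To illustrate what a word frequency histogram computes, here is a minimal sketch in Rust. It is not taken from textalyzer's source: the function names `word_frequencies` and `render_histogram`, the case folding, and the punctuation handling are all assumptions made for this example.

```rust
use std::collections::HashMap;

/// Count how often each word occurs, ignoring case and surrounding punctuation.
/// Hypothetical helper for illustration; not textalyzer's actual API.
fn word_frequencies(text: &str) -> Vec<(String, usize)> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for token in text.split_whitespace() {
        // Strip leading and trailing non-alphanumeric characters, then lowercase.
        let word = token
            .trim_matches(|c: char| !c.is_alphanumeric())
            .to_lowercase();
        if !word.is_empty() {
            *counts.entry(word).or_insert(0) += 1;
        }
    }
    let mut sorted: Vec<(String, usize)> = counts.into_iter().collect();
    // Most frequent first; ties are broken alphabetically for stable output.
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    sorted
}

/// Render one line per word: the word, a bar of `#` characters, and the count.
fn render_histogram(freqs: &[(String, usize)]) -> String {
    freqs
        .iter()
        .map(|(word, count)| format!("{:<12} {} {}", word, "#".repeat(*count), count))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Calling `render_histogram(&word_frequencies(text))` on the contents of a file would give a rough equivalent of the histogram output, although the real command's formatting and tokenization may differ.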

The duplication command analyzes files for duplicated text blocks. It can:

  • Analyze multiple files or recursively scan directories
  • Filter duplications based on minimum number of non-empty lines with --min-lines=N (default: 2)
  • Detect single-line duplications when using --min-lines=1
  • Rank duplications by number of consecutive lines
  • Show all occurrences with file and line references
  • Use multithreaded processing across all available CPU cores
  • Use memory mapping to process large files with minimal memory overhead
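The core idea behind this kind of duplication detection can be sketched with a sliding window over non-empty lines. The following is a simplified, single-threaded illustration under stated assumptions, not textalyzer's implementation: the name `find_duplicates` is hypothetical, lines are compared after trimming whitespace, blocks have a fixed size of `min_lines`, and files are passed in as in-memory strings instead of being memory-mapped.

```rust
use std::collections::HashMap;

/// Location of a block: (file name, 1-based line number of its first line).
type Location = (String, usize);

/// Find blocks of `min_lines` consecutive non-empty lines that occur more than once.
/// Hypothetical helper for illustration; not textalyzer's actual implementation.
fn find_duplicates(
    files: &[(&str, &str)],
    min_lines: usize,
) -> Vec<(Vec<String>, Vec<Location>)> {
    // A window size of zero is meaningless (and `windows(0)` would panic).
    if min_lines == 0 {
        return Vec::new();
    }
    let mut seen: HashMap<Vec<String>, Vec<Location>> = HashMap::new();
    for (name, content) in files {
        // Keep only non-empty lines, remembering their original line numbers.
        let lines: Vec<(usize, String)> = content
            .lines()
            .enumerate()
            .map(|(i, line)| (i + 1, line.trim().to_string()))
            .filter(|(_, line)| !line.is_empty())
            .collect();
        // Slide a fixed-size window over the non-empty lines.
        for window in lines.windows(min_lines) {
            let block: Vec<String> = window.iter().map(|(_, l)| l.clone()).collect();
            seen.entry(block)
                .or_default()
                .push((name.to_string(), window[0].0));
        }
    }
    // A block is a duplication if it was seen at more than one location.
    let mut dups: Vec<_> = seen
        .into_iter()
        .filter(|(_, locations)| locations.len() > 1)
        .collect();
    // Order by first occurrence so the output is deterministic.
    dups.sort_by(|a, b| a.1[0].cmp(&b.1[0]));
    dups
}
```

Passing `min_lines = 1` makes every repeated non-empty line count as a duplication, which mirrors the `--min-lines=1` behavior described above. This sketch does not merge overlapping windows into longer blocks, so it does not reproduce the ranking by number of consecutive lines.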

Dependencies

~9–20MB
~299K SLoC