#nlp #text #analysis #complexity #duplications

bin+lib textalyzer

Analyze key metrics like number of words, readability, and complexity of any kind of text

3 unstable releases

0.3.0 Mar 11, 2025
0.2.1 Feb 18, 2019
0.2.0 Feb 18, 2019

#340 in Text processing


116 downloads per month

AGPL-3.0-or-later

43KB
955 lines

Textalyzer

Analyze key metrics of any kind of text, such as word count, readability, and complexity.

Usage

# Word frequency histogram
textalyzer histogram <filepath>

# Find duplicated code blocks (default: minimum 3 non-empty lines)
textalyzer duplication <path> [<additional paths...>]

# Find duplications with at least 5 non-empty lines
textalyzer duplication --min-lines=5 <path> [<additional paths...>]

# Include single-line duplications
textalyzer duplication --min-lines=1 <path> [<additional paths...>]
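
To illustrate what a word frequency histogram like the one produced by the histogram subcommand involves, here is a minimal sketch. It is not taken from the crate's source: the function name `word_frequencies`, the lowercasing, and the tokenization rule (split on anything that is not alphanumeric or an apostrophe) are all assumptions made for this example.

```rust
use std::collections::HashMap;

/// Hypothetical helper: count how often each word occurs in `text`.
/// Words are lowercased and split on non-alphanumeric characters
/// (apostrophes are kept so that "don't" stays one word).
fn word_frequencies(text: &str) -> Vec<(String, usize)> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for word in text
        .split(|c: char| !c.is_alphanumeric() && c != '\'')
        .filter(|w| !w.is_empty())
    {
        *counts.entry(word.to_lowercase()).or_insert(0) += 1;
    }

    let mut sorted: Vec<(String, usize)> = counts.into_iter().collect();
    // Most frequent first; ties are broken alphabetically for stable output.
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    sorted
}

fn main() {
    let text = "The quick brown fox jumps over the lazy dog. The dog sleeps.";
    // Print one bar per word, with one '#' per occurrence.
    for (word, count) in word_frequencies(text) {
        println!("{word:>8} {}", "#".repeat(count));
    }
}
```

Sorting ties alphabetically keeps the output deterministic, which a `HashMap` alone would not guarantee.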

The duplication command analyzes files for duplicated text blocks. It can:

  • Analyze multiple files or recursively scan directories
  • Filter duplications by a minimum number of non-empty lines with --min-lines=N (default: 2)
  • Detect single-line duplications when using --min-lines=1
  • Rank duplications by number of consecutive lines
  • Show all occurrences with file and line references
  • Use multithreaded processing across all available CPU cores
  • Use memory mapping for efficient processing of large files with minimal memory overhead
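
The core idea behind the duplication search can be sketched as a sliding window over the non-empty lines of each file. The following is a simplified illustration under stated assumptions, not the crate's actual implementation: `find_duplicates` and `Location` are hypothetical names, lines are compared after trimming whitespace, inputs are passed as in-memory strings, and the sketch reports fixed-size windows rather than merging them into maximal blocks. It also runs on a single thread and does not use memory mapping.

```rust
use std::collections::HashMap;

/// Where a block starts: (file name, 1-based line number).
type Location = (String, usize);

/// Hypothetical helper: find every run of `min_lines` consecutive
/// non-empty lines that occurs more than once across `files`,
/// given as (name, content) pairs.
fn find_duplicates(
    files: &[(&str, &str)],
    min_lines: usize,
) -> Vec<(Vec<String>, Vec<Location>)> {
    if min_lines == 0 {
        return Vec::new(); // `windows(0)` would panic
    }
    let mut seen: HashMap<Vec<String>, Vec<Location>> = HashMap::new();

    for (name, content) in files {
        // Drop empty lines but remember each line's original number.
        let lines: Vec<(usize, String)> = content
            .lines()
            .enumerate()
            .map(|(i, line)| (i + 1, line.trim().to_string()))
            .filter(|(_, line)| !line.is_empty())
            .collect();

        // Record the start location of every window of `min_lines` lines.
        for window in lines.windows(min_lines) {
            let block: Vec<String> = window.iter().map(|(_, l)| l.clone()).collect();
            seen.entry(block)
                .or_default()
                .push((name.to_string(), window[0].0));
        }
    }

    // Keep only blocks seen at least twice, ordered by first occurrence.
    let mut dups: Vec<_> = seen.into_iter().filter(|(_, locs)| locs.len() > 1).collect();
    dups.sort_by(|a, b| a.1[0].cmp(&b.1[0]));
    dups
}

fn main() {
    let a = "let x = 1;\nlet y = 2;\n\nlet z = x + y;\nprintln!(\"{z}\");\n";
    let b = "// other file\nlet x = 1;\nlet y = 2;\nlet z = x + y;\n";
    for (block, locations) in find_duplicates(&[("a.rs", a), ("b.rs", b)], 3) {
        println!("{} duplicated non-empty lines at:", block.len());
        for (file, line) in locations {
            println!("  {file}:{line}");
        }
    }
}
```

Because empty lines are filtered out before windowing, the block in `a.rs` still matches the one in `b.rs` even though a blank line interrupts it.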

Dependencies

~9–20MB
~296K SLoC