3 unstable releases
new 0.3.0 | Mar 11, 2025 |
---|---|
0.2.1 | Feb 18, 2019 |
0.2.0 | Feb 18, 2019 |
#338 in Text processing
51 downloads per month
43KB
955 lines
Textalyzer
Analyze key metrics like number of words, readability, complexity, etc. of any kind of text.
Usage
# Word frequency histogram
textalyzer histogram <filepath>
# Find duplicated code blocks (default: minimum 3 non-empty lines)
textalyzer duplication <path> [<additional paths...>]
# Find duplications with at least 5 non-empty lines
textalyzer duplication --min-lines=5 <path> [<additional paths...>]
# Include single-line duplications
textalyzer duplication --min-lines=1 <path> [<additional paths...>]
The duplication command analyzes files for duplicated text blocks. It can:
- Analyze multiple files or recursively scan directories
- Filter duplications based on minimum number of non-empty lines with
--min-lines=N
(default: 2) - Detect single-line duplications when using
--min-lines=1
- Rank duplications by number of consecutive lines
- Show all occurrences with file and line references
- Utilize multithreaded processing for optimal performance on all available CPU cores
- Use memory mapping for efficient processing of large files with minimal memory overhead
Dependencies
~9–20MB
~299K SLoC