3 releases (stable)
| 1.2.0 | Sep 11, 2025 |
|---|---|
| 1.1.0 | Sep 11, 2025 |
| 0.1.0 | Sep 11, 2025 |
#588 in Command line utilities
110 downloads per month
Used in howmany
315KB
5.5K
SLoC
SherlockIO
Fast and accurate language detection CLI tool that analyzes your codebase while automatically filtering out dependencies, build artifacts, and cache files.
How It Works
Sherlock uses a two-stage intelligent filtering system:
-
Smart Filtering: Automatically skips non-source files using sophisticated pattern matching
- Dependencies:
node_modules/,venv/,vendor/,target/ - Build artifacts:
dist/,build/,__pycache__/,.classfiles - IDE/Editor files:
.vscode/,.idea/,.git/ - 25+ language ecosystems with deep knowledge of their tooling
- Dependencies:
-
Advanced Language Detection: Identifies programming languages using sophisticated multi-stage analysis
- Extension-based detection with conflict resolution (
.xml→ XML vs Maven) - Content analysis with advanced pattern scoring (shebangs, syntax patterns, keywords)
- Special files detection (
Dockerfile,Makefile,package.json) - Smart disambiguation for ambiguous files using content patterns
- 100+ supported languages with 98%* accuracy
- Extension-based detection with conflict resolution (
Result: Analyze only your actual source code, not the noise!
Installation
# Install from crates.io (recommended)
cargo install sherlock-io
# Or build from source
cargo build --release
cargo install --path .
Usage
# Analyze current directory
sherlock
# Analyze specific directory
sherlock /path/to/project
# Set depth limit
sherlock -d 5 /path/to/project
# Output formats
sherlock --format table # default
sherlock --format json
sherlock --format csv
# Options
sherlock --verbose # detailed output
sherlock --include-hidden # include hidden files
sherlock --min-percentage 1.0 # minimum threshold
Example Output
Language Detection Report
Total Files: 127 (filtered from 50,000+ files)
Total Size: 2.3 MB
Language Files Percentage Size Bar
──────────────────────────────────────────────────────────────────────
Rust 45 35.4% 1.2 MB ███████████████░░░░░ 35.4%
JavaScript 32 25.2% 654.2 KB ██████████░░░░░░░░░░ 25.2%
TypeScript 18 14.2% 423.1 KB █████░░░░░░░░░░░░░░░ 14.2%
JSON 12 9.4% 89.4 KB ███░░░░░░░░░░░░░░░░░ 9.4%
Markdown 8 6.3% 45.2 KB ██░░░░░░░░░░░░░░░░░░ 6.3%
Notice: Only source files analyzed - dependencies, build artifacts, and cache files automatically filtered out!
What Gets Filtered Out
SherlockIO automatically ignores these non-source files:
Dependencies & Packages
node_modules/,venv/,vendor/,target/,deps/__pycache__/,.pytest_cache/,.gradle/,.m2/
Build Artifacts
dist/,build/,out/,bin/,*.class,*.o,*.so.next/,.nuxt/,coverage/,*.min.js
IDE & Editor Files
.vscode/,.idea/,.git/,.DS_Store- Lock files:
package-lock.json,yarn.lock,Cargo.lock
Supported Languages (100+)
- Programming: Rust, Python, JavaScript, TypeScript, Go, Java, C/C++, C#, PHP, Ruby, Swift, Kotlin, Scala, Haskell, Elixir, Clojure, and more
- Web: HTML, CSS, SCSS, Vue, React (JSX/TSX), Svelte
- Data: JSON, YAML, TOML, XML, SQL, GraphQL
- Config: Dockerfile, Makefile, CMake, Gradle
- Documentation: Markdown, reStructuredText, AsciiDoc
What's New in v1.2.0
🚀 Major Detection Improvements
- Fixed XML Detection Bug: XML files are no longer incorrectly filtered out
- Advanced Conflict Resolution: Smart disambiguation between similar file types (XML vs Maven POM files)
- Improved Pattern Scoring: Sophisticated algorithm considering pattern rarity, specificity, and context
- Enhanced Content Analysis: Better detection of languages with shared syntax patterns
- 98%+ Accuracy: Comprehensive testing across all supported languages
🔧 Technical Enhancements
- Multi-language extension support (one extension can map to multiple languages)
- Content-based disambiguation for ambiguous file extensions
- Advanced pattern scoring with keyword specificity bonuses
- Improved shebang detection and special file handling
License
MIT License
Dependencies
~6–17MB
~208K SLoC