3 releases

| Version | Released |
|---|---|
| 0.1.2 | Jul 24, 2025 |
| 0.1.1 | Jul 24, 2025 |
| 0.1.0 | Jul 24, 2025 |
TurboProp
TurboProp (tp) is a fast semantic code search and indexing tool written in Rust. It uses machine learning embeddings to enable intelligent code search across your codebase, making it easy to find relevant code snippets based on natural language queries.
Key Features
- Semantic Search: Find code by meaning, not just keywords
- Git Integration: Respects .gitignore and only indexes files under source control
- Watch Mode: Automatically updates the index when files change
- File Type Filtering: Search within specific file types
- Multiple Output Formats: JSON for tools, human-readable text for reading
- Performance Optimized: Handles codebases from 50 to 10,000+ files
- Easy Configuration: Optional .turboprop.yml configuration file
- MCP Server Integration: Built-in MCP server for coding agents like Claude Code, Cursor, and Windsurf
MCP Server for Coding Agents
What is MCP? MCP (Model Context Protocol) is a standard way for AI coding agents to access external tools. Think of it as a bridge that lets your AI assistant search through your code in real time.
Before MCP: "Find JWT authentication code" → Agent can only see files you've shared
With MCP: "Find JWT authentication code" → Agent searches your entire codebase semantically
TurboProp's MCP server works like a librarian for your codebase - it catalogs all your code, keeps it up-to-date, and helps agents find relevant code instantly.
Quick Start (< 2 minutes)
- Start the MCP server:
  tp mcp --repo .
- Configure your coding agent (see integration examples below)
- Ask your agent: "Find the JWT authentication implementation"
That's it! Your agent can now search your entire codebase semantically.
Agent Integration
Claude Code - Add to .claude.json in your project:
{
  "mcpServers": {
    "turboprop": {
      "command": "tp",
      "args": ["mcp", "--repo", "."]
    }
  }
}
Cursor - Add to .cursor/mcp.json in your project:
{
  "mcpServers": {
    "turboprop": {
      "command": "tp",
      "args": ["mcp", "--repo", "."],
      "cwd": "."
    }
  }
}
Other Agents (GitHub Copilot, Windsurf, etc.) - Use these settings:
- Command: tp
- Arguments: ["mcp", "--repo", "."]
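If an agent already has an MCP configuration file, the TurboProp entry needs to be merged into it rather than pasted over it. The following is a minimal Python sketch of that merge, assuming the file uses the mcpServers layout shown above. The helper name add_turboprop_server is hypothetical and not part of TurboProp.

```python
import json


def add_turboprop_server(config: dict, repo: str = ".") -> dict:
    """Return a copy of an MCP config with a 'turboprop' server entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same command and arguments as the manual configuration above.
    servers["turboprop"] = {"command": "tp", "args": ["mcp", "--repo", repo]}
    merged["mcpServers"] = servers
    return merged


# An existing config with another (made-up) server that must be preserved.
existing = {"mcpServers": {"other-tool": {"command": "other", "args": []}}}
updated = add_turboprop_server(existing)
print(json.dumps(updated, indent=2))
```

The function copies the input instead of mutating it, so the original configuration can still be written back unchanged if anything goes wrong.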
✓ Verify Setup: Restart your agent and ask: "Search for error handling code"
What You Can Ask Your Agent
Once configured, you can ask natural language questions like:
- "Find the JWT authentication implementation" - Locates authentication code
- "Show me error handling patterns" - Finds error handling across the codebase
- "Where is database connection logic?" - Discovers database-related code
- "Find all tests for user login" - Locates relevant test files
- "How does the API rate limiting work?" - Finds rate limiting implementation
Advanced Search Options
Your agent can also use these parameters to refine searches:
- limit: Maximum results (default: 10)
- filetype: Filter by extension (.rs, .js, .py)
- filter: Glob pattern (src/**/*.rs, tests/**)
- threshold: Similarity threshold (0.0-1.0)
Example: "Find authentication code, limit to 5 results, only in Rust files"
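To make the effect of these parameters concrete, here is a small Python sketch of how limit, filetype, and threshold could narrow a list of scored hits. It is an illustration only: the Hit type, the refine function, and the order in which the filters are applied are assumptions, not TurboProp's actual implementation, and the glob filter parameter is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    file: str
    score: float


def refine(hits, limit=10, filetype=None, threshold=0.0):
    """Apply a similarity threshold, an extension filter, and a result limit."""
    kept = [h for h in hits if h.score >= threshold]
    if filetype is not None:
        kept = [h for h in kept if h.file.endswith(filetype)]
    # Best matches first, then cut to the requested number of results.
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept[:limit]


hits = [
    Hit("src/auth.rs", 0.82),
    Hit("web/login.js", 0.74),
    Hit("src/token.rs", 0.41),
    Hit("docs/auth.md", 0.12),
]
top = refine(hits, limit=5, filetype=".rs", threshold=0.3)
print([h.file for h in top])
```

With these sample scores, only the two Rust files above the 0.3 threshold remain.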
Configuration & Advanced Usage
Custom Model & Settings:
tp mcp --repo . --model sentence-transformers/all-MiniLM-L12-v2 --max-filesize 5mb
Project Configuration (.turboprop.yml):
model: "sentence-transformers/all-MiniLM-L6-v2"
max_filesize: "2mb"
similarity_threshold: 0.3
📖 Complete Guide: MCP User Guide
🔧 Troubleshooting: Common Issues & Solutions
⚡ Performance: Tips for large repositories and team usage
Quick Start
Installation
Via Cargo (Recommended)
cargo install turboprop
From Source
git clone https://github.com/glamp/turboprop-rust
cd turboprop-rust
cargo build --release
# Binary will be in target/release/tp
Basic Usage
- Index your codebase:
  tp index --repo . --max-filesize 2mb
- Search for code:
  tp search "jwt authentication" --repo .
- Filter by file type:
  tp search --filetype .js "jwt authentication" --repo .
- Get human-readable output:
  tp search "jwt authentication" --repo . --output text
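The same commands can be driven from a script. The sketch below builds a tp search invocation from Python using only flags that appear in this README (--repo, --output, --limit, --filetype). The function names are hypothetical, and it assumes tp is on the PATH and exits with a non-zero status on failure.

```python
import subprocess


def build_search_command(query, repo=".", limit=None, filetype=None, output="json"):
    """Assemble the argument list for a `tp search` call."""
    cmd = ["tp", "search", query, "--repo", repo, "--output", output]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if filetype is not None:
        cmd += ["--filetype", filetype]
    return cmd


def run_search(query, **options):
    """Run tp and return its stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(
        build_search_command(query, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Only builds and prints the command; call run_search() once an index exists.
print(build_search_command("jwt authentication", filetype=".js"))
```

Passing the arguments as a list avoids shell quoting problems with queries that contain spaces or quotes.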
Model Support
TurboProp now supports multiple embedding models to optimize for different use cases:
Available Models
Sentence Transformer Models (FastEmbed)
- sentence-transformers/all-MiniLM-L6-v2 (default)
  - Fast and lightweight, good for general use
  - 384 dimensions, ~23MB
  - Automatic download and caching
- sentence-transformers/all-MiniLM-L12-v2
  - Better accuracy with slightly more compute
  - 384 dimensions, ~44MB
Specialized Code Models
- nomic-embed-code.Q5_K_S.gguf
  - Specialized for code search and retrieval
  - 768 dimensions, ~2.5GB
  - Supports multiple programming languages
  - Quantized for efficient inference
Multilingual Models
- Qwen/Qwen3-Embedding-0.6B
  - State-of-the-art multilingual support (100+ languages)
  - 1024 dimensions, ~600MB
  - Supports instruction-based embeddings
  - Excellent for code and text retrieval
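The dimension counts above also affect how much raw vector data an index carries. As a rough worked example, the Python sketch below computes the size of the embedding vectors alone, assuming one 32-bit float per dimension and one vector per indexed chunk. Both are assumptions; TurboProp's actual on-disk format and chunking may differ.

```python
# Dimensions as listed for each model above.
MODEL_DIMENSIONS = {
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "sentence-transformers/all-MiniLM-L12-v2": 384,
    "nomic-embed-code.Q5_K_S.gguf": 768,
    "Qwen/Qwen3-Embedding-0.6B": 1024,
}
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as 32-bit floats


def vector_bytes(model: str, chunks: int) -> int:
    """Raw size of `chunks` embedding vectors for the given model."""
    return MODEL_DIMENSIONS[model] * BYTES_PER_FLOAT32 * chunks


for model in MODEL_DIMENSIONS:
    mib = vector_bytes(model, 10_000) / (1024 * 1024)
    print(f"{model}: {mib:.1f} MiB for 10,000 chunks")
```

Under these assumptions a 1024-dimension model needs roughly 2.7 times the vector storage of a 384-dimension one for the same number of chunks.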
Model Selection Guide
Choose your model based on your use case:
| Use Case | Recommended Model | Why |
|---|---|---|
| General code search | sentence-transformers/all-MiniLM-L6-v2 | Fast, reliable, good balance |
| Specialized code search | nomic-embed-code.Q5_K_S.gguf | Optimized for code understanding |
| Multilingual projects | Qwen/Qwen3-Embedding-0.6B | Best multilingual support |
| Low resource environments | sentence-transformers/all-MiniLM-L6-v2 | Smallest memory footprint |
| Maximum accuracy | Qwen/Qwen3-Embedding-0.6B | State-of-the-art performance |
Usage Examples
Basic Model Selection
# List available models
tp model list
# Get model information
tp model info "Qwen/Qwen3-Embedding-0.6B"
# Download a model before use
tp model download "nomic-embed-code.Q5_K_S.gguf"
Indexing with Different Models
# Use default model
tp index --repo ./my-project
# Use specialized code model
tp index --repo ./my-project --model "nomic-embed-code.Q5_K_S.gguf"
# Use multilingual model with instruction
tp index --repo ./my-project \
  --model "Qwen/Qwen3-Embedding-0.6B" \
  --instruction "Represent this code for semantic search"
Searching with Model Consistency
# Search using the same model used for indexing
tp search "jwt authentication" --model "nomic-embed-code.Q5_K_S.gguf"
# Use instruction for context-aware search (Qwen3 only)
tp search "error handling" \
  --model "Qwen/Qwen3-Embedding-0.6B" \
  --instruction "Find code related to error handling and exceptions"
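Model consistency matters because vectors produced by different models are not comparable. When the dimensions differ, the comparison is impossible. When they happen to be equal (both MiniLM models use 384), the resulting scores are still not meaningful. The Python sketch below shows the dimension check in isolation; check_query_vector is a hypothetical helper, and how TurboProp itself reports a mismatch is not specified here.

```python
def check_query_vector(index_dimensions: int, query_vector: list) -> None:
    """Raise if a query embedding cannot be compared with the indexed vectors."""
    if len(query_vector) != index_dimensions:
        raise ValueError(
            f"query has {len(query_vector)} dimensions, "
            f"index has {index_dimensions}"
        )


# Index built with nomic-embed-code (768 dimensions), query embedded with
# all-MiniLM-L6-v2 (384 dimensions): the vectors cannot be compared.
try:
    check_query_vector(768, [0.0] * 384)
except ValueError as error:
    print(error)
```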
Configuration File Support
Create .turboprop.yml in your project root:
# Default model for all operations
default_model: "sentence-transformers/all-MiniLM-L6-v2"
# Model-specific configurations
models:
  "Qwen/Qwen3-Embedding-0.6B":
    instruction: "Represent this code for semantic search"
    cache_dir: "~/.turboprop/qwen3-cache"
  "nomic-embed-code.Q5_K_S.gguf":
    cache_dir: "~/.turboprop/nomic-cache"
# Performance settings
embedding:
  batch_size: 32
  cache_embeddings: true
# Resource limits
max_memory_usage: "8GB"
warn_large_models: true
Complete Usage Guide
Indexing Command
The index command creates a searchable index of your codebase:
tp index [OPTIONS] --repo <REPO>
Options:
- --repo <PATH>: Repository path to index (default: current directory)
- --max-filesize <SIZE>: Maximum file size to index (e.g., "2mb", "500kb", "1gb")
- --watch: Monitor file changes and update index automatically
- --model <MODEL>: Embedding model to use (default: "sentence-transformers/all-MiniLM-L6-v2")
- --cache-dir <DIR>: Cache directory for models and data
- --worker-threads <N>: Number of worker threads for processing
- --batch-size <N>: Batch size for embedding generation (default: 32)
- --verbose: Enable verbose output
Examples:
# Basic indexing
tp index --repo .
# Index with size limit and watch mode
tp index --repo . --max-filesize 2mb --watch
# Use custom model and cache directory
tp index --repo . --model "sentence-transformers/all-MiniLM-L12-v2" --cache-dir ~/.turboprop-cache
# Index with custom performance settings
tp index --repo . --worker-threads 8 --batch-size 64
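Size values such as "2mb", "500kb", and "1gb" are human-readable strings that have to be turned into byte counts before they can be compared with file sizes. The Python sketch below shows one way to do that conversion. It assumes binary multiples (1kb = 1024 bytes); whether TurboProp uses 1024 or 1000 is not stated in this README, and parse_size is a hypothetical name.

```python
import re

# Assumption: binary multiples. Swap in powers of 1000 for decimal units.
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}


def parse_size(text: str) -> int:
    """Convert a size string such as '2mb' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    unit = unit.lower()
    if unit not in UNITS:
        raise ValueError(f"unknown unit in size: {text!r}")
    return int(float(number) * UNITS[unit])


print(parse_size("2mb"), parse_size("500kb"), parse_size("1gb"))
```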
Search Command
The search command finds relevant code using semantic similarity:
tp search <QUERY> [OPTIONS]
Options:
- <QUERY>: Search query (natural language or keywords)
- --repo <PATH>: Repository path to search in (default: current directory)
- --limit <N>: Maximum number of results to return (default: 10)
- --threshold <FLOAT>: Minimum similarity threshold (0.0 to 1.0)
- --output <FORMAT>: Output format: 'json' (default) or 'text'
- --filetype <EXT>: Filter results by file extension (e.g., '.rs', '.js', '.py')
- --filter <PATTERN>: Filter results by glob pattern (e.g., '*.rs', 'src/**/*.js')
Examples:
# Basic search
tp search "user authentication" --repo .
# Search with filters and limits
tp search "database connection" --repo . --filetype .rs --limit 5
# Get human-readable output
tp search "error handling" --repo . --output text
# High-precision search
tp search "jwt token validation" --repo . --threshold 0.8
# Search in specific directory
tp search "api routes" --repo ./backend
# Filter by glob pattern
tp search "authentication" --repo . --filter "src/*.js"
# Recursive glob patterns
tp search "error handling" --repo . --filter "**/*.{rs,py}"
# Combine filters
tp search "database" --repo . --filetype .rs --filter "src/**/*.rs"
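The --threshold value is easier to choose with a picture of where the scores come from. Embedding search commonly ranks results by cosine similarity between the query vector and each chunk vector. The Python sketch below assumes that measure (this README does not name the exact metric TurboProp uses) and uses tiny made-up three-dimensional vectors in place of real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy vectors standing in for a query embedding and two chunk embeddings.
query = [1.0, 0.0, 1.0]
chunks = {
    "src/auth.rs": [1.0, 0.1, 0.9],
    "src/db.rs": [0.0, 1.0, 0.2],
}
scores = {path: cosine_similarity(query, vec) for path, vec in chunks.items()}
matches = [path for path, score in scores.items() if score >= 0.8]
print(matches)
```

A threshold of 0.8 keeps only chunks that point in nearly the same direction as the query, which is why it yields fewer but more precise results than a low value such as 0.3.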
Glob Pattern Filtering
TurboProp supports powerful glob pattern filtering to search within specific files or directories. Glob patterns use Unix shell-style wildcards to match file paths.
Basic Wildcards
| Wildcard | Description | Example |
|---|---|---|
| * | Match any characters within a directory | *.rs matches all Rust files |
| ? | Match exactly one character | file?.rs matches file1.rs, fileA.rs |
| ** | Match any characters across directories | **/*.js matches JS files anywhere |
| [abc] | Match any character in the set | file[123].rs matches file1.rs, file2.rs, file3.rs |
| [!abc] | Match any character NOT in the set | file[!0-9].rs matches filea.rs but not file1.rs |
| {a,b} | Match any of the alternatives | *.{js,ts} matches both .js and .ts files |
Common Pattern Examples
File Type Filtering
# All Rust files anywhere in the codebase
tp search "async function" --filter "*.rs"
# All JavaScript and TypeScript files
tp search "react component" --filter "*.{js,ts,jsx,tsx}"
# All configuration files
tp search "database" --filter "*.{json,yaml,yml,toml,ini}"
Directory-Specific Filtering
# Files only in the src directory
tp search "main function" --filter "src/*.rs"
# Files only in tests directory
tp search "test case" --filter "tests/*.py"
# Files in specific subdirectories
tp search "handler" --filter "src/api/*.js"
Recursive Directory Filtering
# Python files anywhere in the project
tp search "authentication" --filter "**/*.py"
# Test files in any subdirectory
tp search "unit test" --filter "**/test_*.rs"
# Source files in src and all subdirectories
tp search "database connection" --filter "src/**/*.{rs,py,js}"
# Handler files in nested API directories
tp search "request handler" --filter "**/api/**/handlers/*.rs"
Advanced Pattern Examples
# Test files with specific naming patterns
tp search "integration test" --filter "tests/**/*_{test,spec}.{js,ts}"
# Source files excluding certain directories
tp search "function definition" --filter "src/**/*.rs" --filter "!**/target/**"
# Files in multiple specific directories
tp search "configuration" --filter "{src,config,scripts}/**/*.{json,yaml}"
# Files with numeric suffixes
tp search "version" --filter "**/*[0-9].{js,py,rs}"
Pattern Behavior
Path Matching: Patterns match against the entire file path, not just the filename:
- *.rs matches main.rs, src/main.rs, and lib/nested/file.rs
- src/*.rs matches src/main.rs but not src/nested/file.rs
- src/**/*.rs matches both src/main.rs and src/nested/file.rs
Case Sensitivity: Patterns are case-sensitive by default:
- *.RS matches FILE.RS but not file.rs
- *.rs matches file.rs but not FILE.RS
Path Separators: Always use forward slashes (/) in patterns:
- ✅ src/api/*.js (correct)
- ❌ src\\api\\*.js (incorrect)
Combining with File Type Filter: You can use both --filter and --filetype together:
# Search for Rust files in src directory only
tp search "async" --filetype .rs --filter "src/**/*"
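The matching rules described in this section can be checked with a small reference matcher. The Python sketch below translates a glob pattern into a regular expression and reproduces the documented examples. It is an illustration, not TurboProp's matcher: in particular, it assumes that a pattern without a slash is tested against the file name only, which is one way to explain why *.rs matches src/main.rs.

```python
import re


def glob_to_regex(pattern: str) -> str:
    """Translate the wildcards described above into a regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")   # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")         # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")      # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")       # exactly one character other than '/'
            i += 1
        elif pattern[i] == "[" and "]" in pattern[i + 1:]:
            end = pattern.index("]", i + 1)
            body = pattern[i + 1:end]
            if body.startswith("!"):
                body = "^" + body[1:]  # [!abc] becomes [^abc]
            parts.append("[" + body + "]")
            i = end + 1
        elif pattern[i] == "{" and "}" in pattern[i + 1:]:
            end = pattern.index("}", i + 1)
            choices = pattern[i + 1:end].split(",")
            parts.append("(?:" + "|".join(re.escape(c) for c in choices) + ")")
            i = end + 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return "".join(parts)


def matches(pattern: str, path: str) -> bool:
    # Assumption: a pattern without '/' is tested against the file name only.
    target = path if "/" in pattern else path.rsplit("/", 1)[-1]
    # No IGNORECASE flag, so matching is case-sensitive.
    return re.fullmatch(glob_to_regex(pattern), target) is not None


for pattern, path in [
    ("*.rs", "lib/nested/file.rs"),
    ("src/*.rs", "src/nested/file.rs"),
    ("src/**/*.rs", "src/nested/file.rs"),
    ("*.RS", "file.rs"),
]:
    print(f"{pattern!r} vs {path!r}: {matches(pattern, path)}")
```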
Performance Tips
- Simple patterns are faster: *.rs is faster than **/*.rs
- Be specific when possible: src/*.js is faster than **/*.js if you know files are in src/
- Avoid excessive wildcards: Patterns with many ** can be slower on large codebases
- Use file type filter for extensions: --filetype .rs is optimized compared to --filter "*.rs"
Troubleshooting Glob Patterns
Pattern doesn't match expected files:
- Check case sensitivity: *.RS vs *.rs
- Verify path structure: src/*.js only matches direct children of src/
- Use ** for recursive matching: src/**/*.js matches nested files
Pattern matching too many files:
- Be more specific: use src/*.js instead of *.js
- Add more path components: src/components/*.jsx
- Use character classes: test_[0-9]*.rs instead of test_*.rs
Complex patterns not working:
- Test simpler patterns first: start with *.ext then add complexity
- Check for typos in braces: {js,ts} not {js, ts} (no spaces)
- Validate bracket expressions: [a-z] not [a-Z]
For more pattern examples and troubleshooting, see the TROUBLESHOOTING.md file.
Configuration
TurboProp supports optional configuration via a .turboprop.yml file in your repository root:
# .turboprop.yml
max_filesize: "2mb"
model: "sentence-transformers/all-MiniLM-L6-v2"
cache_dir: "~/.turboprop-cache"
worker_threads: 4
batch_size: 32
default_output: "json"
similarity_threshold: 0.3
Output Formats
JSON Output (Default)
{
  "file": "src/auth.rs",
  "score": 0.8234,
  "content": "fn authenticate_user(token: &str) -> Result<User, AuthError> { ... }"
}
Text Output
Score: 0.82 | src/auth.rs
fn authenticate_user(token: &str) -> Result<User, AuthError> {
    // JWT token validation logic
    ...
}
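The two formats carry the same information, so a tool that consumes the JSON output can produce the text layout itself. The Python sketch below renders one JSON result in the style of the text output above. The field names file, score, and content come from the JSON example; format_result is a hypothetical helper.

```python
import json


def format_result(result: dict) -> str:
    """Render one JSON search result in the style of the text output."""
    header = f"Score: {result['score']:.2f} | {result['file']}"
    return f"{header}\n{result['content']}"


raw = (
    '{"file": "src/auth.rs", "score": 0.8234, '
    '"content": "fn authenticate_user(token: &str) -> Result<User, AuthError> { ... }"}'
)
print(format_result(json.loads(raw)))
```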
Performance Characteristics
- Indexing Speed: ~100-500 files/second (depending on file size and hardware)
- Search Speed: ~10-50ms per query (after initial model loading)
- Memory Usage: ~50-200MB (varies with model and index size)
- Storage: Index size is typically 10-30% of source code size
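These ranges translate directly into planning estimates. The Python sketch below applies the figures quoted above (100-500 files/second, index size 10-30% of source size) to a given repository. It is plain arithmetic on the stated ranges, not a measurement, and the function names are illustrative.

```python
def indexing_seconds(file_count: int) -> tuple:
    """(best, worst) indexing time at 500 and 100 files per second."""
    return file_count / 500, file_count / 100


def index_size_mb(source_mb: float) -> tuple:
    """(low, high) index size at 10% and 30% of the source size."""
    return source_mb * 10 / 100, source_mb * 30 / 100


best, worst = indexing_seconds(10_000)
low, high = index_size_mb(500)
print(f"10,000 files: {best:.0f}-{worst:.0f} seconds to index")
print(f"500 MB of source: {low:.0f}-{high:.0f} MB of index")
```

For a 10,000-file repository with 500 MB of source, that works out to roughly 20-100 seconds of indexing and 50-150 MB of index data.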
Recommended Limits
- File Count: Up to 10,000 files (tested)
- File Size: Up to 2MB per file (configurable)
- Total Codebase: Up to 500MB of source code
Supported File Types
TurboProp works with any text-based file but is optimized for common programming languages:
- Web: .js, .ts, .jsx, .tsx, .html, .css, .scss, .vue
- Backend: .py, .rs, .go, .java, .kt, .scala, .rb, .php
- Systems: .c, .cpp, .h, .hpp, .cs, .swift
- Data: .sql, .json, .yaml, .yml, .xml, .toml
- Docs: .md, .txt, .rst
- Config: .env, .ini, .conf, .cfg
Integration Examples
With Git Hooks
Add to .git/hooks/post-commit:
#!/bin/bash
tp index --repo . --max-filesize 2mb
With IDEs
Many IDEs can be configured to run external tools. Add TurboProp as a custom search tool.
With CI/CD
# In your CI script
tp index --repo . --max-filesize 2mb
tp search "security vulnerability" --repo . --output json > security-search-results.json
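A CI job usually needs to act on the saved results, not just store them. The Python sketch below loads the saved search output and keeps only high-scoring hits. Because this README shows a single JSON object and does not say whether several results arrive as an array or one object per line, load_results accepts all three shapes. The helper names and the 0.8 cut-off are assumptions.

```python
import json


def load_results(text: str) -> list:
    """Accept a JSON array, a single JSON object, or one JSON object per line."""
    text = text.strip()
    if not text:
        return []
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to newline-delimited JSON.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def strong_hits(results: list, min_score: float = 0.8) -> list:
    """Keep results whose similarity score reaches min_score."""
    return [r for r in results if r.get("score", 0.0) >= min_score]


# Sample data in the shape of the JSON output example, one object per line.
sample = (
    '{"file": "src/auth.rs", "score": 0.8234, "content": "..."}\n'
    '{"file": "src/db.rs", "score": 0.41, "content": "..."}'
)
hits = strong_hits(load_results(sample))
print([hit["file"] for hit in hits])
```

In a real pipeline the text would be read from security-search-results.json, and the job could exit with a non-zero status when the list of strong hits is not empty.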
Troubleshooting
Common Issues
Index not found
Error: No index found in repository
Solution: Run tp index --repo . first to create an index.
Model download fails
Error: Failed to download model
Solution: Check internet connection or specify a local cache directory with --cache-dir.
Large files skipped
Warning: Skipping large file (>2MB)
Solution: Increase limit with --max-filesize 5mb or exclude large files.
Out of memory
Error: Out of memory during indexing
Solution: Reduce --batch-size or --worker-threads, or exclude large files.
Getting Help
tp --help # General help
tp index --help # Index command help
tp search --help # Search command help
Development
Building from Source
git clone https://github.com/glamp/turboprop-rust
cd turboprop-rust
cargo build --release
Running Tests
cargo test # Run all tests
cargo test --test integration # Run integration tests only
cargo bench # Run benchmarks
Dependencies
- clap: CLI parsing and help generation
- tokio: Async runtime for I/O operations
- serde: JSON serialization
- fastembed: Machine learning embeddings
- git2: Git repository integration
- notify: File system watching
- walkdir: Directory traversal
See Also
For more detailed information:
- Installation Guide - Comprehensive installation instructions for all platforms
- Model Documentation - Complete guide to available embedding models and selection criteria
- Configuration Guide - Advanced configuration options and .turboprop.yml setup
- API Reference - Library API documentation for programmatic usage
- Troubleshooting Guide - Solutions to common issues and performance problems
- Migration Guide - Upgrading from previous versions
Contributing
- Fork the repository
- Create a feature branch
- Add tests for your changes
- Ensure all tests pass: cargo test
- Submit a pull request
License
Licensed under either of:
- MIT License (LICENSE-MIT)
- Apache License, Version 2.0 (LICENSE-APACHE)
at your option.