3 releases

0.1.2 Jul 24, 2025
0.1.1 Jul 24, 2025
0.1.0 Jul 24, 2025

MIT/Apache

TurboProp

TurboProp (tp) is a fast semantic code search and indexing tool written in Rust. It uses machine learning embeddings to enable intelligent code search across your codebase, making it easy to find relevant code snippets based on natural language queries.

Key Features

  • Semantic Search: Find code by meaning, not just keywords
  • Git Integration: Respects .gitignore and only indexes files under source control
  • Watch Mode: Automatically updates the index when files change
  • File Type Filtering: Search within specific file types
  • Multiple Output Formats: JSON for tools, human-readable text for reading
  • Performance Optimized: Handles codebases from 50 to 10,000+ files
  • Easy Configuration: Optional .turboprop.yml configuration file
  • MCP Server Integration: Built-in MCP server for coding agents like Claude Code, Cursor, and Windsurf

MCP Server for Coding Agents

What is MCP? MCP (Model Context Protocol) is a standard way for AI coding agents to access external tools. Think of it as a bridge that lets your AI assistant search through your code in real time.

Before MCP: "Find JWT authentication code" → Agent can only see files you've shared
With MCP: "Find JWT authentication code" → Agent searches your entire codebase semantically

TurboProp's MCP server works like a librarian for your codebase - it catalogs all your code, keeps it up-to-date, and helps agents find relevant code instantly.

Quick Start (< 2 minutes)

  1. Start the MCP server:

    tp mcp --repo .
    
  2. Configure your coding agent (see integration examples below)

  3. Ask your agent: "Find the JWT authentication implementation"

That's it! Your agent can now search your entire codebase semantically.

Agent Integration

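Claude Code - Add to .mcp.json in your project root (Claude Code reads project-scoped MCP servers from this file; ~/.claude.json holds user-level settings):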

{
  "mcpServers": {
    "turboprop": {
      "command": "tp",
      "args": ["mcp", "--repo", "."]
    }
  }
}

Cursor - Add to .cursor/mcp.json in your project:

{
  "mcpServers": {
    "turboprop": {
      "command": "tp", 
      "args": ["mcp", "--repo", "."],
      "cwd": "."
    }
  }
}

Other Agents (GitHub Copilot, Windsurf, etc.) - Use these settings:

  • Command: tp
  • Arguments: ["mcp", "--repo", "."]
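
The agent configurations above all share one shape, so a short script can generate them. This is a minimal sketch: the helper name `build_mcp_config` is hypothetical and not part of TurboProp, and the output file is whichever path your agent expects.

```python
import json


def build_mcp_config(repo=".", command="tp", cwd=None):
    """Build the mcpServers entry shown above (hypothetical helper)."""
    server = {"command": command, "args": ["mcp", "--repo", repo]}
    if cwd is not None:
        # The Cursor example above also sets a working directory.
        server["cwd"] = cwd
    return {"mcpServers": {"turboprop": server}}


if __name__ == "__main__":
    # Print the Cursor variant; redirect the output into the agent's config file.
    print(json.dumps(build_mcp_config(cwd="."), indent=2))
```

Generating the file this way keeps the command and arguments identical across agents when a team uses more than one.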

✓ Verify Setup: Restart your agent and ask: "Search for error handling code"

What You Can Ask Your Agent

Once configured, you can ask natural language questions like:

  • "Find the JWT authentication implementation" - Locates authentication code
  • "Show me error handling patterns" - Finds error handling across the codebase
  • "Where is database connection logic?" - Discovers database-related code
  • "Find all tests for user login" - Locates relevant test files
  • "How does the API rate limiting work?" - Finds rate limiting implementation

Advanced Search Options

Your agent can also use these parameters to refine searches:

  • limit: Maximum results (default: 10)
  • filetype: Filter by extension (.rs, .js, .py)
  • filter: Glob pattern (src/**/*.rs, tests/**)
  • threshold: Similarity threshold (0.0-1.0)

Example: "Find authentication code, limit to 5 results, only in Rust files"
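
For readers curious what such a request might look like on the wire: MCP is built on JSON-RPC 2.0, and tool invocations use the `tools/call` method. The sketch below builds one such message from the parameters listed above. The tool name `search` and the `query` argument key are assumptions made for illustration; the names TurboProp actually registers may differ, so check the server's tool listing.

```python
import json

# Parameter names taken from the list above.
ALLOWED_OPTIONS = {"limit", "filetype", "filter", "threshold"}


def build_search_call(query, request_id=1, **options):
    """Build a JSON-RPC 2.0 `tools/call` message for a search tool.

    The tool name "search" and the "query" key are assumed, not confirmed.
    """
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")
    threshold = options.get("threshold")
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search", "arguments": {"query": query, **options}},
    }


if __name__ == "__main__":
    # "Find authentication code, limit to 5 results, only in Rust files"
    call = build_search_call("authentication code", limit=5, filetype=".rs")
    print(json.dumps(call, indent=2))
```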

Configuration & Advanced Usage

Custom Model & Settings:

tp mcp --repo . --model sentence-transformers/all-MiniLM-L12-v2 --max-filesize 5mb

Project Configuration (.turboprop.yml):

model: "sentence-transformers/all-MiniLM-L6-v2"
max_filesize: "2mb" 
similarity_threshold: 0.3

📖 Complete Guide: MCP User Guide
🔧 Troubleshooting: Common Issues & Solutions
⚡ Performance: Tips for large repositories and team usage

Quick Start

Installation

cargo install turboprop

From Source

git clone https://github.com/glamp/turboprop-rust
cd turboprop-rust
cargo build --release
# Binary will be in target/release/tp

Basic Usage

  1. Index your codebase:

    tp index --repo . --max-filesize 2mb
    
  2. Search for code:

    tp search "jwt authentication" --repo .
    
  3. Filter by file type:

    tp search --filetype .js "jwt authentication" --repo .
    
  4. Get human-readable output:

    tp search "jwt authentication" --repo . --output text
    

Model Support

TurboProp supports multiple embedding models, so you can pick one suited to your use case:

Available Models

Sentence Transformer Models (FastEmbed)

  • sentence-transformers/all-MiniLM-L6-v2 (default)

    • Fast and lightweight, good for general use
    • 384 dimensions, ~23MB
    • Automatic download and caching
  • sentence-transformers/all-MiniLM-L12-v2

    • Better accuracy with slightly more compute
    • 384 dimensions, ~44MB

Specialized Code Models

  • nomic-embed-code.Q5_K_S.gguf
    • Specialized for code search and retrieval
    • 768 dimensions, ~2.5GB
    • Supports multiple programming languages
    • Quantized for efficient inference

Multilingual Models

  • Qwen/Qwen3-Embedding-0.6B
    • State-of-the-art multilingual support (100+ languages)
    • 1024 dimensions, ~600MB
    • Supports instruction-based embeddings
    • Excellent for code and text retrieval
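
The dimension counts above translate directly into per-vector storage. As a rough back-of-the-envelope sketch, assuming each component is stored as a 32-bit float (an assumption; TurboProp's on-disk format is not described here) and ignoring any index overhead:

```python
# Dimensions as listed above; 4 bytes per component assumes float32 storage.
MODEL_DIMENSIONS = {
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "sentence-transformers/all-MiniLM-L12-v2": 384,
    "nomic-embed-code.Q5_K_S.gguf": 768,
    "Qwen/Qwen3-Embedding-0.6B": 1024,
}


def embedding_bytes(model, num_chunks, bytes_per_component=4):
    """Raw size in bytes of num_chunks embedding vectors for the given model."""
    return MODEL_DIMENSIONS[model] * bytes_per_component * num_chunks


if __name__ == "__main__":
    for name in MODEL_DIMENSIONS:
        mib = embedding_bytes(name, 100_000) / (1024 * 1024)
        print(f"{name}: {mib:.1f} MiB for 100,000 chunks")
```

Under these assumptions a 1024-dimension model needs roughly 2.7 times the vector storage of a 384-dimension one for the same number of chunks.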

Model Selection Guide

Choose your model based on your use case:

| Use Case                  | Recommended Model                      | Why                              |
|---------------------------|----------------------------------------|----------------------------------|
| General code search       | sentence-transformers/all-MiniLM-L6-v2 | Fast, reliable, good balance     |
| Specialized code search   | nomic-embed-code.Q5_K_S.gguf           | Optimized for code understanding |
| Multilingual projects     | Qwen/Qwen3-Embedding-0.6B              | Best multilingual support        |
| Low resource environments | sentence-transformers/all-MiniLM-L6-v2 | Smallest memory footprint        |
| Maximum accuracy          | Qwen/Qwen3-Embedding-0.6B              | State-of-the-art performance     |
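
The guide above can be encoded as a lookup table for scripts that set up indexing across several projects. This is a small sketch; the short use-case keys (`general`, `code`, and so on) are hypothetical labels chosen here, while the model names and the `--model` flag come from this document.

```python
# Use-case keys are illustrative labels; model names are from the guide above.
RECOMMENDED_MODEL = {
    "general": "sentence-transformers/all-MiniLM-L6-v2",
    "code": "nomic-embed-code.Q5_K_S.gguf",
    "multilingual": "Qwen/Qwen3-Embedding-0.6B",
    "low-resource": "sentence-transformers/all-MiniLM-L6-v2",
    "max-accuracy": "Qwen/Qwen3-Embedding-0.6B",
}


def index_command(use_case, repo="."):
    """Return the argv for `tp index` with the model recommended for use_case."""
    try:
        model = RECOMMENDED_MODEL[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None
    return ["tp", "index", "--repo", repo, "--model", model]


if __name__ == "__main__":
    print(" ".join(index_command("multilingual", "./my-project")))
```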

Usage Examples

Basic Model Selection

# List available models
tp model list

# Get model information
tp model info "Qwen/Qwen3-Embedding-0.6B"

# Download a model before use
tp model download "nomic-embed-code.Q5_K_S.gguf"

Indexing with Different Models

# Use default model
tp index --repo ./my-project

# Use specialized code model
tp index --repo ./my-project --model "nomic-embed-code.Q5_K_S.gguf"

# Use multilingual model with instruction
tp index --repo ./my-project \
  --model "Qwen/Qwen3-Embedding-0.6B" \
  --instruction "Represent this code for semantic search"

Searching with Model Consistency

# Search using the same model used for indexing
tp search "jwt authentication" --model "nomic-embed-code.Q5_K_S.gguf"

# Use instruction for context-aware search (Qwen3 only)
tp search "error handling" \
  --model "Qwen/Qwen3-Embedding-0.6B" \
  --instruction "Find code related to error handling and exceptions"
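
The reason to keep the model consistent: a similarity score between a query vector and an indexed vector is only meaningful when both come from the same model, and it cannot be computed at all when the dimensions differ (384, 768, and 1024 for the models above). A minimal sketch, assuming cosine similarity as the metric (this document mentions a 0.0 to 1.0 similarity threshold but does not name the metric):

```python
import math


def cosine_similarity(query_vec, indexed_vec):
    """Cosine similarity of two vectors; rejects mismatched dimensions."""
    if len(query_vec) != len(indexed_vec):
        raise ValueError(
            f"dimension mismatch: query has {len(query_vec)}, "
            f"index has {len(indexed_vec)}; was the index built with another model?"
        )
    dot = sum(a * b for a, b in zip(query_vec, indexed_vec))
    norm = math.sqrt(sum(a * a for a in query_vec)) * math.sqrt(
        sum(b * b for b in indexed_vec)
    )
    # A zero vector has no direction, so treat it as unrelated.
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```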

Configuration File Support

Create .turboprop.yml in your project root:

# Default model for all operations
default_model: "sentence-transformers/all-MiniLM-L6-v2"

# Model-specific configurations
models:
  "Qwen/Qwen3-Embedding-0.6B":
    instruction: "Represent this code for semantic search"
    cache_dir: "~/.turboprop/qwen3-cache"
  
  "nomic-embed-code.Q5_K_S.gguf":
    cache_dir: "~/.turboprop/nomic-cache"

# Performance settings
embedding:
  batch_size: 32
  cache_embeddings: true
  
# Resource limits
max_memory_usage: "8GB"
warn_large_models: true

Complete Usage Guide

Indexing Command

The index command creates a searchable index of your codebase:

tp index [OPTIONS] --repo <REPO>

Options:

  • --repo <PATH>: Repository path to index (default: current directory)
  • --max-filesize <SIZE>: Maximum file size to index (e.g., "2mb", "500kb", "1gb")
  • --watch: Monitor file changes and update index automatically
  • --model <MODEL>: Embedding model to use (default: "sentence-transformers/all-MiniLM-L6-v2")
  • --cache-dir <DIR>: Cache directory for models and data
  • --worker-threads <N>: Number of worker threads for processing
  • --batch-size <N>: Batch size for embedding generation (default: 32)
  • --verbose: Enable verbose output
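
Size values such as "2mb", "500kb", and "1gb" are human-readable strings that have to be converted to a byte count somewhere. The sketch below shows one way such a string could be parsed. It assumes binary multiples (1 kb = 1024 bytes) and case-insensitive units; whether TurboProp uses binary or decimal multiples is not stated here.

```python
import re

# Binary multiples are an assumption; the tool may use powers of 1000 instead.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}


def parse_size(text):
    """Convert a string like "2mb" or "500kb" into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmg]?b)\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


if __name__ == "__main__":
    for value in ("2mb", "500kb", "1gb"):
        print(value, "=", parse_size(value), "bytes")
```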

Examples:

# Basic indexing
tp index --repo .

# Index with size limit and watch mode
tp index --repo . --max-filesize 2mb --watch

# Use custom model and cache directory
tp index --repo . --model "sentence-transformers/all-MiniLM-L12-v2" --cache-dir ~/.turboprop-cache

# Index with custom performance settings
tp index --repo . --worker-threads 8 --batch-size 64

Search Command

The search command finds relevant code using semantic similarity:

tp search <QUERY> [OPTIONS]

Options:

  • <QUERY>: Search query (natural language or keywords)
  • --repo <PATH>: Repository path to search in (default: current directory)
  • --limit <N>: Maximum number of results to return (default: 10)
  • --threshold <FLOAT>: Minimum similarity threshold (0.0 to 1.0)
  • --output <FORMAT>: Output format: 'json' (default) or 'text'
  • --filetype <EXT>: Filter results by file extension (e.g., '.rs', '.js', '.py')
  • --filter <PATTERN>: Filter results by glob pattern (e.g., '*.rs', 'src/**/*.js')
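
To make the interaction between --threshold and --limit concrete, here is a small sketch of one plausible order of operations: drop results below the threshold, rank the rest by score, then truncate. The ordering and the inclusive comparison are assumptions for illustration, not a description of TurboProp's internals.

```python
def select_results(scored, threshold=0.0, limit=10):
    """Keep (path, score) pairs at or above threshold, best first, at most limit."""
    kept = [item for item in scored if item[1] >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:limit]


if __name__ == "__main__":
    candidates = [("src/auth.rs", 0.82), ("src/db.rs", 0.41), ("README.md", 0.12)]
    for path, score in select_results(candidates, threshold=0.3, limit=5):
        print(f"{score:.2f}  {path}")
```

With a high threshold such as 0.8, fewer than `limit` results may come back, which is the expected trade-off for a high-precision search.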

Examples:

# Basic search
tp search "user authentication" --repo .

# Search with filters and limits
tp search "database connection" --repo . --filetype .rs --limit 5

# Get human-readable output
tp search "error handling" --repo . --output text

# High-precision search
tp search "jwt token validation" --repo . --threshold 0.8

# Search in specific directory
tp search "api routes" --repo ./backend

# Filter by glob pattern
tp search "authentication" --repo . --filter "src/*.js"

# Recursive glob patterns
tp search "error handling" --repo . --filter "**/*.{rs,py}"

# Combine filters
tp search "database" --repo . --filetype .rs --filter "src/**/*.rs"
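
When calling the search command from a script, building the argument list programmatically avoids shell-quoting mistakes. The following sketch uses only the flags documented above; the function names `build_search_cmd` and `run_search` are hypothetical, and `run_search` assumes the `tp` binary is on your PATH.

```python
import subprocess


def build_search_cmd(query, repo=".", filetype=None, glob=None,
                     limit=None, threshold=None, output=None):
    """Return the argv list for a `tp search` invocation."""
    cmd = ["tp", "search", query, "--repo", repo]
    if filetype is not None:
        cmd += ["--filetype", filetype]
    if glob is not None:
        cmd += ["--filter", glob]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    if output is not None:
        cmd += ["--output", output]
    return cmd


def run_search(query, **kwargs):
    """Run the search and return its stdout; raises if tp exits non-zero."""
    result = subprocess.run(build_search_cmd(query, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(build_search_cmd("database connection", filetype=".rs", limit=5))
```

Passing the arguments as a list means glob patterns such as `src/**/*.rs` reach `tp` unexpanded, with no dependence on shell quoting.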

Glob Pattern Filtering

TurboProp supports powerful glob pattern filtering to search within specific files or directories. Glob patterns use Unix shell-style wildcards to match file paths.

Basic Wildcards

| Wildcard | Description                           | Example                                              |
|----------|---------------------------------------|------------------------------------------------------|
| *        | Match any characters within a directory | *.rs matches all Rust files                        |
| ?        | Match exactly one character           | file?.rs matches file1.rs, fileA.rs                  |
| **       | Match any characters across directories | **/*.js matches JS files anywhere                  |
| [abc]    | Match any character in the set        | file[123].rs matches file1.rs, file2.rs, file3.rs    |
| [!abc]   | Match any character NOT in the set    | file[!0-9].rs matches filea.rs but not file1.rs      |
| {a,b}    | Match any of the alternatives         | *.{js,ts} matches both .js and .ts files             |

Common Pattern Examples

File Type Filtering

# All Rust files anywhere in the codebase
tp search "async function" --filter "*.rs"

# All JavaScript and TypeScript files
tp search "react component" --filter "*.{js,ts,jsx,tsx}"

# All configuration files
tp search "database" --filter "*.{json,yaml,yml,toml,ini}"

Directory-Specific Filtering

# Files only in the src directory
tp search "main function" --filter "src/*.rs"

# Files only in tests directory
tp search "test case" --filter "tests/*.py"

# Files in specific subdirectories
tp search "handler" --filter "src/api/*.js"

Recursive Directory Filtering

# Python files anywhere in the project
tp search "authentication" --filter "**/*.py"

# Test files in any subdirectory
tp search "unit test" --filter "**/test_*.rs"

# Source files in src and all subdirectories
tp search "database connection" --filter "src/**/*.{rs,py,js}"

# Handler files in nested API directories
tp search "request handler" --filter "**/api/**/handlers/*.rs"

Advanced Pattern Examples

# Test files with specific naming patterns
tp search "integration test" --filter "tests/**/*_{test,spec}.{js,ts}"

# Source files excluding certain directories
tp search "function definition" --filter "src/**/*.rs" --filter "!**/target/**"

# Files in multiple specific directories
tp search "configuration" --filter "{src,config,scripts}/**/*.{json,yaml}"

# Files with numeric suffixes
tp search "version" --filter "**/*[0-9].{js,py,rs}"

Pattern Behavior

Path Matching: Patterns match against the entire file path, not just the filename:

  • *.rs matches main.rs, src/main.rs, and lib/nested/file.rs
  • src/*.rs matches src/main.rs but not src/nested/file.rs
  • src/**/*.rs matches both src/main.rs and src/nested/file.rs
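
The matching rules above can be reproduced with a small glob-to-regex translator, which is handy for checking a pattern before running a search. This is an illustrative sketch, not TurboProp's matcher. It assumes that `*` and `?` never cross a `/`, and that a pattern containing no `/` is compared against the file name only, a rule chosen here because it reproduces the three examples above.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob into a regex in which * and ? never cross a '/'."""
    out, depth, i = [], 0, 0
    while i < len(pattern):
        c = pattern[i]
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole directories
            i += 3
            continue
        if pattern.startswith("**", i):
            out.append(".*")  # anything, including '/'
            i += 2
            continue
        if c == "*":
            out.append("[^/]*")
        elif c == "?":
            out.append("[^/]")
        elif c == "[" and pattern.find("]", i + 2) != -1:
            end = pattern.find("]", i + 2)
            body = pattern[i + 1:end]
            if body.startswith("!"):
                body = "^/" + body[1:]  # negated set, still never matching '/'
            out.append("[" + body + "]")
            i = end
        elif c == "{":
            out.append("(?:")
            depth += 1
        elif c == "}" and depth:
            out.append(")")
            depth -= 1
        elif c == "," and depth:
            out.append("|")
        else:
            out.append(re.escape(c))
        i += 1
    if depth:
        raise ValueError(f"unclosed brace in pattern: {pattern!r}")
    return re.compile("".join(out))


def glob_match(pattern, path):
    """Case-sensitive match; slash-free patterns are tested on the file name."""
    target = path if "/" in pattern else path.rsplit("/", 1)[-1]
    return glob_to_regex(pattern).fullmatch(target) is not None


if __name__ == "__main__":
    for pat in ("*.rs", "src/*.rs", "src/**/*.rs"):
        print(pat, [p for p in ("main.rs", "src/main.rs", "src/nested/file.rs")
                    if glob_match(pat, p)])
```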

Case Sensitivity: Patterns are case-sensitive by default:

  • *.RS matches FILE.RS but not file.rs
  • *.rs matches file.rs but not FILE.RS

Path Separators: Always use forward slashes (/) in patterns:

  • src/api/*.js (correct)
  • src\api\*.js (incorrect)

Combining with File Type Filter: You can use both --filter and --filetype together:

# Search for Rust files in src directory only
tp search "async" --filetype .rs --filter "src/**/*"

Performance Tips

  • Simple patterns are faster: *.rs is faster than **/*.rs
  • Be specific when possible: src/*.js is faster than **/*.js if you know files are in src/
  • Avoid excessive wildcards: Patterns with many ** can be slower on large codebases
  • Use file type filter for extensions: --filetype .rs is optimized compared to --filter "*.rs"

Troubleshooting Glob Patterns

Pattern doesn't match expected files:

  • Check case sensitivity: *.RS vs *.rs
  • Verify path structure: src/*.js only matches direct children of src/
  • Use ** for recursive matching: src/**/*.js matches nested files

Pattern matching too many files:

  • Be more specific: use src/*.js instead of *.js
  • Add more path components: src/components/*.jsx
  • Use character classes: test_[0-9]*.rs instead of test_*.rs

Complex patterns not working:

  • Test simpler patterns first: start with *.ext then add complexity
  • Check for typos in braces: {js,ts} not {js, ts} (no spaces)
  • Validate bracket expressions: [a-z] not [a-Z]

For more pattern examples and troubleshooting, see the TROUBLESHOOTING.md file.

Configuration

TurboProp supports optional configuration via a .turboprop.yml file in your repository root:

# .turboprop.yml
max_filesize: "2mb"
model: "sentence-transformers/all-MiniLM-L6-v2"
cache_dir: "~/.turboprop-cache"
worker_threads: 4
batch_size: 32
default_output: "json"
similarity_threshold: 0.3

Output Formats

JSON Output (Default)

{
  "file": "src/auth.rs",
  "score": 0.8234,
  "content": "fn authenticate_user(token: &str) -> Result<User, AuthError> { ... }"
}
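
Downstream tools can consume this output with any JSON parser. The sketch below reads results of the shape shown above and lists the matching files, best score first. It assumes the command emits either a JSON array of such objects, a single object, or one object per line; the exact framing is not specified here, so the parser accepts all three.

```python
import json


def parse_results(text):
    """Parse search output into a list of result dicts (framing is assumed)."""
    text = text.strip()
    if not text:
        return []
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to one JSON object per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def top_files(results, min_score=0.5):
    """Return file paths with score >= min_score, highest score first."""
    hits = [r for r in results if r["score"] >= min_score]
    return [r["file"] for r in sorted(hits, key=lambda r: r["score"], reverse=True)]


if __name__ == "__main__":
    sample = '[{"file": "src/auth.rs", "score": 0.8234, "content": "fn authenticate_user..."}]'
    print(top_files(parse_results(sample)))
```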

Text Output

Score: 0.82 | src/auth.rs
fn authenticate_user(token: &str) -> Result<User, AuthError> {
    // JWT token validation logic
    ...
}

Performance Characteristics

  • Indexing Speed: ~100-500 files/second (depending on file size and hardware)
  • Search Speed: ~10-50ms per query (after initial model loading)
  • Memory Usage: ~50-200MB (varies with model and index size)
  • Storage: Index size is typically 10-30% of source code size
  • File Count: Up to 10,000 files (tested)
  • File Size: Up to 2MB per file (configurable)
  • Total Codebase: Up to 500MB of source code
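
The ranges above allow a quick estimate of indexing time and index size for a given repository. The sketch below only applies the quoted figures (100 to 500 files per second, index size 10 to 30% of source size); real numbers depend on hardware, file sizes, and the chosen model.

```python
def estimate_indexing(num_files, source_mb):
    """Return ((min_seconds, max_seconds), (min_index_mb, max_index_mb))."""
    # 500 files/s is the fast end of the quoted range, 100 files/s the slow end.
    seconds = (num_files / 500, num_files / 100)
    # Index size is quoted as 10-30% of source size.
    index_mb = (source_mb * 10 / 100, source_mb * 30 / 100)
    return seconds, index_mb


if __name__ == "__main__":
    (fast, slow), (small, large) = estimate_indexing(10_000, 500)
    print(f"indexing time: {fast:.0f}-{slow:.0f} s")
    print(f"index size:    {small:.0f}-{large:.0f} MB")
```

For the largest tested configuration (10,000 files, 500MB of source), this works out to roughly 20 to 100 seconds of indexing and a 50 to 150MB index.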

Supported File Types

TurboProp works with any text-based file but is optimized for common programming languages:

  • Web: .js, .ts, .jsx, .tsx, .html, .css, .scss, .vue
  • Backend: .py, .rs, .go, .java, .kt, .scala, .rb, .php
  • Systems: .c, .cpp, .h, .hpp, .cs, .swift
  • Data: .sql, .json, .yaml, .yml, .xml, .toml
  • Docs: .md, .txt, .rst
  • Config: .env, .ini, .conf, .cfg

Integration Examples

With Git Hooks

Add to .git/hooks/post-commit (the file must be executable, e.g. chmod +x .git/hooks/post-commit, or Git will not run it):

#!/bin/bash
tp index --repo . --max-filesize 2mb

With IDEs

Many IDEs can be configured to run external tools. Add TurboProp as a custom search tool.

With CI/CD

# In your CI script
tp index --repo . --max-filesize 2mb
tp search "security vulnerability" --repo . --output json > security-search-results.json

Troubleshooting

Common Issues

Index not found

Error: No index found in repository

Solution: Run tp index --repo . first to create an index.

Model download fails

Error: Failed to download model

Solution: Check internet connection or specify a local cache directory with --cache-dir.

Large files skipped

Warning: Skipping large file (>2MB)

Solution: Increase limit with --max-filesize 5mb or exclude large files.

Out of memory

Error: Out of memory during indexing

Solution: Reduce --batch-size or --worker-threads, or exclude large files.

Getting Help

tp --help              # General help
tp index --help        # Index command help
tp search --help       # Search command help

Development

Building from Source

git clone https://github.com/glamp/turboprop-rust
cd turboprop-rust
cargo build --release

Running Tests

cargo test                    # Run all tests
cargo test --test integration # Run integration tests only
cargo bench                   # Run benchmarks

Dependencies

  • clap: CLI parsing and help generation
  • tokio: Async runtime for I/O operations
  • serde: JSON serialization
  • fastembed: Machine learning embeddings
  • git2: Git repository integration
  • notify: File system watching
  • walkdir: Directory traversal

See Also

For more detailed information:

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for your changes
  4. Ensure all tests pass: cargo test
  5. Submit a pull request

License

Licensed under either of:

  • Apache License
  • MIT license

at your option.
