#tree-sitter #chunks #splitter #contextual #class #break #metadata

devgen-splitter

Devgen Splitter is a Rust library that breaks down source code into contextual chunks

9 releases

0.4.7 Nov 18, 2024
0.4.6 Nov 17, 2024
0.4.5 Oct 26, 2024
0.3.2 Oct 15, 2024
0.2.1 Oct 14, 2024

#909 in Rust patterns

MIT license

125KB
1.5K SLoC

Rust 1K SLoC // 0.0% comments
Scheme 212 SLoC // 0.0% comments

Devgen Splitter is a Rust library that breaks down source code into contextual chunks. It utilizes tree-sitter to identify code entities (such as classes, functions, and methods) and generate chunks with contextual metadata.


Features

  • Language-aware code splitting
  • Generate chunks with contextual metadata
  • Support for multiple programming languages

Why devgen-splitter?

If you are building a code search agent, you may want the LLM to generate links to related classes, structs, enums, and so on. Devgen Splitter helps by generating chunks that carry this contextual metadata.

Usage

Add devgen-splitter to your project:

cargo add devgen-splitter

Basic usage example:

use devgen_splitter::{split, SplitOptions};

fn main() {
    let code = "fn main() { println!(\"Hello, world!\"); }";
    // Cap each chunk at 10 lines.
    let options = SplitOptions { chunk_line_limit: 10 };
    // Split the source; the file name is passed along with the code.
    let chunks = split("example.rs", code, &options).unwrap();
    for chunk in chunks {
        println!("Chunk: {:?}", chunk);
    }
}

For more examples, see the examples directory.

Supported Languages

Language    Query Rules  Splitter  Test
Rust
TypeScript
Java
Python
Solidity
Go          🚧           🚧        🚧
C++         🚧           🚧        🚧
C           🚧           🚧        🚧

More languages coming soon!

Language Mapping

The following table shows how different code structures are represented across various programming languages and their corresponding tree-sitter query rule names:

Type       Tree-sitter Query      Rust      Java       TypeScript               Python    Go        C++
Function   function.definition    function  N/A        function/arrow function  function  function  function
Method     method.definition      method    method     method                   method    method    method
Struct     struct.declaration     struct    class      interface                class     struct    struct
Class      class.declaration      impl      class      class                    class     N/A       class
Interface  interface.declaration  trait     interface  N/A                      N/A       N/A       N/A
Enum       enum.declaration       enum      enum       enum                     N/A       N/A       enum
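The mapping above can also be read as a lookup from an entity type and a language to the construct that represents it. The following sketch encodes the table as plain Rust. The names `EntityType`, `Lang`, `query_name`, and `construct_in` are hypothetical and are not taken from the crate; only the query rule names and the cell values come from the table, with the TypeScript function cell read as "arrow function".

```rust
/// Entity types from the mapping table (hypothetical enum, not the crate's API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntityType { Function, Method, Struct, Class, Interface, Enum }

/// Languages that appear as columns in the mapping table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lang { Rust, Java, TypeScript, Python, Go, Cpp }

impl EntityType {
    /// Tree-sitter query rule name listed for this entity type.
    pub fn query_name(self) -> &'static str {
        match self {
            Self::Function => "function.definition",
            Self::Method => "method.definition",
            Self::Struct => "struct.declaration",
            Self::Class => "class.declaration",
            Self::Interface => "interface.declaration",
            Self::Enum => "enum.declaration",
        }
    }

    /// Language construct the table lists for this entity type,
    /// or `None` where the table says N/A.
    pub fn construct_in(self, lang: Lang) -> Option<&'static str> {
        match (self, lang) {
            (Self::Function, Lang::Java) => None,
            // The table's TypeScript cell, read here as "arrow function".
            (Self::Function, Lang::TypeScript) => Some("function/arrow function"),
            (Self::Function, _) => Some("function"),
            (Self::Method, _) => Some("method"),
            (Self::Struct, Lang::Rust | Lang::Go | Lang::Cpp) => Some("struct"),
            (Self::Struct, Lang::Java | Lang::Python) => Some("class"),
            (Self::Struct, Lang::TypeScript) => Some("interface"),
            (Self::Class, Lang::Rust) => Some("impl"),
            (Self::Class, Lang::Go) => None,
            (Self::Class, _) => Some("class"),
            (Self::Interface, Lang::Rust) => Some("trait"),
            (Self::Interface, Lang::Java) => Some("interface"),
            (Self::Interface, _) => None,
            (Self::Enum, Lang::Python | Lang::Go) => None,
            (Self::Enum, _) => Some("enum"),
        }
    }
}
```

For example, `EntityType::Class.construct_in(Lang::Rust)` returns `Some("impl")`, reflecting that a Rust `impl` block is grouped under the class rule, while `EntityType::Function.construct_in(Lang::Java)` returns `None` because Java has no free functions.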

Development Status

Devgen Splitter is in active development. We welcome community contributions and feedback.

Dependencies

~192MB
~5.5M SLoC