Swiftide
Blazing fast asynchronous, parallel file ingestion and processing for RAG.
Explore the docs » · View Demo · Report Bug · Request Feature
About The Project
Swiftide is a straightforward, easy-to-use, easy-to-extend asynchronous file ingestion and processing system, designed for use in RAG (Retrieval Augmented Generation) systems. It is built to be fast and efficient, with a focus on parallel processing and asynchronous operations.
Frustrations with the performance, stability, and ease of use of Python-based tooling are what led to Swiftide; ingestion performance went from multiple tens of minutes to a few seconds.
Part of the bosun.ai project, an upcoming platform for autonomous code improvement.
We <3 feedback: project ideas, suggestions, and complaints are very welcome. Feel free to open an issue.
Example
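The pipeline below ingests all Rust files in the current directory, skips nodes already cached in Redis, generates question-and-answer metadata for each node with OpenAI, chunks the code, embeds the chunks in batches, and stores the results in Qdrant: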
// Load all Rust files from the current directory.
IngestionPipeline::from_loader(FileLoader::new(".").with_extensions(&["rs"]))
    // Skip nodes that were already processed in a previous run.
    .filter_cached(RedisNodeCache::try_from_url(
        redis_url,
        "swiftide-examples",
    )?)
    // Generate question-and-answer metadata for each node.
    .then(MetadataQACode::new(openai_client.clone()))
    // Split the code into chunks of 10 to 2048 characters.
    .then_chunk(ChunkCode::try_for_language_and_chunk_size(
        "rust",
        10..2048,
    )?)
    // Embed up to 10 nodes at a time with OpenAI.
    .then_in_batch(10, OpenAIEmbed::new(openai_client.clone()))
    // Store the embedded nodes in Qdrant.
    .then_store_with(
        Qdrant::try_from_url(qdrant_url)?
            .batch_size(50)
            .vector_size(1536)
            .collection_name("swiftide-examples".to_string())
            .build()?,
    )
    .run()
    .await?;
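The snippet runs inside an async context. A minimal wrapper might look like the sketch below, assuming the tokio runtime and anyhow for error handling (neither is mandated by the snippet itself):

use anyhow::Result;

#[tokio::main]
async fn main() -> Result<()> {
    // Configure the OpenAI client, redis_url, and qdrant_url here,
    // then build and run the pipeline shown above.
    Ok(())
}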
Features
- Extremely fast streaming pipeline with parallel processing
- Integrations with OpenAI, Redis, Qdrant, and Treesitter
- Bring your own transformers by extending straightforward traits
- Store into multiple backends
- tracing supported
Vision
Our goal is to create an extremely fast, extendable platform for ingestion and querying to further the development of automated LLM applications, with an easy-to-use and easy-to-extend API.
Getting Started
Prerequisites
Make sure you have the Rust toolchain installed; rustup is the recommended approach.
To use OpenAI, an API key is required. Note that async_openai uses the OPENAI_API_KEY environment variable by default.
Other integrations will need to be set up accordingly.
Installation
- Set up a new Rust project
- Add swiftide: cargo add swiftide
- Write a pipeline (see our examples and documentation)
Usage and concepts
Before building your stream, you need to configure any required integrations. See /examples for a full example.
A stream starts with a Loader that emits IngestionNodes. For instance, with the FileLoader each file is a Node.
You can then slice and dice, augment, and filter nodes. Each kind of step in the pipeline requires a different trait, which is what makes the pipeline easy to extend; see the sketch after the list below.
IngestionNodes have a path, chunk, and metadata. Currently, metadata is copied over when chunking and is always embedded when using the OpenAIEmbed transformer.
- from_loader (impl Loader): starting point of the stream; creates and emits IngestionNodes
- filter_cached (impl NodeCache): filters cached nodes
- then (impl Transformer): transforms the node and puts it on the stream
- then_in_batch (impl BatchTransformer): transforms multiple nodes and puts them on the stream
- then_chunk (impl ChunkerTransformer): transforms a single node and emits multiple nodes
- then_store_with (impl Storage): stores the nodes in a storage backend; this can be chained
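For illustration, a custom step might look like the sketch below. This is a minimal sketch, not the definitive API: it assumes the Transformer trait exposes a single async transform_node method (via async_trait) and that IngestionNode's chunk and metadata fields are public; the module paths are assumptions too, so check the documentation for the exact signatures.

use anyhow::Result;
use async_trait::async_trait;
// Assumed module paths; see the swiftide docs for the actual layout.
use swiftide::ingestion::IngestionNode;
use swiftide::Transformer;

// A toy transformer that records each chunk's line count in its metadata.
struct LineCount;

#[async_trait]
impl Transformer for LineCount {
    async fn transform_node(&self, mut node: IngestionNode) -> Result<IngestionNode> {
        let lines = node.chunk.lines().count();
        node.metadata.insert("line_count".into(), lines.to_string());
        Ok(node)
    }
}

Plugged into a pipeline with .then(LineCount), every node that flows past gets the extra metadata entry.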
Additionally, several generic transformers are implemented. They take implementers of SimplePrompt and Embed to do their work.
All integrations are enabled by default but can be disabled with feature flags.
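For example, to pull in only the integrations you need (the feature names below are an assumption; check the crate's Cargo.toml for the real ones):

cargo add swiftide --no-default-features --features openai,qdrant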
Note: because the pipeline is so fast, chunking before adding metadata can trigger OpenAI rate limit errors very quickly, especially with faster models like gpt-3.5-turbo. Be aware.
For more examples, please refer to /examples and the Documentation.
Roadmap
- Python / Nodejs bindings
- Multiple storage and sparse vector support
- Query pipeline
See the open issues for a full list of proposed features (and known issues).
Contributing
Swiftide is in a very early stage and we are aware that we lack features for the wider community. Contributions are very welcome. 🎉
If you have a great idea, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'feat: Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE for more information.