# token_trekker_rs

`token_trekker_rs` is a command-line tool for counting the total number of tokens across all files in a directory, or in files matching a glob pattern, using one of several tokenizers.
## Features
- Supports multiple tokenizer options
- Parallel processing for faster token counting
- Outputs results in a colorized table
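The parallel counting in the feature list boils down to fanning file contents out to workers and summing the per-file counts. A minimal, std-only sketch of that shape is below; the whitespace-split "tokenizer" is a stand-in for the real BPE tokenizers, and the hard-coded file contents replace actual directory/glob reads, so none of this is the crate's actual implementation:

```rust
use std::thread;

// Stand-in "tokenizer": splits on whitespace. token_trekker_rs uses real
// BPE tokenizers (p50k-base, cl100k-base, ...); this only illustrates the
// fan-out/fan-in shape of parallel per-file counting.
fn count_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    // In the real tool these would be file contents read from a directory
    // or glob match; hard-coded strings keep the sketch runnable.
    let files = vec![
        ("a.txt".to_string(), "hello world".to_string()),
        ("b.txt".to_string(), "one two three".to_string()),
    ];

    // One thread per file; a production tool would use a thread pool
    // (e.g. rayon) rather than spawning unboundedly.
    let handles: Vec<_> = files
        .into_iter()
        .map(|(name, text)| thread::spawn(move || (name, count_tokens(&text))))
        .collect();

    let mut total = 0;
    for handle in handles {
        let (name, count) = handle.join().unwrap();
        println!("{name}: {count} tokens");
        total += count;
    }
    println!("total: {total} tokens");
}
```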
## Installation
To install `token_trekker_rs` from crates.io, run:

```sh
cargo install token_trekker_rs
```
## Building from Source
To build `token_trekker_rs` from source, first clone the repository:

```sh
git clone https://github.com/1rgs/token_trekker_rs.git
cd token_trekker_rs
```

Then build the project with cargo:

```sh
cargo build --release
```
The compiled binary will be available at `./target/release/token-trekker`.
## Usage
To count tokens in a directory or in files matching a glob pattern, run:

```sh
token-trekker --path <path_or_glob_pattern> <tokenizer>
```
Replace `<path_or_glob_pattern>` with the path to the directory or the glob pattern of the files to process, and `<tokenizer>` with one of the available tokenizer options:
- p50k-base
- p50k-edit
- r50k-base
- cl100k-base
- gpt2
For example:

```sh
token-trekker --path "path/to/files/*.txt" p50k-base
```
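Internally, a CLI like this has to map the tokenizer argument string to a concrete tokenizer. A minimal sketch of that mapping is below; the `Tokenizer` enum and `parse_tokenizer` function are hypothetical names for illustration, not the crate's actual types:

```rust
// Hypothetical enum covering the five options listed above.
#[derive(Debug, PartialEq)]
enum Tokenizer {
    P50kBase,
    P50kEdit,
    R50kBase,
    Cl100kBase,
    Gpt2,
}

// Map the CLI argument to a tokenizer variant; None for unknown names.
fn parse_tokenizer(s: &str) -> Option<Tokenizer> {
    match s {
        "p50k-base" => Some(Tokenizer::P50kBase),
        "p50k-edit" => Some(Tokenizer::P50kEdit),
        "r50k-base" => Some(Tokenizer::R50kBase),
        "cl100k-base" => Some(Tokenizer::Cl100kBase),
        "gpt2" => Some(Tokenizer::Gpt2),
        _ => None,
    }
}

fn main() {
    // Unknown names are rejected rather than silently defaulted.
    println!("{:?}", parse_tokenizer("p50k-base")); // Some(P50kBase)
    println!("{:?}", parse_tokenizer("unknown"));   // None
}
```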