3 releases
Version | Released |
---|---|
0.1.2 | Jun 2, 2024 |
0.1.1 | Jun 2, 2024 |
0.1.0 | May 27, 2024 |
tiny-data
A Rust-based CLI tool for building computer vision datasets, built with reqwest and tokio.
You can get a list of the available options by running the command below:
>> tiny-data -h
Usage: tiny-data [OPTIONS]

Options:
  -t, --topics <TOPICS>...   Space-delimited list of image classes
  -n, --nsamples <NSAMPLES>  number of images to download per-class [default: 20]
  -d, --dir <DIR>            name of directory to save to [default: images]
  -h, --help                 Print help
Example:
>> tiny-data --topics bats wombats -n 10 --dir images
>> tree images
images
├── bats
│   ├── 0.jpeg
│   ├── 1.jpeg
│   ├── 2.jpeg
│   ├── 3.jpeg
│   ├── 4.jpeg
│   ├── 5.jpeg
│   ├── 6.jpeg
│   ├── 7.jpeg
│   ├── 8.jpeg
│   └── 9.jpeg
└── wombats
    ├── 0.jpeg
    ├── 1.jpeg
    ├── 2.jpeg
    ├── 3.jpeg
    ├── 4.jpeg
    ├── 5.jpeg
    ├── 6.jpeg
    ├── 7.jpeg
    ├── 8.jpeg
    └── 9.jpeg
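Some downloads may fail, so it can be useful to check how many files actually landed in each class directory. The following is a minimal sketch, not part of tiny-data: `count_images` is a hypothetical helper that assumes the one-subdirectory-per-class layout shown above.

```shell
# count_images DIR: hypothetical helper, not part of tiny-data.
# Prints "<class> <file count>" for each class subdirectory of DIR.
count_images() {
  for class_dir in "$1"/*/; do
    # Skip the literal pattern if the glob matched nothing
    [ -d "$class_dir" ] || continue
    # Count regular files; tr strips the padding some wc versions add
    n=$(find "$class_dir" -type f | wc -l | tr -d ' ')
    printf '%s %s\n' "$(basename "$class_dir")" "$n"
  done
}
```

Running `count_images images` on the tree above would print one line per class, such as `bats 10` and `wombats 10`.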
Installation
To get started with tiny-data, you need to enable the Custom Search API from Google and export the variables SEARCH_ENGINE_ID and CUSTOM_SEARCH_API_KEY to your environment.
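As a rough sketch, exporting the two variables and checking them before a run might look like the following. The values are placeholders to replace with your own credentials, and `check_tiny_data_env` is a hypothetical helper, not something tiny-data provides.

```shell
# Placeholder values (hypothetical): replace with your own credentials.
export SEARCH_ENGINE_ID="your-search-engine-id"
export CUSTOM_SEARCH_API_KEY="your-api-key"

# check_tiny_data_env: hypothetical helper, not part of tiny-data.
# Returns 0 if both variables are exported and non-empty, 1 otherwise.
check_tiny_data_env() {
  missing=0
  for name in SEARCH_ENGINE_ID CUSTOM_SEARCH_API_KEY; do
    # printenv only sees exported variables, which is what a child process inherits
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

The check can then guard a run, for example `check_tiny_data_env && tiny-data --topics bats wombats -n 10 --dir images`.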
Note: Google limits the number of requests to 100 per day, which caps the number of images you can download in a day.
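To get a feel for what the daily limit means in practice, here is a back-of-the-envelope sketch. It rests on assumptions the README does not confirm: each request is taken to yield up to 10 image results (the per-request maximum of the Custom Search JSON API), every result is assumed to download successfully, and requests are assumed to be issued per class. Both function names are hypothetical.

```shell
# max_images_per_day QUOTA PER_REQUEST: rough upper bound on images per day.
# PER_REQUEST is an assumption about how many results each request returns.
max_images_per_day() {
  echo $(( $1 * $2 ))
}

# requests_needed CLASSES NSAMPLES PER_REQUEST: estimated requests for a run,
# assuming requests are issued per class and rounded up.
requests_needed() {
  echo $(( $1 * ( ($2 + $3 - 1) / $3 ) ))
}
```

Under these assumptions, `max_images_per_day 100 10` prints `1000`, and a two-class run with `-n 10` (`requests_needed 2 10 10`) would use 2 of the 100 daily requests. Treat these as estimates only, since the real request pattern depends on how tiny-data pages through results.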
The package itself can be downloaded from crates.io by running:
cargo install tiny-data
The Python bindings can be installed from PyPI, with additional features for post-download filtering using CLIP, by running:
pip install tinydata[ml]