3 releases

0.1.2 Jun 2, 2024
0.1.1 Jun 2, 2024
0.1.0 May 27, 2024

MIT license

250KB
288 lines

tiny-data

A Rust-based CLI tool for building computer vision datasets, built with reqwest and tokio.

You can get a list of the available options by running the command below:

>> tiny-data -h
Usage: tiny-data [OPTIONS]

Options:
  -t, --topics <TOPICS>...   Space-delimited list of image classes
  -n, --nsamples <NSAMPLES>  number of images to download per-class [default: 20]
  -d, --dir <DIR>            name of directory to save to [default: images]
  -h, --help                 Print help

Example:

>> tiny-data --topics bats wombats -n 10 --dir images
>> tree images
images
├── bats
│   ├── 0.jpeg
│   ├── 1.jpeg
│   ├── 2.jpeg
│   ├── 3.jpeg
│   ├── 4.jpeg
│   ├── 5.jpeg
│   ├── 6.jpeg
│   ├── 7.jpeg
│   ├── 8.jpeg
│   └── 9.jpeg
└── wombats
    ├── 0.jpeg
    ├── 1.jpeg
    ├── 2.jpeg
    ├── 3.jpeg
    ├── 4.jpeg
    ├── 5.jpeg
    ├── 6.jpeg
    ├── 7.jpeg
    ├── 8.jpeg
    └── 9.jpeg
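The output above appears to follow the layout <dir>/<topic>/<index>.jpeg. As a minimal sketch, assuming that layout and treating each sub-directory name as the class label, the Rust function below gathers (path, label) pairs that could be fed into a training pipeline. collect_samples is a hypothetical helper for illustration, not part of tiny-data.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: collect (image path, class label) pairs from a
/// directory laid out like the example output, i.e. one sub-directory per
/// topic holding the downloaded images. Assumes the sub-directory name is
/// the class label.
fn collect_samples(root: &Path) -> io::Result<Vec<(PathBuf, String)>> {
    let mut samples = Vec::new();
    for class_entry in fs::read_dir(root)? {
        let class_entry = class_entry?;
        // Only directories count as classes; stray top-level files are skipped.
        if !class_entry.file_type()?.is_dir() {
            continue;
        }
        let label = class_entry.file_name().to_string_lossy().into_owned();
        for image_entry in fs::read_dir(class_entry.path())? {
            let image_entry = image_entry?;
            if image_entry.file_type()?.is_file() {
                samples.push((image_entry.path(), label.clone()));
            }
        }
    }
    // read_dir order is platform-dependent, so sort for reproducibility.
    // The sort is lexicographic: "10.jpeg" would come before "2.jpeg".
    samples.sort();
    Ok(samples)
}
```

Run against the images directory from the example, collect_samples(Path::new("images")) would return twenty pairs: ten labelled bats and ten labelled wombats.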

Installation

To get started with tiny-data, you need to enable Google's Custom Search API and export the SEARCH_ENGINE_ID and CUSTOM_SEARCH_API_KEY environment variables.
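Because the tool needs both variables, a pre-flight check can catch a misconfigured shell before a run. This is a minimal sketch, not part of tiny-data: the hypothetical missing_vars helper reports which of the two variables are unset or empty, and the lookup closure is injected so the check can be tested without modifying the real environment.

```rust
/// The two variables the installation notes say must be exported.
const REQUIRED_VARS: [&str; 2] = ["SEARCH_ENGINE_ID", "CUSTOM_SEARCH_API_KEY"];

/// Hypothetical helper: return the required variables that are unset or
/// contain only whitespace. `lookup` maps a variable name to its value;
/// in a real program pass `|k| std::env::var(k).ok()`.
fn missing_vars<F>(lookup: F) -> Vec<&'static str>
where
    F: Fn(&str) -> Option<String>,
{
    REQUIRED_VARS
        .iter()
        .copied()
        // Treat "not set" and "set but blank" the same way.
        .filter(|&name| lookup(name).map_or(true, |value| value.trim().is_empty()))
        .collect()
}
```

In practice, call missing_vars(|k| std::env::var(k).ok()) and exit with a message listing the result if it is non-empty.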

Note: Google limits Custom Search API usage to 100 requests per day, which caps the number of images you can download.
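To estimate how far the daily quota stretches, the sketch below assumes each request returns at most 10 results (the Custom Search JSON API's per-request maximum) and that results are paged separately for each topic. tiny-data's actual paging strategy may differ, and failed downloads could require extra requests. requests_needed and fits_daily_quota are hypothetical helpers for illustration.

```rust
/// Assumption: at most 10 results per Custom Search request.
const RESULTS_PER_REQUEST: u32 = 10;
/// Daily request limit mentioned in the note above.
const DAILY_REQUEST_QUOTA: u32 = 100;

/// Minimum number of requests to fetch `nsamples` results for each of
/// `ntopics` classes: ceiling division per class, times the class count.
fn requests_needed(ntopics: u32, nsamples: u32) -> u32 {
    ntopics * nsamples.div_ceil(RESULTS_PER_REQUEST)
}

/// Whether a run's minimum request count fits inside one day's quota.
fn fits_daily_quota(ntopics: u32, nsamples: u32) -> bool {
    requests_needed(ntopics, nsamples) <= DAILY_REQUEST_QUOTA
}
```

Under these assumptions, the example run above (two topics, -n 10) needs 2 requests, the default of 20 images per class costs 2 requests per topic, and 100 requests cap a single day at roughly 1,000 image results.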

The CLI itself can be installed from crates.io by running:

cargo install tiny-data

Python bindings for the package are available on PyPI. Installing them with the ml extra adds post-download filtering using CLIP:

pip install "tinydata[ml]"

Dependencies

~12–24MB
~353K SLoC