#recursion #download #limit #web-crawler #directory #open #data

app od-get

A Rust tool for recursively crawling & downloading data from open directories

7 unstable releases (3 breaking)

0.3.1 Feb 5, 2021
0.3.0 Feb 5, 2021
0.2.0 Feb 4, 2021
0.1.0 Feb 3, 2021
0.0.2 Jan 6, 2021

AGPL-3.0-or-later

36KB
660 lines

od-get

  • Filtering (regex) support
    • Exclude file patterns
    • Include file patterns
    • Exclude folder patterns
    • Include folder patterns
  • Customizable output
    • Target directory
    • Verbosity
    • Metadata-JSON file generation
    • Log file/dynamic terminal output
  • Customizable limits
    • Recursion depth limit
    • File count limit
    • File count offset (skip n files)
  • Multi-threaded (using rayon)
  • Resume on error (avoid re-downloading files)
  • Multi-level recursion
  • Disable download (only crawl to JSON)

(Work in progress; currently one layer of recursion works.)
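
To make the "Customizable limits" items concrete, here is a minimal sketch of how a recursion depth limit, a file count offset, and a file count limit could interact. It is an illustration only and is not taken from od-get's source: the `Entry` and `Limits` types, the `walk` and `select_files` functions, and the in-memory tree standing in for a remote directory listing are all hypothetical. It is also single-threaded and uses only the standard library.

```rust
/// Hypothetical in-memory stand-in for a crawled open directory listing.
enum Entry {
    File(String),
    Dir(String, Vec<Entry>),
}

/// Hypothetical limits mirroring the "Customizable limits" feature list.
struct Limits {
    /// Recursion depth limit: 0 means only the root listing is read.
    max_depth: usize,
    /// File count offset: skip the first `skip` files found.
    skip: usize,
    /// File count limit: keep at most `max_files` files after skipping.
    max_files: usize,
}

/// Collects file paths depth-first, descending into a folder only while
/// the current depth is below `max_depth`.
fn walk(entries: &[Entry], prefix: &str, depth: usize, max_depth: usize, out: &mut Vec<String>) {
    for entry in entries {
        match entry {
            Entry::File(name) => out.push(format!("{}/{}", prefix, name)),
            Entry::Dir(name, children) => {
                if depth < max_depth {
                    let child_prefix = format!("{}/{}", prefix, name);
                    walk(children, &child_prefix, depth + 1, max_depth, out);
                }
            }
        }
    }
}

/// Applies the depth limit while crawling, then the offset and count limit.
fn select_files(root: &[Entry], limits: &Limits) -> Vec<String> {
    let mut all = Vec::new();
    walk(root, "", 0, limits.max_depth, &mut all);
    all.into_iter()
        .skip(limits.skip)
        .take(limits.max_files)
        .collect()
}

fn main() {
    let tree = vec![
        Entry::File("a.txt".to_string()),
        Entry::Dir(
            "docs".to_string(),
            vec![
                Entry::File("b.txt".to_string()),
                Entry::Dir("deep".to_string(), vec![Entry::File("c.txt".to_string())]),
            ],
        ),
        Entry::File("d.txt".to_string()),
    ];
    let limits = Limits { max_depth: 1, skip: 1, max_files: 2 };
    for path in select_files(&tree, &limits) {
        println!("{}", path);
    }
}
```

With these assumed values, the crawl descends one level (so `c.txt` under `docs/deep` is never reached), skips the first file found, and prints `/docs/b.txt` and `/d.txt`. In a multi-threaded crawl the discovery order may vary, so an offset like this would only be reproducible if results were ordered deterministically first.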

Copyright (c) 2021 Bernd-L. All rights reserved.

AGPL v3: Free as in Freedom

od-get is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

od-get is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with od-get. If not, see https://www.gnu.org/licenses/.

This project (including its source code and its documentation) is released under the terms of the GNU Affero General Public License.

Dependencies

~12–26MB
~379K SLoC