7 unstable releases (3 breaking)
Version | Released |
---|---|
0.3.1 | Feb 5, 2021 |
0.3.0 | Feb 5, 2021 |
0.2.0 | Feb 4, 2021 |
0.1.0 | Feb 3, 2021 |
0.0.2 | Jan 6, 2021 |
od-get
A Rust tool for recursively crawling & downloading data from open directories
- Filtering (regex) support
  - Exclude file patterns
  - Include file patterns
  - Exclude folder patterns
  - Include folder patterns
- Customizable output
  - Target directory
  - Verbosity
  - Metadata-JSON file generation
  - Log file/dynamic terminal output
- Customizable limits
  - Recursion depth limit
  - File count limit
  - File count offset (skip `n` files)
- Multi-threaded (using `rayon`)
- Resume on error (avoid re-downloading files)
- Multi-level recursion
- Disable download (only crawl to JSON)
(work in progress, one layer of recursion works)
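The limit options listed above (recursion depth, file count, and file count offset) can be illustrated with a minimal sketch. This is not od-get's actual code: `Entry`, `Limits`, and `apply_limits` are hypothetical names, and the sketch simply assumes the limits are applied to an already-crawled list of entries, in the order depth filter, then offset, then count.

```rust
/// One crawled entry (hypothetical type); `depth` counts folder levels
/// below the starting directory.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub path: String,
    pub depth: usize,
}

/// Hypothetical limit settings mirroring the three limits in the feature list.
#[derive(Debug, Clone, Copy)]
pub struct Limits {
    /// Deepest folder level that is still accepted.
    pub max_depth: usize,
    /// Number of accepted files to skip before selecting any.
    pub offset: usize,
    /// Maximum number of files to select.
    pub max_files: usize,
}

/// Applies the depth limit first, then skips `offset` files,
/// then keeps at most `max_files` of the remainder.
pub fn apply_limits(entries: &[Entry], limits: Limits) -> Vec<Entry> {
    entries
        .iter()
        .filter(|e| e.depth <= limits.max_depth)
        .skip(limits.offset)
        .take(limits.max_files)
        .cloned()
        .collect()
}
```

With entries at depths 0, 1, 2, and 0, calling `apply_limits` with `max_depth: 1`, `offset: 1`, and `max_files: 2` drops the depth-2 entry, skips the first remaining file, and returns the next two. Whether the real tool counts the offset before or after filtering is an assumption of this sketch.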
Licence & Copyright
Copyright (c) 2021 Bernd-L. All rights reserved.
od-get is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
od-get is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with od-get. If not, see https://www.gnu.org/licenses/.
This project (including its source code and its documentation) is released under the terms of the GNU Affero General Public License.