#web-crawler #web-scraping #spider #command-line-interface #web-indexer

app spider-cloud-cli

The Spider Cloud CLI for web crawling and scraping

8 releases

0.1.23 Nov 7, 2024
0.1.22 Nov 4, 2024
0.1.5 Oct 11, 2024
0.1.3 Aug 27, 2024

#199 in Web programming


257 downloads per month

MIT license

19KB
334 lines

Spider Cloud CLI

Spider Cloud CLI is a command-line interface to interact with the Spider Cloud web crawler. It allows you to scrape, crawl, search, and perform various other web-related tasks through simple commands.

Installation

Install the CLI with Homebrew, or with Cargo from crates.io:

Homebrew

brew tap spider-rs/spider-cloud-cli
brew install spider-cloud-cli

Cargo

cargo install spider-cloud-cli

Usage

After installation, run spider-cloud-cli followed by a command and its arguments.

Authentication

Most commands require authentication. Provide your API key before running them:

spider-cloud-cli auth --api_key YOUR_API_KEY
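If you would rather not type the key directly on the command line, one option is a small shell wrapper that reads it from the environment. The sketch below is only an illustration: the function name spider_auth and the variables SPIDER_API_KEY and SPIDER_CLI are assumptions made for this example, not part of the CLI. The only CLI feature it relies on is the auth --api_key command shown above.

```shell
#!/bin/sh
# Hypothetical helper: spider_auth, SPIDER_API_KEY and SPIDER_CLI are names
# chosen for this sketch; the CLI itself does not define them.
spider_auth() {
    # Refuse to run when no key has been exported.
    if [ -z "${SPIDER_API_KEY:-}" ]; then
        echo "SPIDER_API_KEY is not set" >&2
        return 1
    fi
    # SPIDER_CLI lets you point at a different binary; defaults to the installed CLI.
    "${SPIDER_CLI:-spider-cloud-cli}" auth --api_key "$SPIDER_API_KEY"
}
```

With the function sourced into your shell, export SPIDER_API_KEY once and then call spider_auth.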

Commands

Scrape

Scrape data from a specified URL.

spider-cloud-cli scrape --url http://example.com

Crawl

Crawl a specified URL with an optional limit on the number of pages.

spider-cloud-cli crawl --url http://example.com --limit 10
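To crawl several sites with the same page limit, the command above can be wrapped in a loop. This is a minimal sketch under stated assumptions: crawl_all and SPIDER_CLI are hypothetical names introduced here, and the function uses only the --url and --limit options shown above. It makes no assumption about what the CLI prints.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CLI): crawl_all LIMIT URL...
# Runs one crawl per URL, stopping at the first failure.
crawl_all() {
    limit=$1
    shift
    for url in "$@"; do
        # SPIDER_CLI is an assumed override; defaults to the installed CLI.
        "${SPIDER_CLI:-spider-cloud-cli}" crawl --url "$url" --limit "$limit" || return 1
    done
}
```

For example, crawl_all 10 http://example.com http://example.org runs two crawls, each limited to 10 pages.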

Links

Fetch links from a specified URL.

spider-cloud-cli links --url http://example.com

Screenshot

Take a screenshot of a specified URL.

spider-cloud-cli screenshot --url http://example.com

Search

Search for a query.

spider-cloud-cli search --query "example query"

Transform

Transform specified data.

spider-cloud-cli transform --data "sample data"

Extract Contacts

Extract contact information from a specified URL.

spider-cloud-cli extract_contacts --url http://example.com

Label

Label data from a specified URL.

spider-cloud-cli label --url http://example.com

Get Crawl State

Get the crawl state of a specified URL.

spider-cloud-cli get_crawl_state --url http://example.com

Query

Query records of a specified domain.

spider-cloud-cli query --domain example.com

Get Credits

Fetch the remaining account credits.

spider-cloud-cli get_credits

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Issues and pull requests are welcome! Feel free to check the issues page if you have any questions or suggestions.

Acknowledgements

Special thanks to the developers and contributors of the libraries and tools used in this project.

Dependencies

~8–20MB
~272K SLoC