34 releases

0.1.7 Nov 25, 2024
0.1.5 Sep 5, 2024
0.1.4 Jul 21, 2024
0.0.2 Mar 4, 2024

#1095 in Web programming

294 downloads per month
Used in 3 crates

MIT license

56KB
1K SLoC

fav_core is the core library of fav_cli, a CLI tool that downloads remote resources and keeps local state in protobuf. Put simply, fav_core is a helper for building a stateful crawler.

Usage

fav_utils provides the utilities for fav_cli, which currently supports only BiliBili (a Chinese video platform similar to YouTube). You can treat it as an example of how to use this crate.

This crate saves status in protobuf rather than JSON because it is faster. You need to define your data structures in protobuf, as in this example (to derive traits for the protobuf-generated code, see example).

A Sets contains multiple Set values, and each Set contains multiple Res (resource) values. The workflow is:

  1. fetch Sets to refresh its Set entries
  2. fetch a Set to refresh its Res entries
  3. fetch and pull a Res to download it
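
The three steps above can be sketched with plain in-memory types. This is only an illustration: the structs, field names, and functions below are hypothetical stand-ins, not fav_core's real API, and the "remote" data is hard-coded where a real crawler would make network requests.

```rust
// Hypothetical stand-ins for the protobuf-generated types; NOT fav_core's real types.
#[derive(Debug, Default)]
struct Res {
    id: u32,
    title: String,
    fetched: bool,
    saved: bool,
}

#[derive(Debug, Default)]
struct Set {
    id: u32,
    title: String,
    ress: Vec<Res>,
}

#[derive(Debug, Default)]
struct Sets {
    sets: Vec<Set>,
}

// Step 1: refresh the Set entries in Sets (hard-coded here instead of a remote call).
fn fetch_sets(sets: &mut Sets) {
    for (id, title) in [(1, "English"), (2, "Chinese")] {
        if !sets.sets.iter().any(|s| s.id == id) {
            sets.sets.push(Set { id, title: title.to_string(), ress: Vec::new() });
        }
    }
}

// Step 2: refresh the Res entries of one Set, skipping ids that are already known.
fn fetch_set(set: &mut Set) {
    for (n, title) in [(1, "Oliver Twist"), (2, "Roman Holiday")] {
        let id = set.id * 100 + n;
        if !set.ress.iter().any(|r| r.id == id) {
            set.ress.push(Res { id, title: title.to_string(), ..Res::default() });
        }
    }
}

// Step 3a: fetch the details needed to download a resource.
fn fetch_res(res: &mut Res) {
    res.fetched = true;
}

// Step 3b: pull the resource; this sketch refuses to pull before a fetch.
fn pull_res(res: &mut Res) -> Result<(), String> {
    if !res.fetched {
        return Err(format!("resource {} has not been fetched", res.id));
    }
    res.saved = true;
    Ok(())
}

// Run the whole workflow and return the resulting local state.
fn run_workflow() -> Result<Sets, String> {
    let mut sets = Sets::default();
    fetch_sets(&mut sets);
    for set in &mut sets.sets {
        fetch_set(set);
        for res in &mut set.ress {
            fetch_res(res);
            pull_res(res)?;
        }
    }
    Ok(sets)
}

fn main() {
    let sets = run_workflow().expect("workflow failed");
    for set in &sets.sets {
        for res in &set.ress {
            println!("{} / {}: saved = {}", set.title, res.title, res.saved);
        }
    }
}
```

Because each step skips entries that already exist, running the workflow again on saved state only adds what is new, which is the point of keeping state locally.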

To implement this workflow and maintain a local state, fav_core has many useful traits:

  1. Network helpers
  • Api: helps define the APIs
  • ApiProvider: lets the app provide an API based on the ApiKind enum
  • Net: lets the app use the Internet
  2. Config
  • Config: Config: HttpConfig + ProtoLocal, marks the app as configurable and persistable
  • HttpConfig: defines the default headers and cookies
  3. Status and attributes
  • Sets: iterate over the sets and get a subset of them
  • Set: iterate over the resources and get a subset of them
  • Res: a resource, Res: Meta
  • Meta: the metadata of a resource, Meta: Attr + Status
  • Attr: provides the resource's id and title
  • Status: the status of a resource, such as saved, fetched, tracked, and expired
  4. Operations
  • Ops: Ops: AuthOps + SetsOps + SetOps + ResOps, meaning the app can perform all needed operations
  • AuthOps: used to log in and log out
  • SetsOps: used to fetch_sets info; for example, add English, Chinese, and Japanese as new movie collections to the Sets defined in protobuf
  • SetOps: used to fetch_set info; for example, add 《Oliver Twist》, 《Roman Holiday》, and 《Twelve Angry Men》 to the English collection
  • ResOps: used to fetch and pull; for example, fetch the id of 《Oliver Twist》 on the target website, then pull the resource to local disk based on the fetched id
  5. Persistence
  • PathInfo: defines where to store status and config
  • ProtoLocal: ProtoLocal: PathInfo + MessageFull, used to read and write status and config
  • SaveLocal: lets the app download a Res and modify the local status
  6. visualize (optional): show status as a table
  7. Ext methods
  • SetOpsExt: SetOpsExt: SetOps, batch fetch the sets in Sets
  • ResOpsExt: ResOpsExt: ResOps, batch fetch the resources in a Set
  • XXStatusExt: batch modify the children's StatusFlags
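
The trait composition described above (Meta: Attr + Status, Res: Meta, and an Ext trait that batch-modifies status) can be sketched as follows. The method names, the u8 flag values, and the Video type are assumptions made for this illustration; fav_core's actual signatures and its StatusFlags type may differ.

```rust
// Hypothetical bit flags; the real StatusFlags type may differ and has more
// states (the list above also names tracked and expired).
const SAVED: u8 = 0b0001;
const FETCHED: u8 = 0b0010;

// Attr: exposes a resource's id and title.
trait Attr {
    fn id(&self) -> u64;
    fn title(&self) -> &str;
}

// Status: reads and updates a resource's status bits.
trait Status {
    fn status(&self) -> u8;
    fn set_status(&mut self, flags: u8);

    fn check(&self, flag: u8) -> bool {
        self.status() & flag != 0
    }
    fn on(&mut self, flag: u8) {
        let flags = self.status() | flag;
        self.set_status(flags);
    }
}

// Meta is Attr + Status; the blanket impl gives it to every eligible type.
trait Meta: Attr + Status {}
impl<T: Attr + Status> Meta for T {}

// Res requires Meta, so any Res has an id, a title, and a status.
trait Res: Meta {}

// Ext-style helper: batch-modify the status of every child in a slice.
trait StatusExt {
    fn on_all(&mut self, flag: u8);
}
impl<T: Status> StatusExt for [T] {
    fn on_all(&mut self, flag: u8) {
        for item in self.iter_mut() {
            item.on(flag);
        }
    }
}

// A hypothetical resource type standing in for protobuf-generated code.
struct Video {
    id: u64,
    title: String,
    flags: u8,
}

impl Video {
    fn new(id: u64, title: &str) -> Self {
        Video { id, title: title.to_string(), flags: 0 }
    }
}

impl Attr for Video {
    fn id(&self) -> u64 { self.id }
    fn title(&self) -> &str { &self.title }
}
impl Status for Video {
    fn status(&self) -> u8 { self.flags }
    fn set_status(&mut self, flags: u8) { self.flags = flags; }
}
impl Res for Video {}

// Works for any Res, using only what the supertraits guarantee.
fn describe(res: &impl Res) -> String {
    format!("{} ({})", res.title(), res.id())
}

fn main() {
    let mut videos = vec![Video::new(1, "Oliver Twist"), Video::new(2, "Roman Holiday")];
    videos.on_all(FETCHED);
    for v in &videos {
        println!("{} fetched={} saved={}", describe(v), v.check(FETCHED), v.check(SAVED));
    }
}
```

Splitting the bounds this way keeps each trait small: code that only needs a title asks for Attr, while code that drives the full workflow asks for Res.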

In conclusion, this crate contains all the traits you need to build a stateful crawler. You can define data structures with protobuf for fast reads and writes, and make them stateful, configurable, and persistable. Several network helpers are provided, so you can call request_json and request_protobuf directly. Ext traits are also provided so that you can batch fetch and pull data or modify the resources' StatusFlags.

An example can be found in the fav repo.

CHANGELOG

  • 0.1.1 -> 0.1.2: XXOpsExt methods need a batch_size argument so that users can set the number of concurrent jobs.
  • 0.0.X -> 0.1.X: Methods of the Ops-related traits need a Fut: Future<...>; when that future becomes ready, the method can clean up, shut down gracefully, and return FavCoreError::Cancel. The OpsExt methods build on this to handle SIGINT, which keeps things reliable.
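
The two changes above, a caller-chosen batch_size and cooperative cancellation, can be illustrated with a std-only sketch. Note the substitution: the crate itself uses futures and SIGINT handling, while this sketch uses scoped threads and an AtomicBool flag so that it runs without any dependencies. batch_run and BatchError are hypothetical names, with BatchError::Cancel standing in for FavCoreError::Cancel.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

// Hypothetical stand-in for FavCoreError::Cancel.
#[derive(Debug, PartialEq)]
enum BatchError {
    Cancel,
}

// Run `job` over `ids`, at most `batch_size` jobs at a time.
// The cancel flag is checked before each batch, so a batch that has already
// started is allowed to finish before the function returns Cancel.
fn batch_run<F>(
    ids: &[u32],
    batch_size: usize,
    cancel: &AtomicBool,
    job: F,
) -> Result<Vec<u32>, BatchError>
where
    F: Fn(u32) -> u32 + Sync,
{
    let mut done = Vec::with_capacity(ids.len());
    // chunks(0) would panic, so treat 0 as 1.
    for chunk in ids.chunks(batch_size.max(1)) {
        if cancel.load(Ordering::SeqCst) {
            return Err(BatchError::Cancel);
        }
        // One scoped thread per job in this batch; results keep input order.
        let results: Vec<u32> = thread::scope(|s| {
            let handles: Vec<_> = chunk
                .iter()
                .map(|&id| {
                    let job = &job;
                    s.spawn(move || job(id))
                })
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("job panicked"))
                .collect()
        });
        done.extend(results);
    }
    Ok(done)
}

fn main() {
    let cancel = AtomicBool::new(false);
    let out = batch_run(&[1, 2, 3, 4, 5], 2, &cancel, |id| id * 10);
    println!("{:?}", out);

    // Simulate a cancellation request arriving before the next run.
    cancel.store(true, Ordering::SeqCst);
    let out = batch_run(&[1, 2, 3], 2, &cancel, |id| id);
    println!("{:?}", out);
}
```

Checking for cancellation only between batches is a deliberate simplification; it bounds how long shutdown can take by the duration of one batch.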

Dependencies

~9–22MB
~308K SLoC