34 releases

| Version | Date |
|---|---|
| 0.1.7 | Nov 25, 2024 |
| 0.1.5 | Sep 5, 2024 |
| 0.1.4 | Jul 21, 2024 |
| 0.0.2 | Mar 4, 2024 |
fav_core is the core library of fav_cli, a CLI tool that downloads remote resources and keeps a local state in protobuf. Put simply, fav_core is a helper for building a stateful crawler.
Usage
fav_utils provides the utilities for fav_cli and currently supports only BiliBili (roughly the Chinese counterpart of YouTube). You can treat it as an example of how to use this crate.
To save status, this crate uses protobuf instead of JSON because it is faster to read and write. You need to define your data structures with protobuf like this example (to derive traits for the code generated by protobuf, see example).
A Sets contains Set items, and a Set contains Res (resource) items. The workflow is:
- fetch Sets to refresh its Set items
- fetch Set to refresh its Res items
- fetch and pull Res to download
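The three steps above can be sketched with plain structs. This is only an illustration: the Sets, Set, and Res types below are hypothetical stand-ins for the protobuf-generated types, the function names are made up for this sketch, and the remote calls are faked with hard-coded data.

```rust
/// Hypothetical, simplified stand-ins for the protobuf-generated types.
#[derive(Debug, Default)]
struct Res {
    id: u32,
    title: String,
    fetched: bool,
    saved: bool,
}

#[derive(Debug, Default)]
struct Set {
    id: u32,
    title: String,
    resources: Vec<Res>,
}

#[derive(Debug, Default)]
struct Sets {
    sets: Vec<Set>,
}

/// Step 1: fetch Sets to refresh its Set items (the remote answer is faked).
fn fetch_sets(sets: &mut Sets) {
    for (id, title) in [(1, "English"), (2, "Chinese")] {
        if !sets.sets.iter().any(|s| s.id == id) {
            sets.sets.push(Set { id, title: title.to_string(), resources: Vec::new() });
        }
    }
}

/// Step 2: fetch a Set to refresh its Res items (the remote answer is faked).
fn fetch_set(set: &mut Set) {
    let remote: Vec<(u32, &str)> = match set.id {
        1 => vec![(10, "Oliver Twist"), (11, "Roman Holiday")],
        _ => vec![],
    };
    for (id, title) in remote {
        if !set.resources.iter().any(|r| r.id == id) {
            set.resources.push(Res { id, title: title.to_string(), ..Default::default() });
        }
    }
}

/// Step 3: fetch and pull a Res; here this only updates the local state.
fn fetch_and_pull(res: &mut Res) {
    res.fetched = true;
    res.saved = true;
}

/// Runs the whole workflow top-down and returns the resulting local state.
fn run_workflow() -> Sets {
    let mut sets = Sets::default();
    fetch_sets(&mut sets);
    for set in &mut sets.sets {
        fetch_set(set);
        for res in &mut set.resources {
            fetch_and_pull(res);
        }
    }
    sets
}

fn main() {
    let sets = run_workflow();
    for set in &sets.sets {
        let saved = set.resources.iter().filter(|r| r.fetched && r.saved).count();
        println!("{}: {} resources, {} saved", set.title, set.resources.len(), saved);
        for res in &set.resources {
            println!("  {} {}", res.id, res.title);
        }
    }
}
```

Because each step only adds entries that are missing, running the workflow again against the same state does not duplicate anything, which is the point of keeping a local state.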
To implement this workflow and maintain a local state, fav_core provides many useful traits:
- network helper
  - Api: helps define the APIs
  - ApiProvider: lets the app provide an API based on the ApiKind enum
  - Net: lets the app use the Internet
- Config
  - Config (Config: HttpConfig + ProtoLocal): marks the app as configurable and persistable
  - HttpConfig: defines the default headers and cookies
- Status and attributes
  - Sets: iterate over the sets and get a subset of them
  - Set: iterate over the resources and get a subset of them
  - Res (Res: Meta)
  - Meta (Meta: Attr + Status): the metadata of a resource
  - Attr: provides the resource's id and title
  - Status: the status of a resource, such as saved, fetched, tracked, and expired
- Operations
  - Ops (Ops: AuthOps + SetsOps + SetOps + ResOps): the app can perform all needed operations
  - AuthOps: used to log in and log out
  - SetsOps: used to fetch_sets info, for example, to add English, Chinese, and Japanese as new movie collections to the Sets defined in protobuf
  - SetOps: used to fetch_set info, for example, to add 《Oliver Twist》, 《Roman Holiday》, and 《Twelve Angry Men》 to the English collection
  - ResOps: used to fetch and pull, for example, to fetch the id of 《Oliver Twist》 on the target website, then pull the resource to local disk based on the fetched id
- Persistence
  - PathInfo: defines where to store status and config
  - ProtoLocal (ProtoLocal: PathInfo + MessageFull): used to read and write status and config
  - SaveLocal: lets the app download Res and modify the local status
- visualize (optional): show status as a table
- Ext methods
  - SetOpsExt (SetOpsExt: SetOps): batch fetch the sets in a Sets
  - ResOpsExt (ResOpsExt: ResOps): batch fetch the resources in a set
  - XXStatusExt: batch modify the children's StatusFlags
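The way these traits compose (an umbrella Ops trait built from supertraits, plus Ext traits layered on top) can be sketched as below. Everything here is an assumption made for illustration: the method signatures are synchronous and simplified, fetch_all_sets and run_all are hypothetical names, and the blanket impls are one possible design rather than a copy of fav_core's code.

```rust
// Hypothetical, synchronous stand-ins; fav_core's real traits are not reproduced here.
trait AuthOps {
    fn login(&mut self);
    fn logout(&mut self);
}
trait SetsOps {
    fn fetch_sets(&mut self) -> Vec<String>;
}
trait SetOps {
    fn fetch_set(&mut self, set: &str) -> Vec<String>;
}
trait ResOps {
    fn fetch(&mut self, title: &str) -> String;
    fn pull(&mut self, id: &str);
}

/// Umbrella trait: anything implementing all four parts is an `Ops`.
trait Ops: AuthOps + SetsOps + SetOps + ResOps {}
impl<T: AuthOps + SetsOps + SetOps + ResOps> Ops for T {}

/// Extension trait: a batch helper provided for free on top of `SetOps`.
trait SetOpsExt: SetOps {
    fn fetch_all_sets(&mut self, sets: &[String]) -> Vec<Vec<String>> {
        sets.iter().map(|s| self.fetch_set(s)).collect()
    }
}
impl<T: SetOps> SetOpsExt for T {}

/// A toy app with faked remote data.
#[derive(Default)]
struct App {
    logged_in: bool,
    pulled: Vec<String>,
}

impl AuthOps for App {
    fn login(&mut self) { self.logged_in = true; }
    fn logout(&mut self) { self.logged_in = false; }
}
impl SetsOps for App {
    fn fetch_sets(&mut self) -> Vec<String> {
        ["English", "Chinese", "Japanese"].iter().map(|s| s.to_string()).collect()
    }
}
impl SetOps for App {
    fn fetch_set(&mut self, set: &str) -> Vec<String> {
        let titles: Vec<&str> = if set == "English" {
            vec!["Oliver Twist", "Roman Holiday", "Twelve Angry Men"]
        } else {
            vec![]
        };
        titles.iter().map(|t| t.to_string()).collect()
    }
}
impl ResOps for App {
    fn fetch(&mut self, title: &str) -> String { format!("id-of-{}", title) }
    fn pull(&mut self, id: &str) { self.pulled.push(id.to_string()); }
}

/// Works with any `Ops` implementor; returns how many resources were pulled.
fn run_all(app: &mut impl Ops) -> usize {
    app.login();
    let sets = app.fetch_sets();
    let mut pulled = 0;
    for titles in app.fetch_all_sets(&sets) {
        for title in titles {
            let id = app.fetch(&title);
            app.pull(&id);
            pulled += 1;
        }
    }
    app.logout();
    pulled
}

fn main() {
    let mut app = App::default();
    let pulled = run_all(&mut app);
    println!("pulled {} resources: {:?}", pulled, app.pulled);
}
```

The appeal of this layout is that an app only implements the small traits, while generic code can ask for a single Ops bound and still reach every operation and every Ext helper.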
In conclusion, this crate contains all the traits you need to build a stateful crawler. You can define data structures with protobuf for fast reads and writes, and make them stateful, configurable, and persistable. Many network helpers are provided, so you can call request_json and request_protobuf directly. Ext traits are also provided so that you can batch fetch and pull data or modify the resources' StatusFlags.
An example can be found in the fav repo.
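To make the status idea concrete, here is a minimal sketch of status flags stored as bits, with a helper that modifies all children at once. The Flags type, the HasStatus and SetStatusExt traits, and every method name are hypothetical; they only mirror the statuses named in this README (saved, fetched, tracked, expired) and are not fav_core's actual StatusFlags API.

```rust
/// Hypothetical bit flags for the statuses named in this README.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct Flags(u8);

#[allow(dead_code)]
impl Flags {
    const SAVED: Flags = Flags(1 << 0);
    const FETCHED: Flags = Flags(1 << 1);
    const TRACKED: Flags = Flags(1 << 2);
    const EXPIRED: Flags = Flags(1 << 3);
}

/// Hypothetical status trait: implementors expose their flags,
/// and the on/off/check helpers come for free.
trait HasStatus {
    fn flags(&self) -> Flags;
    fn flags_mut(&mut self) -> &mut Flags;

    fn turn_on(&mut self, f: Flags) {
        self.flags_mut().0 |= f.0;
    }
    fn turn_off(&mut self, f: Flags) {
        self.flags_mut().0 &= !f.0;
    }
    fn has(&self, f: Flags) -> bool {
        (self.flags().0 & f.0) == f.0
    }
}

#[derive(Debug, Default)]
struct Res {
    id: u32,
    status: Flags,
}

impl HasStatus for Res {
    fn flags(&self) -> Flags { self.status }
    fn flags_mut(&mut self) -> &mut Flags { &mut self.status }
}

#[derive(Debug, Default)]
struct Set {
    resources: Vec<Res>,
}

/// Hypothetical Ext trait: batch modify and filter the children of a set.
trait SetStatusExt {
    fn turn_on_all(&mut self, f: Flags);
    fn with_flag(&self, f: Flags) -> Vec<&Res>;
}

impl SetStatusExt for Set {
    fn turn_on_all(&mut self, f: Flags) {
        for res in &mut self.resources {
            res.turn_on(f);
        }
    }
    fn with_flag(&self, f: Flags) -> Vec<&Res> {
        self.resources.iter().filter(|r| r.has(f)).collect()
    }
}

fn main() {
    let mut set = Set {
        resources: (1..=3).map(|id| Res { id, status: Flags::default() }).collect(),
    };
    set.turn_on_all(Flags::TRACKED);
    set.resources[0].turn_on(Flags::SAVED);
    for res in set.with_flag(Flags::SAVED) {
        println!("resource {} is saved", res.id);
    }
}
```

Storing the status as bits keeps it compact in the persisted state and lets one resource carry several statuses at the same time.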
CHANGELOG
- 0.1.1 -> 0.1.2: XXOpsExt needs batch_size passed so that users can set the number of concurrent jobs.
- 0.0.X -> 0.1.X: Methods of the Ops-related traits need a Fut: Future<...>. When the Future is ready, one can clean up, shut down gracefully, and return FavCoreError::Cancel. The OpsExt methods handle SIGINT based on this, which keeps things reliable.
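The two changelog entries describe batching with a batch_size and cooperative cancellation. The sketch below shows the general idea using only the standard library, and it deliberately swaps the techniques: scoped threads stand in for async jobs, and an AtomicBool stands in for the Future-based cancel signal and SIGINT handling. The batch_fetch function and the BatchError type are hypothetical names, not fav_core items.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

/// Hypothetical error type; plays the role the README gives to FavCoreError::Cancel.
#[derive(Debug, PartialEq)]
enum BatchError {
    Cancel,
}

/// Runs `fetch` over `items`, at most `batch_size` jobs at a time.
/// The cancel flag is checked before each batch, so a running batch
/// always finishes before the function returns `Err(BatchError::Cancel)`.
fn batch_fetch<T, F>(
    items: &[T],
    batch_size: usize,
    cancel: &AtomicBool,
    fetch: F,
) -> Result<usize, BatchError>
where
    T: Sync,
    F: Fn(&T) + Sync,
{
    let fetch = &fetch;
    let mut done = 0;
    // A batch size of zero would make `chunks` panic, so clamp it to one.
    for batch in items.chunks(batch_size.max(1)) {
        if cancel.load(Ordering::SeqCst) {
            return Err(BatchError::Cancel);
        }
        // Scoped threads are joined when the scope ends.
        thread::scope(|s| {
            for item in batch {
                s.spawn(move || fetch(item));
            }
        });
        done += batch.len();
    }
    Ok(done)
}

fn main() {
    let items = ["a", "b", "c", "d", "e"];
    let cancel = AtomicBool::new(false);
    let fetched = AtomicUsize::new(0);

    let result = batch_fetch(&items, 2, &cancel, |_item| {
        fetched.fetch_add(1, Ordering::SeqCst);
    });
    println!("{:?}, fetched {}", result, fetched.load(Ordering::SeqCst));

    // Simulate a cancel request, for example one set by a signal handler.
    cancel.store(true, Ordering::SeqCst);
    println!("{:?}", batch_fetch(&items, 2, &cancel, |_item| {}));
}
```

Checking for cancellation only at batch boundaries means no job is interrupted halfway, so whatever state was written for finished jobs stays consistent.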