#job #pipeline #file #depend #back-end #job-scheduler #remote

captain-workflow-manager

Run and manage jobs that depend on each other on a variety of backends.

1 unstable release

0.1.0 Dec 20, 2021

#25 in #depend

GPL-3.0-only

35KB
623 lines

This library helps you run pipelines of jobs that depend on each other. Its modular design lets you run jobs locally, on a cluster, or on other remote computing resources.


lib.rs:

Consider a set of jobs as illustrated below, where two kinds of source files are used to generate two kinds of intermediate files, which are in turn used to generate a final result.

                         ┌─────────────────┐
┌─────────────────┐      │  Source File B: │
│  Source File A: │      │    - Param1     │
│    - Param1     │      │    - Param2     │
└─────────┬───────┘      └───────┬─────────┘
          │                      │
          │                      │
          │           ┌──────────┴───────────────┐
          │           │                          │
          │           │                          │
          │           │                          │
          ▼           ▼                          ▼
      ┌──────────────────────┐       ┌──────────────────────┐
      │ Intermediate File A: │       │ Intermediate File B: │
      │   - Param1           │       │   - Param1           │
      │   - Param2           │       │   - Param2           │
      └────────────────┬─────┘       └──────┬───────────────┘
                       │                    │
                       └─────────┬──────────┘
                                 │
                                 │
                                 ▼
                         ┌───────────────┐
                         │ Final Result: │
                         │   - Param1    │
                         │   - Param2    │
                         └───────────────┘

Each kind of file has one or more "parameters".

These dependencies could be represented by the following Job enum, assuming that Param1 is an integer and Param2 a string:

enum Job {
    SourceFileA {param1: u16},
    SourceFileB {param1: u16, param2: &'static str},
    IntermediateFileA {param1: u16, param2: &'static str},
    IntermediateFileB {param1: u16, param2: &'static str},
    FinalResult {param1: u16, param2: &'static str},
}

To manage this set of jobs using captain, first implement the JobUnit trait on the enum, then run it using the job Scheduler. The back-end the jobs run on is chosen by selecting an ExecutorBuilder.
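
To make the dependency structure concrete, here is a minimal, self-contained sketch of how the arrows in the diagram could be encoded on the Job enum and turned into a valid execution order. It does not use captain's API: the deps method and the run_order function are hypothetical names introduced only for illustration, and the actual JobUnit trait and Scheduler may look quite different. The derives on the enum are also an addition, needed so that jobs can be compared and stored in a set.

```rust
use std::collections::HashSet;

// Same shape as the Job enum above, with derives added (an assumption of this
// sketch) so that jobs can be copied, compared, and stored in a HashSet.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Job {
    SourceFileA { param1: u16 },
    SourceFileB { param1: u16, param2: &'static str },
    IntermediateFileA { param1: u16, param2: &'static str },
    IntermediateFileB { param1: u16, param2: &'static str },
    FinalResult { param1: u16, param2: &'static str },
}

impl Job {
    // Hypothetical helper, not captain's API: the direct prerequisites of a
    // job, read off the arrows in the diagram.
    fn deps(&self) -> Vec<Job> {
        match *self {
            // Source files have no prerequisites.
            Job::SourceFileA { .. } | Job::SourceFileB { .. } => vec![],
            // Intermediate file A is built from both kinds of source file.
            Job::IntermediateFileA { param1, param2 } => vec![
                Job::SourceFileA { param1 },
                Job::SourceFileB { param1, param2 },
            ],
            // Intermediate file B is built from source file B only.
            Job::IntermediateFileB { param1, param2 } => {
                vec![Job::SourceFileB { param1, param2 }]
            }
            // The final result needs both intermediate files.
            Job::FinalResult { param1, param2 } => vec![
                Job::IntermediateFileA { param1, param2 },
                Job::IntermediateFileB { param1, param2 },
            ],
        }
    }
}

// Hypothetical helper: depth-first post-order walk from the target job, so
// that every job appears after all of its prerequisites and a prerequisite
// shared by several jobs appears only once.
fn run_order(target: Job) -> Vec<Job> {
    fn visit(job: Job, seen: &mut HashSet<Job>, order: &mut Vec<Job>) {
        if !seen.insert(job) {
            return; // already placed in the order
        }
        for dep in job.deps() {
            visit(dep, seen, order);
        }
        order.push(job);
    }

    let mut seen = HashSet::new();
    let mut order = Vec::new();
    visit(target, &mut seen, &mut order);
    order
}
```

Under these assumptions, calling run_order(Job::FinalResult { param1: 1, param2: "x" }) lists the two source files first, then the two intermediate files, then the final result. Source file B appears only once even though both intermediate files depend on it.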

Dependencies

~4.5–6.5MB
~107K SLoC