34 major breaking releases

37.0.0 Sep 26, 2024
36.0.0 Jul 18, 2024
35.0.0 Jul 12, 2024
34.0.0 Jun 24, 2024
0.0.0 Nov 21, 2022

#5 in #phase

110,337 downloads per month
Used in 75 crates (8 directly)

Apache-2.0

3MB
58K SLoC

Release

Polkadot SDK stable2409


lib.rs:

Multi phase, offchain election provider pallet.

Currently, this election-provider has two distinct phases (see Phase), signed and unsigned.

Phases

The timeline of the pallet is as follows. At each block, frame_election_provider_support::ElectionDataProvider::next_election_prediction is used to estimate the time remaining until the next call to frame_election_provider_support::ElectionProvider::elect. Based on this estimate, a phase is chosen:

                                                                   elect()
                +   <--T::SignedPhase-->  +  <--T::UnsignedPhase-->   +
  +-------------------------------------------------------------------+
   Phase::Off   +       Phase::Signed     +      Phase::Unsigned      +

Note that the unsigned phase starts pallet::Config::UnsignedPhase blocks before the next_election_prediction, but only ends when a call to ElectionProvider::elect happens. If no elect happens, the signed phase is extended.

Given this, it is important for the user of this pallet to ensure that it always terminates the current election via elect before requesting a new one.

Each phase can be disabled by setting its length to zero. If both phases have length zero, the pallet runs only the fallback strategy, denoted by Config::Fallback.
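As a rough illustration of the timeline above, the following sketch picks a phase from the estimated number of blocks remaining until elect. The Phase enum here is a simplified stand-in, and next_phase, signed_len and unsigned_len are hypothetical names used only for this example; the pallet's real logic carries more state and is not reproduced here.

```rust
/// Simplified stand-in for the pallet's phase type (hypothetical; the real
/// type carries additional data).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Off,
    Signed,
    Unsigned,
}

/// Choose the phase for the current block.
///
/// `remaining` is the estimated number of blocks until the next `elect`.
/// A phase with length zero is never entered. The current phase is kept
/// when no transition applies, so a phase only ends through a transition
/// or through `elect` (not modelled here).
fn next_phase(current: Phase, remaining: u32, signed_len: u32, unsigned_len: u32) -> Phase {
    let unsigned_deadline = unsigned_len;
    let signed_deadline = signed_len.saturating_add(unsigned_len);
    match current {
        // The unsigned phase starts `unsigned_len` blocks before the prediction.
        Phase::Off | Phase::Signed
            if unsigned_len > 0 && remaining > 0 && remaining <= unsigned_deadline =>
        {
            Phase::Unsigned
        }
        // The signed phase occupies the window just before the unsigned one.
        Phase::Off
            if signed_len > 0 && remaining > unsigned_deadline && remaining <= signed_deadline =>
        {
            Phase::Signed
        }
        // Otherwise nothing changes.
        other => other,
    }
}
```

With signed_len = 10 and unsigned_len = 5, the sketch stays in Phase::Off until 15 blocks remain, moves to Phase::Signed, and switches to Phase::Unsigned once 5 blocks remain.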

Signed Phase

In the signed phase, solutions (of type RawSolution) are submitted and queued on chain. A deposit is reserved, based on the size of the solution, for the cost of keeping this solution on-chain for a number of blocks, and the potential weight of the solution upon being checked. A maximum of pallet::Config::SignedMaxSubmissions solutions are stored. The queue is always sorted based on score (worst to best).

Upon arrival of a new solution:

  1. If the queue is not full, it is stored in the appropriate sorted index.
  2. If the queue is full but the submitted solution is better than one of the queued ones, the worst solution is discarded, the bond of the outgoing solution is returned, and the new solution is stored in the correct index.
  3. If the queue is full and the solution is not an improvement compared to any of the queued ones, it is instantly rejected and no additional bond is reserved.
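
The three cases above can be sketched with a bounded, sorted vector. This is a minimal sketch under simplifying assumptions: the score is a single u64 rather than the pallet's real score type, and Submission, SignedQueue and InsertOutcome are hypothetical names that do not correspond to the pallet's storage items.

```rust
/// A queued submission (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Submission {
    who: u32,
    score: u64,
    deposit: u64,
}

/// What happened to an incoming submission.
#[derive(Debug, PartialEq, Eq)]
enum InsertOutcome {
    /// Case 1: stored, nobody was removed.
    Inserted,
    /// Case 2: stored, the weakest entry was ejected and its deposit is due back.
    InsertedEjecting(Submission),
    /// Case 3: rejected outright, no deposit is taken.
    Rejected,
}

/// Bounded queue, kept sorted from worst (index 0) to best (last index).
struct SignedQueue {
    max: usize,
    items: Vec<Submission>,
}

impl SignedQueue {
    fn new(max: usize) -> Self {
        Self { max, items: Vec::new() }
    }

    fn insert(&mut self, new: Submission) -> InsertOutcome {
        if self.max == 0 {
            return InsertOutcome::Rejected;
        }
        let mut ejected = None;
        if self.items.len() >= self.max {
            // Full queue: the new solution must beat the weakest entry.
            if new.score <= self.items[0].score {
                return InsertOutcome::Rejected;
            }
            ejected = Some(self.items.remove(0));
        }
        // Find the sorted position and store the new submission there.
        let idx = self.items.partition_point(|s| s.score < new.score);
        self.items.insert(idx, new);
        match ejected {
            Some(old) => InsertOutcome::InsertedEjecting(old),
            None => InsertOutcome::Inserted,
        }
    }
}
```

Keeping the weakest entry at index 0 makes the full-queue comparison a single lookup, and the best entry sits at the end where pop() can reach it.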

A signed solution cannot be reversed, taken back, updated, or retracted. In other words, the origin cannot bail out in any way once their solution is queued.

Upon the end of the signed phase, the solutions are examined from best to worst (i.e. pop()ed until drained). Each solution undergoes an expensive Pallet::feasibility_check, which ensures that the score claimed by this solution is correct, and that it is valid based on the election data (i.e. votes and targets). At each step, if the current best solution passes the feasibility check, it is considered to be the best one. The sender of the solution is rewarded, and the rest of the queued solutions get their deposit back and are discarded, without being checked.

The following example covers all of the cases at the end of the signed phase:

Queue
+-------------------------------+
|Solution(score=20, valid=false)| +-->  Slashed
+-------------------------------+
|Solution(score=15, valid=true )| +-->  Rewarded, Saved
+-------------------------------+
|Solution(score=10, valid=true )| +-->  Discarded
+-------------------------------+
|Solution(score=05, valid=false)| +-->  Discarded
+-------------------------------+
|             None              |
+-------------------------------+

Note that both of the bottom solutions end up being discarded and get their deposit back, despite one of them being invalid.
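
The example above can be replayed with a small sketch. The valid flag stands in for the result of Pallet::feasibility_check, and Queued, Outcome and finalize_signed_phase are hypothetical names used only for illustration.

```rust
/// Final fate of a queued solution.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Slashed,
    Rewarded,
    Discarded,
}

/// A queued solution; `valid` stands in for the feasibility check result.
struct Queued {
    score: u64,
    valid: bool,
}

/// Drain a queue sorted from worst to best, examining the best entry first.
/// Returns `(score, outcome)` pairs in the order the solutions were handled.
fn finalize_signed_phase(mut queue: Vec<Queued>) -> Vec<(u64, Outcome)> {
    let mut results = Vec::new();
    let mut winner_found = false;
    while let Some(solution) = queue.pop() {
        let outcome = if winner_found {
            // A winner already exists: refund and discard without checking.
            Outcome::Discarded
        } else if solution.valid {
            winner_found = true;
            Outcome::Rewarded
        } else {
            // Checked before any winner was found, and failed.
            Outcome::Slashed
        };
        results.push((solution.score, outcome));
    }
    results
}
```

Note that validity is only consulted until a winner is found, which is why the invalid solution with score 5 is discarded rather than slashed.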

Unsigned Phase

The unsigned phase will always follow the signed phase, with the specified duration. In this phase, only validator nodes can submit solutions. A validator node that has offchain workers enabled will start mining a solution in this phase and submit it back to the chain as an unsigned transaction, hence the name unsigned phase. This unsigned transaction can never be valid if propagated, and it acts similarly to an inherent.

Validators will only submit solutions if the one that they have computed is strictly better than the best queued one and will limit the weight of the solution to MinerConfig::MaxWeight.

The unsigned phase can be made passive depending on how the previous signed phase went, by setting the first inner value of Phase::Unsigned to false. For now, the unsigned phase is always active.

Fallback

If we reach the end of both phases (i.e. call to ElectionProvider::elect happens) and no good solution is queued, then the fallback strategy pallet::Config::Fallback is used to determine what needs to be done. The on-chain election is slow, and contains no balancing or reduction post-processing. If pallet::Config::Fallback fails, the next phase Phase::Emergency is enabled, which is a more fail-safe approach.

Emergency Phase

The pallet proceeds to Phase::Emergency if a call to T::ElectionProvider::elect is made and Ok(_) cannot be returned, for either of the following reasons:

  1. No signed or unsigned solution was submitted, and Config::Fallback did not succeed.
  2. Any other unforeseen internal error occurred.

During this phase, any solution can be submitted from Config::ForceOrigin, without any checking, via the Pallet::set_emergency_election_result transaction. Hence, Config::ForceOrigin should only be set to a trusted origin, such as the council or root. Once submitted, the forced solution is kept in QueuedSolution until the next call to T::ElectionProvider::elect, where it is returned and Phase goes back to Off.

This implies that the user of this pallet (i.e. a staking pallet) should retry calling T::ElectionProvider::elect in case of error, until Ok(_) is returned.
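
The retry behaviour can be pictured with a toy provider. This is only a sketch: ToyProvider, its fields and the is_force_origin flag are hypothetical, winners are plain account ids instead of Supports, and the origin check is reduced to a boolean.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Off,
    Emergency,
}

/// Toy model of the provider (hypothetical; not the pallet's real types).
struct ToyProvider {
    phase: Phase,
    /// Best queued solution, if any.
    queued: Option<Vec<u32>>,
    /// Result of the fallback strategy; `None` models a failing fallback.
    fallback: Option<Vec<u32>>,
}

impl ToyProvider {
    /// Return the queued solution or the fallback result; otherwise enter
    /// the emergency phase and report an error.
    fn elect(&mut self) -> Result<Vec<u32>, &'static str> {
        if let Some(winners) = self.queued.take().or_else(|| self.fallback.clone()) {
            self.phase = Phase::Off;
            return Ok(winners);
        }
        self.phase = Phase::Emergency;
        Err("no solution available")
    }

    /// Store a forced solution, unchecked, during the emergency phase.
    fn set_emergency_election_result(
        &mut self,
        is_force_origin: bool,
        winners: Vec<u32>,
    ) -> Result<(), &'static str> {
        if !is_force_origin {
            return Err("bad origin");
        }
        if self.phase != Phase::Emergency {
            return Err("not in emergency phase");
        }
        self.queued = Some(winners);
        Ok(())
    }
}
```

A caller that receives Err from elect keeps calling it on later blocks; once the trusted origin has stored a result, the next call returns Ok and the phase returns to Off.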

To generate an emergency solution, one need only provide one argument: Supports. This is essentially a collection of elected winners for the election, and the voters who support them. The supports can be generated by any means. In the simplest case, it could be manual. For example, in the case of massive network failure or misbehavior, Config::ForceOrigin might decide to select only a small number of emergency winners (which would greatly restrict the next validator set, if this pallet is used with pallet-staking). If the failure is for other technical reasons, then a simple and safe way to generate supports is to use the staking-miner binary provided in the Polkadot repository. This binary has a subcommand named emergency-solution, which is capable of connecting to a live network, generating appropriate supports using a standard algorithm, and outputting the supports in hex format, ready for submission. Note that while this binary lives in the Polkadot repository, this particular subcommand can work against any substrate-based chain.

See the staking-miner docs for more information.

Feasible Solution (correct solution)

All submissions must undergo a feasibility check. Signed solutions are checked one by one at the end of the signed phase, and the unsigned solutions are checked on the spot. A feasible solution is as follows:

  1. all of the used indices must be correct.
  2. exactly the correct number of winners must be present.
  3. any assignment is checked to match with RoundSnapshot::voters.
  4. the claimed score is valid, based on the fixed point arithmetic accuracy.
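
The four conditions can be sketched as a single function. This is a heavily simplified illustration, not Pallet::feasibility_check itself: each voter backs exactly one target with its full stake, the score is taken to be the smallest total backing among winners, and Voter, Snapshot, Solution and the error variants are hypothetical names.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq, Eq)]
enum FeasibilityError {
    BadIndex,
    WrongWinnerCount,
    InvalidVote,
    InvalidScore,
}

/// A voter in the snapshot: its stake and the target indices it voted for.
struct Voter {
    stake: u64,
    votes: Vec<usize>,
}

struct Snapshot {
    voters: Vec<Voter>,
    target_count: usize,
}

/// A solution: `(voter index, target index)` pairs plus the claimed score.
struct Solution {
    assignments: Vec<(usize, usize)>,
    claimed_score: u64,
}

fn feasibility_check(
    solution: &Solution,
    snapshot: &Snapshot,
    desired_targets: usize,
) -> Result<(), FeasibilityError> {
    let mut backing: BTreeMap<usize, u64> = BTreeMap::new();
    for &(voter_idx, target_idx) in &solution.assignments {
        // 1. Every index must point into the snapshot.
        let voter = snapshot.voters.get(voter_idx).ok_or(FeasibilityError::BadIndex)?;
        if target_idx >= snapshot.target_count {
            return Err(FeasibilityError::BadIndex);
        }
        // 3. The assignment must match what the voter actually voted for.
        if !voter.votes.contains(&target_idx) {
            return Err(FeasibilityError::InvalidVote);
        }
        let total = backing.entry(target_idx).or_insert(0);
        *total = total.saturating_add(voter.stake);
    }
    // 2. Exactly the desired number of winners.
    if backing.len() != desired_targets {
        return Err(FeasibilityError::WrongWinnerCount);
    }
    // 4. The recomputed score must equal the claimed one.
    let score = backing.values().copied().min().unwrap_or(0);
    if score != solution.claimed_score {
        return Err(FeasibilityError::InvalidScore);
    }
    Ok(())
}
```

Recomputing the score from the snapshot, instead of trusting the claim, is what makes a wrong claim detectable.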

Accuracy

The accuracy of the election is configured via SolutionAccuracyOf which is the accuracy that the submitted solutions must adhere to.

Note that the accuracy is of great importance. The offchain solution should be as small as possible, reducing the solution's size and weight.

Error types

This pallet provides a verbose error system to ease future debugging. The overall hierarchy of errors is as follows:

  1. pallet::Error: These are the errors that can be returned in the dispatchables of the pallet, either signed or unsigned. Since decomposition with nested enums is not possible here, they are prefixed with the logical sub-system to which they belong.
  2. ElectionError: These are the errors that can be generated while the pallet is doing something in automatic scenarios, such as offchain_worker or on_initialize. These errors are helpful for logging and are thus nested by sub-system.

Note that there could be an overlap between these sub-errors. For example, a SnapshotUnavailable can happen in both the miner and the feasibility check phase.
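
To illustrate the idea of nesting sub-system errors, here is a minimal sketch with a top-level error that wraps per-sub-system errors. The variant lists are assumptions chosen for the example and are not a copy of the pallet's real definitions; only the overlap on SnapshotUnavailable is taken from the text above. The helper functions check_solution, mine_solution and run_automatic_step are hypothetical.

```rust
/// Errors from the feasibility check sub-system (assumed variants).
#[derive(Debug, PartialEq, Eq)]
enum FeasibilityError {
    SnapshotUnavailable,
    InvalidScore,
}

/// Errors from the offchain miner sub-system (assumed variants).
#[derive(Debug, PartialEq, Eq)]
enum MinerError {
    SnapshotUnavailable,
    NoSolution,
}

/// Top-level error: one variant per sub-system, wrapping its own error.
#[derive(Debug, PartialEq, Eq)]
enum ElectionError {
    Feasibility(FeasibilityError),
    Miner(MinerError),
}

impl From<FeasibilityError> for ElectionError {
    fn from(e: FeasibilityError) -> Self {
        ElectionError::Feasibility(e)
    }
}

impl From<MinerError> for ElectionError {
    fn from(e: MinerError) -> Self {
        ElectionError::Miner(e)
    }
}

fn check_solution(has_snapshot: bool) -> Result<(), FeasibilityError> {
    if has_snapshot { Ok(()) } else { Err(FeasibilityError::SnapshotUnavailable) }
}

fn mine_solution(has_snapshot: bool) -> Result<(), MinerError> {
    if has_snapshot { Ok(()) } else { Err(MinerError::SnapshotUnavailable) }
}

/// An automatic step that touches both sub-systems; `?` converts each
/// sub-error into the matching top-level variant.
fn run_automatic_step(has_snapshot: bool, mine_first: bool) -> Result<(), ElectionError> {
    if mine_first {
        mine_solution(has_snapshot)?;
        check_solution(has_snapshot)?;
    } else {
        check_solution(has_snapshot)?;
        mine_solution(has_snapshot)?;
    }
    Ok(())
}
```

Because the wrapper variant records the sub-system, a logged error shows where a SnapshotUnavailable came from even though both sub-systems can produce it.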

Future Plans

Emergency-phase recovery script: This script should be taken out of staking-miner in polkadot and ideally live in substrate/utils/frame/elections.

Challenge Phase. We plan on adding a third phase to the pallet, called the challenge phase. This is a phase in which no further solutions are processed, and the current best solution might be challenged by anyone (signed or unsigned). The main plan here is to enforce the solution to be PJR. Checking PJR on-chain is quite expensive, yet proving that a solution is not PJR is rather cheap. If a queued solution is successfully proven bad:

  1. We must surely slash whoever submitted that solution (might be a challenge for unsigned solutions).
  2. We will fall back to the emergency strategy (likely extending the current era).

Bailing out. The functionality of bailing out of a queued solution would be nice to have. A miner could submit a solution as soon as they think it is feasible with high probability, do the checks afterwards, and remove their solution if needed (for a small cost of probably just transaction fees, or a portion of the bond).

Conditionally open unsigned phase: Currently, the unsigned phase is always opened. This is useful because an honest validator will run substrate OCW code, which should be good enough to trump a mediocre or malicious signed submission (assuming the absence of honest signed bots). If signed submissions can be checked against an absolute measure (e.g. PJR), then we can open the unsigned phase only in extreme conditions (i.e. "no good signed solution received") to spare some work for the active validators.

Allow smaller solutions and build up: For now we only allow solutions that are exactly DesiredTargets, no more, no less. Over time, we can change this to a [min, max] where any solution within this range is acceptable, where bigger solutions are prioritized.

Score based on (byte) size: We should always prioritize small solutions over bigger ones if there is a tie. An even harsher option would be to enforce the bound of the reduce algorithm.
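
One way such a tie-break could look is sketched below. Since this describes a future plan, everything here is an assumption: Candidate, its encoded_size field and compare_candidates are hypothetical, and the score is simplified to a single u64.

```rust
use std::cmp::Ordering;

/// A candidate solution with a simplified score and its encoded byte size.
struct Candidate {
    score: u64,
    encoded_size: usize,
}

/// Order candidates so that `Greater` means "better": a higher score wins,
/// and on equal scores the smaller encoding wins.
fn compare_candidates(a: &Candidate, b: &Candidate) -> Ordering {
    a.score
        .cmp(&b.score)
        // Reversed operands: a smaller size ranks higher.
        .then_with(|| b.encoded_size.cmp(&a.encoded_size))
}
```

The size only matters when the scores are equal, so a better-scoring solution is never displaced by a smaller one.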

Take into account the encode/decode weight in benchmarks. Currently, we only take into account the weight of encode/decode in the submit_unsigned given its priority. Nonetheless, all operations on the solution and the snapshot are worthy of taking this into account. All of the tests here should be dedicated to only testing the feasibility check and nothing more. The best way to audit and review these tests is to try and come up with a solution that is invalid, but gets through the system as valid.

Dependencies

~17–31MB
~520K SLoC