4 releases

new 0.1.3 Dec 19, 2024
0.1.2 Dec 19, 2024
0.1.1 Dec 19, 2024
0.1.0 Dec 19, 2024

#502 in Algorithms

Download history

63 downloads per month

MIT license

9KB
93 lines

faro_sharding

Faro Sharding is a technique for sharding keys such that adding new destinations does not move data between existing destinations. Only between existing and the new destination.

For example, if we have data sharded among 10 servers, when we add an 11th, we don't want to move data from 3 to 5. Faro Sharding guarantees that won't happen.

Example

use faro_sharding::shard_for;
for locations in 4..50 {
  assert_eq!(shard_for("foo", locations), 2);
}
assert_eq!(shard_for("foo", 50), 49);

Distribution

Faro Sharding shows high quality even distribution among locations. With 1,000,000 keys and 100 locations, the largest has 10229 keys (1.02%) and the smallest has 9761 keys (0.98%).

If you want to evaluate the distribution with your own hashing function, you can modify examples/distribution.rs to use your hasher.

Algorithm

Inspired by JumpHash, Faro Sharding sequentially hashes the initial key and then the results of those hashes. Different from JumpHash, Faro Sharding only changes the shard for a key when hash % i == 0.

Faro Sharding can only move a key to a shard on the step corresponding to that shard number. That means that the only movement happens to shard N on step N. Additionally, approximately 1/N of keys are moved on step N, resulting in approximately even distribution.

License: MIT

Dependencies

~44KB