#hash-table #dht #kademlia #distributed #table #hash

kad

A generic / futures based implementation of the Kademlia Distributed Hash Table (DHT)

9 releases (5 breaking)

0.6.1 Aug 6, 2021
0.6.0 Jan 11, 2021
0.5.0 May 2, 2020
0.4.0 Apr 15, 2019
0.1.0 Dec 11, 2018

#1812 in Asynchronous


Used in dsf-daemon

MIT/Apache

105KB
2.5K SLoC

rust-kad

A generic / futures based implementation of the Kademlia DHT, heavily inspired by dtantsur/rust-dht with the goal of providing a simple to use, well-tested, low-dependency, futures-based API for using DHTs with arbitrary communication mechanisms and encodings

Usage

Configuration

  • k - system wide replication parameter, defines bucket sizes and search breadth
  • concurrency - system wide concurrency parameter, number of parallel operations to run at once

Status

GitHub tag Build Status Crates.io Docs.rs

Open Issues

Work In Progress

Components

  • Receive message

    • Update appropriate k-bucket
      • Add node if bucket not full
      • Store pending if bucket full and ping oldest (if > seen time)
    • Respond to Ping with NoResult
    • Respond to FindNodes with NodesFound
    • Respond to FindValues with NodesFound or ValuesFound
    • For new node, Send STORE RPC if own ID is closer to key than known nearby nodes
  • Search - common to most operations

    • Select alpha closest nodes
    • Send RPCs to selected subset of nodes
    • If no suitable responses, expand to k closest nodes
    • Recurse until responses received from k closest nodes
  • Find Node

    • Search using FIND_NODE RPC
    • Return node
  • Find Value

    • Search using FIND_VALUE RPC
    • Collect values
    • Once values returned, store (k, v) pair at the closest node observed that did not return the value
  • Store Value

    • Search using FIND_NODE RPC
    • Send STORE RPC to k closest nodes
  • Connect

    • Insert known node into appropriate k-bucket
    • Perform Find Node lookup on own ID
    • Refresh all k-buckets further than the closest neighbor
  • Maintanence

    • Remove non-responsive / old contacts
    • Expire values
      • Basic expiry after defined time
      • Cache expiry exponentially inversely proportional to number of nodes between current node and closest to key ID node
    • Refresh k-buckets
      • FIND_NODES to random ID in any bucket not queried in a configurable period
    • Re-publish values
      • STORE RPC to K nodes at defined period (hourly)
      • Unless a STORE RPC has been received in the same period
  • Buckets

    • Implement bucket splitting (if it can be done more efficiently than existing?)
      • Useful for maintenance / don't need to message unused buckets
    • Reverse / generate random IDs in bucket for maintenance purposes

Questions

  • How is FindValues usually handled where there can be more than one value per ID?
  • Is there a case when STORE is valid when the origin ID is closer to the requester ID than the local ID?
    • This seems like it could be ignored

Dependencies

~6.5MB
~118K SLoC