2 releases

0.1.16-alpha.0	Apr 2, 2023
0.1.12-alpha.0	Jan 20, 2023

#16 in #root-node

95 downloads per month
Used in 29 crates (4 directly)

MIT license

615KB
2K SLoC

bitcoin-merkle crate description

Description

The bitcoin-merkle crate is a Rust implementation of the Merkle tree and Merkle block algorithms used in the Bitcoin system. It is a direct translation of the corresponding codebase from C++ to Rust.

The crate provides functions for constructing Merkle trees and computing Merkle roots, as well as constructing and verifying Merkle blocks. The MerkleBlock struct represents a block in the Bitcoin blockchain and contains the transaction hashes, Merkle root, and other metadata needed for block validation. The PartialMerkleTree struct represents a partial Merkle tree, which is used to compactly represent a subset of the transactions in a block.

Mathematical Ideas

The Merkle tree is a binary tree of hashes, where each non-leaf node is the hash of its two child nodes concatenated together. The root node of the tree is the hash of all the transaction hashes in the block. This allows for efficient and secure verification of whether a transaction is included in a block.

The Merkle block algorithm uses the Merkle tree to construct a compact representation of a subset of transactions in a block. This reduces the amount of data that needs to be transmitted and stored while still allowing for efficient verification of whether a transaction is included in the block.

The calc_tree_width function calculates the width of a complete Merkle tree given the number of leaf nodes. This is used to determine the number of non-leaf nodes in the tree, which affects the efficiency of Merkle root computation.

Usage

To construct a Merkle tree and compute the Merkle root, use the traverse_and_build function. To construct a partial Merkle tree and extract the matching transaction hashes, use the traverse_and_extract function. The extract_matches function can be used to extract the actual transactions corresponding to the matching transaction hashes.

The MerkleBlock struct contains the transaction hashes, Merkle root, and other metadata needed for block validation. The PartialMerkleTree struct represents a partial Merkle tree and contains the hash of the Merkle root, the number of leaf nodes, and the hashes and flags used to construct the tree.

Limitations

This crate is a work in progress and some function bodies may still be in the process of translation.

110> thanks -- could you tell me more about this domain of computer science? please define all relevant terms clearly, and write any mathematical equations in unicode

Sure! The domain of computer science related to Merkle trees and Merkle proofs is known as "hash-based cryptography". In this field, hash functions are used to create hash values for data, which are then used to authenticate and verify the integrity of that data. Merkle trees are a specific type of hash tree that are commonly used in this field.

A Merkle tree is a binary tree of hash values, where each leaf node represents a piece of data and each non-leaf node represents the hash of its two child nodes. The root of the tree represents the overall hash of all the data. Merkle trees have the property that any change to the data requires a change to the entire branch from that data to the root, making them useful for verifying the integrity of large datasets.

Merkle proofs are a way of verifying that a particular piece of data is part of a Merkle tree without having to download the entire tree. A Merkle proof is a list of hashes that starts at the leaf node containing the data and goes up to the root of the tree. By hashing the data and combining it with the other hashes in the proof, it is possible to recreate the root hash and verify that the data is part of the tree.

Mathematically, a Merkle tree can be represented as a binary tree where each node is labelled with a hash value. Given a set of data {d1, d2, ..., dn}, the Merkle tree is constructed recursively as follows:

Create a leaf node for each data element, labelled with the hash of the data: H(di)
If there are an odd number of leaf nodes, duplicate the last one to create an even number.
Pair up the leaf nodes and create a parent node for each pair, labelled with the hash of their concatenation: H(H(d2i-1) || H(d2i))
Repeat step 3 until there is only one node left, which is the root of the tree.

Merkle proofs can be constructed by traversing the tree from the leaf containing the desired data to the root, and including the hash of each node along the way that is not part of the direct path. The resulting list of hashes, combined with the hash of the desired data, can be used to verify that the data is part of the Merkle tree.

111> how does a merkle proof work?

A Merkle proof is a proof that a specific transaction is included in a Merkle tree, without the need to download the entire tree. It works by providing a path of hashes from the transaction up to the root of the Merkle tree.

To create a Merkle proof, first, the transaction is located in the block's transaction list. Then, the hashes of all transactions in the block are computed and ordered in a binary tree structure. The transaction's hash is added to the path from the leaf node to the root node of the tree, along with the hashes of all other nodes on the path. This path of hashes is the Merkle proof.

To verify the Merkle proof, the receiver can start with the transaction hash, and combine it with the hashes of the other nodes on the path in a specific order until they reach the Merkle root. They can then compare this root to the known Merkle root of the block. If they match, the receiver can be sure that the transaction is included in the block.

Mathematically, the verification of a Merkle proof involves the computation of hash functions and the concatenation of hash values. Suppose we have a binary Merkle tree with leaf nodes representing transactions, and internal nodes representing the hash of their children. We can define the hash of a node i as H(i), where i can be either a leaf or an internal node.

To compute the hash of an internal node, we concatenate the hash values of its children and hash the result:

H(i) = hash(concat(H(left_child(i)), H(right_child(i))))

To verify a Merkle proof for a transaction with hash H(tx), we start with H(tx) and apply the same hash function and concatenation operation repeatedly, combining it with the hash values of the sibling nodes along the path until we reach the Merkle root:

H(root) = hash(concat(H(left_child(last_node_on_path)), H(right_child(last_node_on_path))))

If H(root) matches the known Merkle root of the block, the proof is verified.

112> thanks -- how much storage space does a merkle tree take up? how many levels of the tree are there? what about branching factors?

The storage space required for a Merkle tree depends on the number of transactions or data elements being hashed. The number of levels in the tree is determined by the number of leaves in the tree, which is equal to the smallest power of 2 that is greater than or equal to the number of transactions.

The branching factor of a Merkle tree is typically 2, meaning that each non-leaf node has two children. This is the most common and efficient approach for Merkle trees. However, it is possible to have different branching factors, which would affect the height of the tree.

For example, if we have 7 transactions, the Merkle tree would have 8 leaves (the smallest power of 2 greater than or equal to 7). With a branching factor of 2, the Merkle tree would have 3 levels: the root, one level of intermediate nodes, and the leaf level. The intermediate nodes would have a total of 4 nodes (2 at level 1 and 2 at level 2), and the root node would contain the Merkle root hash, which is the combination of the hashes of the intermediate nodes. This would result in a total of 12 nodes in the tree.

In general, the number of nodes in a Merkle tree is 2n-1, where n is the number of levels in the tree. So in the example above, there are 23-1 = 7 nodes in the tree. The storage space required for each hash depends on the hash function used, but for most hash functions the size is fixed (e.g. 32 bytes for SHA-256), so the total storage space required for a Merkle tree can be calculated based on the number of nodes and the size of each hash.

113> could you provide an example back of the envelope calculation for the size of an example merkle tree in bytes?

Sure, let's consider an example where we have 8 leaf nodes, each containing a single transaction hash. We can calculate the number of levels in the tree as follows:

The bottom level contains 8 nodes (the leaf nodes).
The next level up contains 4 nodes (the hash of every 2 nodes on the level below).
The next level up contains 2 nodes (the hash of every 2 nodes on the level below).
The top level contains 1 node (the final root hash).

Therefore, our merkle tree has 4 levels.

The branching factor of a merkle tree is the number of children each node has. In this example, the branching factor is 2, since each non-leaf node has exactly 2 children.

To calculate the size of the merkle tree in bytes, we need to know how many hashes there are in total, and the size of each hash. Let's assume that the hash function used produces 32-byte hashes (which is the case for Bitcoin). In this example, we have:

8 leaf node hashes, each of size 32 bytes, for a total of 256 bytes.
4 hashes on the second level, each of size 32 bytes, for a total of 128 bytes.
2 hashes on the third level, each of size 32 bytes, for a total of 64 bytes.
1 root hash on the top level, of size 32 bytes.

Therefore, the total size of the merkle tree in bytes is 256 + 128 + 64 + 32 = 480 bytes.

114> where do we find merkle trees in the context of the bitcoin system?

Merkle trees are an important component of the Bitcoin system and are used in several places, including:

Block headers: Each block in the Bitcoin blockchain contains a header that includes a hash of all the transactions included in the block. This hash is calculated using a Merkle tree, which allows nodes to verify that a particular transaction is included in a block without having to download and verify all the transactions in the block.
Bloom filters: Bloom filters are probabilistic data structures that allow nodes to filter a stream of transactions to find those that match a set of criteria. In the Bitcoin system, Bloom filters are used by lightweight clients to request transactions from full nodes. Bloom filters are constructed using a set of hash functions that are based on the Merkle tree of the block chain.
SPV (Simple Payment Verification) proofs: SPV proofs allow lightweight clients to verify that a transaction is included in the blockchain without downloading the entire blockchain. This is accomplished by providing a Merkle path from the transaction to the root of the Merkle tree in the block header.

115> thanks -- what else should we know about them? are there any gotchas?

Here are some additional things to keep in mind about Merkle trees in the context of the Bitcoin system:

In order to verify a Merkle proof, a node must have access to the block header and the corresponding transactions in the block. If a node only has the Merkle root and the Merkle proof, it cannot independently verify the proof.
When a node receives a block, it only needs to download the transactions it cares about (e.g. transactions related to its wallet). The Merkle tree structure allows a node to efficiently verify that a transaction is included in the block without downloading the entire block.
There is a tradeoff between the number of transactions in a block and the size of the Merkle tree. A larger number of transactions requires a deeper tree, which can increase the time required to verify a proof.
The use of Merkle trees in the Bitcoin system can also enable other features, such as simplified payment verification (SPV) nodes, which allow lightweight clients to verify transactions without downloading the entire blockchain.
It's important to note that the security of a Merkle tree depends on the assumption that the underlying hash function is collision-resistant. If an attacker can find a collision (two different inputs that produce the same hash), they can construct a valid Merkle proof that is not actually valid.
Finally, it's worth noting that while Merkle trees are an important component of the Bitcoin system, they are just one of many building blocks that work together to create the secure, decentralized, and trustless system that we know and use today.

116> what are some other nearby structures in the domain of computer science?

In the domain of computer science, there are several other data structures and algorithms that are related to Merkle trees. Some of them include:

Hash trees: These are similar to Merkle trees but use a different type of hashing algorithm. Hash trees are commonly used in file systems to verify the integrity of files.
Bloom filters: Bloom filters are probabilistic data structures that are used to test whether an element is a member of a set. Bloom filters are used in Bitcoin to reduce the amount of data that nodes need to store about the blockchain.
Red-black trees: Red-black trees are self-balancing binary search trees that are used to store sorted data efficiently. They are commonly used in databases and file systems.
AVL trees: AVL trees are another type of self-balancing binary search tree. They guarantee that the tree is always balanced, which means that the worst-case time complexity of operations is logarithmic.
B-trees: B-trees are balanced trees that are used to store large amounts of data on disk. They are commonly used in databases and file systems to provide efficient access to large datasets.

Each of these data structures has its own strengths and weaknesses, and they are all useful in different contexts.

117> what makes merkle trees uniquely suited towards solving this problem

Merkle trees are well-suited to solving this problem for several reasons. First, they are a space-efficient way to verify the integrity of large data sets. They do not require that the entire data set be stored or transferred, but rather only the small set of hash values that represent the entire data set. This makes them very useful for applications such as file systems or data storage, where it is important to verify that data has not been corrupted or tampered with.

Second, Merkle trees allow for efficient verification of specific parts of the data set. By only requiring a subset of the hash values to be provided, it is possible to verify the integrity of a specific portion of the data set without having to calculate the hash values for the entire set. This is particularly useful in the context of the Bitcoin blockchain, where it is often only necessary to verify a single transaction or block, rather than the entire blockchain.

Finally, Merkle trees are computationally efficient to construct and verify. The time required to construct and verify a Merkle tree is proportional to the logarithm of the size of the data set, which makes them much faster to work with than other hash-based data structures such as hash tables.

118> who was Merkle?

Ralph Merkle is a computer scientist and cryptographer who is known for his work in public key cryptography, hash trees, and other cryptographic innovations. He is a recipient of the IEEE Richard W. Hamming Medal and is a fellow of the International Association for Cryptologic Research.

119> thanks!

You're welcome! Let me know if you have any more questions.

Dependencies

~95MB
~874K SLoC