Mutant pointers

Also known as hash pointers, these special guys store the cryptographic hash of the information along with pointing to that information. If you are not sure what pointers are, you can find something great to learn here and there. Come back only when you have befriended pointers well.

So, while a pointer gives us a way to retrieve information from a location, a hash pointer (our mutant) helps us retrieve the information as well as verify that this information hasn't changed (through the cryptographic hash). Now we are ready to look into ... (behold the celebrity for this sub-chapter) ... the blockchain.

Blockchain

Blockchain is a data structure created using hash pointers. Have you heard of linked lists? Well, now you have. So a block-chain is essentially (not exactly) a linked list with hash pointers instead of pointers. Imagine a reverse linked list with pointers pointing to the previous node instead of the next node; similarly in a blockchain, each block consists of some data and a hash pointer to the previous block. So along with telling us where the previous block is, the pointer also lets us verify that the information stored in the previous block has not changed. We store the head of this block-chain: which is a hash pointer to the most recent block. If you are having thoughts like Why all this fuss?, then go back here and recall how a blockchain helped us keep a tamper-evident ledger beautifully. Conclusion: if we store the hash pointer at the head of this block-chain in a super-secret place where no one can change it (we only want to be able to read it to know the contents of the blockchain, no write permission required), then we are good to go.

Just by remembering the single top-level hash pointer, we are able to remember a tamper-evident hash of the entire block-chain. The block at the beginning of the block chain is called the genesis block (origin of the block-chain).

Here's an inquisitive thought: Do the blockchain and the Merkle-Damgard transform seem to be similar in some ways? Ponder over it for a while. (Hint: Yes they are)

Merkle tree

So the block-chain as we know up to now is like a linked list with hash pointers instead of normal pointers, pointing to the previous block in the chain. Now let's look at another data structure, again derived from a common data structure, the binary tree, read about them here or here. So a binary tree with hash pointers instead of normal pointers is a Merkle tree (this is fun, right?), named after Ralph Merkle, the creator.

This part is crucial to understand now, so follow along carefully. It is presumed that you are familiar with binary trees and the associated terms and concepts like leaves, traversals, etc. Let's try to use this tree for something good, to understand Why Merkle tree?.

Consider a situation where we have some blocks of data. Now imagine a tree with these blocks as its leaves. Now group these blocks into pairs of two (keep the last one sad and single if total number of blocks is odd), and for each pair, we create a data block above the two blocks in that pair, with hash pointers (left and right) to both these blocks. These new data blocks will become another level of the tree. Now we repeat this until we are left with just one node at the top of the tree, and we call this the root. Look at the image below to realize this.

Browser could not display this SVG.

Now as in the case of the linked list, we only have to remember the hash pointer at the root, and we can lookup the whole data with that, along with verifying the integrity, because if someone tampers with the actual data at the leaves, the changes will eventually propagate to the root which the tamperer will fail to change. So any attempt to do so will be detected by just storing the root hash pointer.

Immediately now a question arises. Why are we looking at a Merkle tree when we have the blockchain already? The advantage we have in a Merkle tree over a simple block-chain is that it makes it easy for us to prove that a certain data block is a member of the tree.

Think about how the membership proof is more advantageous in Merkle trees than the block-chain. Give it some thought, then carry on below.

Membership proof in Merkle trees

If you haven't concluded already, look at this: when given a data block, if we want to find whether it belongs to the data structure that we have or not, what do we need to see?

  • Block-chain: We have to see the chain from our head hash pointer that we remember, all the way up to the data block that that we are checking for membership. This can be O(n) (read more or nosedive into it) in the worst case where n is the number of data blocks in the whole block-chain.
  • Merkle tree: But in case of a Merkle tree, we only need to look at O(log(n)) blocks, i.e. those blocks that are along the path when we traverse from the root to the data block to check for membership.

A sorted Merkle tree is more useful in general, because we have the leaves sorted from left to right in an order agreed upon in advance (lexicographical or numerical or whatever). What is the advantage of sorting the leaves? Well, follow along ...

We can very easily show that a particular data block D is not in the Merkle tree by showing a path to data block L that is just before where D would have been and data block R just after where D would have been if it was a member of the tree. If L and R are consecutive in the tree, we can say for sure that data block D does not exist in the tree, and this serves as a non-membership proof in logarithmic time and space.

Other data structures

Now we are familiar with hash pointers and their application in two basic data structures, the block-chain and the Merkle tree. But in general, hash pointers can be used for any acyclic pointer data structure. So any directed acyclic graph with hash pointers is a great data structure to verify membership of elements as well as carry out other operations that the graph data structure offers inherently.

Think about why being acyclic is a compulsion for data structures with hash pointers.

Using hash pointers is thus a very important and common thing across implementations of distributed data structures and algorithms. In the next sub-chapter we will look at digital signatures and understand how they help us.

results matching ""

    No results matching ""