Blockchain 101 – Part 2: Hashing
Part 1: What is a Blockchain?
Part 3: The Immutable Ledger
Hello there and welcome back to this new part of the Blockchain 101 series. Last time, we talked about what a blockchain is, the concept of immutability, and how blocks (which hold the records) are linked to each other using the block hash and previous_hash properties.
Today we are going to talk about a pretty serious topic: hashing.
Hashing plays a key role in the whole concept of blockchains, and it is not limited to that. For instance, it is also part of how cryptocurrency wallets work under the hood. This is why it is so fundamental to deal with this topic now. I know, this is not the most fun thing to read about.
Today, however, I will try to sweeten the pill for you.
The Fingerprint
A hash is the result of applying a hash function (a hashing algorithm) to any piece of digital data (a document, an image, or any other data). But what exactly is the point of this? Think about the human fingerprint. Each one of us has a different fingerprint, which is often stored in private databases by authorities and used to reveal the identity of the person it belongs to. This can be done because the relationship between a person and their fingerprint is unique.
Well, the hash of digital data is exactly the same concept as a fingerprint. When a hashing algorithm is applied to a document, the output is a fixed-length alphanumeric string that, for all practical purposes, uniquely represents the document. There are many different hashing algorithms, you can find a quick list here. The one we are interested in is SHA-256, the cryptographic hash function used by the Bitcoin protocol. SHA-256 generates a 256-bit (32-byte) hash, usually written as a string of 64 hexadecimal characters.
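To make this concrete, here is a minimal sketch using Python's standard hashlib module. The choice of Python and the helper name sha256_hex are mine, purely for illustration. It hashes a short piece of text and checks the size of the result.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

digest = sha256_hex(b"abc")
print(digest)
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(len(digest))                 # 64 hexadecimal characters
print(len(bytes.fromhex(digest)))  # 32 bytes
```

Whatever you feed in, whether three letters or a whole movie file, the output always has the same length.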
The hexadecimal numeral system is a way to express data (numbers, ASCII characters and binary data in general) more compactly. It is a numeral system made up of 16 symbols: the digits 0 to 9, plus the letters A to F to represent the values 10 to 15.
Why this numeral system? Because it maps neatly onto binary data: each hexadecimal digit stands for exactly four bits, so one byte is always two hexadecimal digits. You can read more about it here. If this is too hard to grasp, just imagine the hash as a string of 64 characters.
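If you prefer to see it rather than read about it, the short Python sketch below (my own illustration, with arbitrary sample values) shows how hexadecimal digits line up with binary data.

```python
# Each hexadecimal digit encodes a value from 0 to 15, i.e. four bits.
for digit in "09af":
    value = int(digit, 16)
    print(digit, value, format(value, "04b"))

# One byte (8 bits) is therefore always two hexadecimal digits.
data = b"Hi"                      # two ASCII characters = two bytes
hex_text = data.hex()
print(hex_text)                   # 4869
print(bytes.fromhex(hex_text))    # b'Hi'
print(format(0xAF, "08b"))        # 10101111
```

This is also why a 32-byte SHA-256 hash ends up as 64 hexadecimal characters: two characters per byte.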
Each time you input something into the algorithm you get an output similar to the one in the following image.
If you'd like to experiment with the SHA-256 algorithm, I suggest you visit this website. As an exercise, try to write different things as input to the algorithm and watch how the result completely changes even when you change one single character in the input string. This is called the avalanche effect; we will discuss this important concept in a while.
Collisions
Maybe you know, maybe you don't. Human fingerprints, as we said, uniquely identify a single person but... sometimes it might be the case that two individuals have the same fingerprint. Right, unbelievable! But it can happen, although it is very unlikely. The probability of two given individuals having the same fingerprint is often quoted as 1 in 64 trillion (~ 0.0000000000000156), which sounds nearly impossible but, if you know what the birthday paradox is, then you know this is still something worth considering: the more people you compare, the faster the chance of finding some matching pair grows.
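If the birthday paradox is new to you, the rough Python sketch below may help. It uses the standard approximation 1 - exp(-n(n-1)/(2N)) for the chance that at least two out of n random items share a value among N possibilities. Treating fingerprints as 64 trillion equally likely patterns is a simplifying assumption of mine, and the function name is made up for this example.

```python
import math

def collision_probability(n: int, space: int) -> float:
    """Approximate chance that at least two of n random items share a value,
    when each value is drawn uniformly from `space` possibilities."""
    exponent = n * (n - 1) / (2 * space)
    return -math.expm1(-exponent)   # same as 1 - exp(-exponent), but more precise

# Classic birthday paradox: 23 people, 365 possible birthdays.
print(round(collision_probability(23, 365), 3))   # 0.5

# Fingerprints, assuming 64 trillion equally likely patterns (a simplification).
FINGERPRINT_SPACE = 64 * 10**12
for people in (1_000, 1_000_000, 10_000_000):
    print(people, collision_probability(people, FINGERPRINT_SPACE))
```

Under these assumptions, a match between two specific people stays extremely unlikely, yet in a group of ten million people the chance that some pair matches is already better than even.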
As with human fingerprints, the same might happen with hashing algorithms. In this case we talk about collisions. A collision happens when two different digital documents generate the same hash. Before SHA-256, SHA-1 was one of the most widely used hash functions in web security. In 2017, researchers from Google and CWI Amsterdam publicly broke it by demonstrating how to generate a collision. This event radically sped up the move to SHA-256.
Collision: two documents generate the same identical hash
Collisions, of course, are very dangerous. If SHA-256 collisions could be found, that would undermine the integrity of the Bitcoin chain and make the protocol unreliable! However, fear not: if you look up an estimate of the likelihood of a collision, you will find that all the hard drives ever produced on Earth can't hold enough files to reach a collision likelihood of even 0.01% for SHA-256. This basically means that you can simply ignore the probability of an accidental collision altogether.
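You can sanity-check this kind of estimate yourself. The sketch below applies the usual birthday-bound approximation to the 2^256 possible SHA-256 outputs, assuming the hashes behave like uniformly random values. The file counts are hypothetical numbers I picked for illustration, not measurements.

```python
import math

SHA256_SPACE = 2**256  # number of possible SHA-256 outputs

def collision_probability(n: int, space: int = SHA256_SPACE) -> float:
    """Birthday-bound approximation for n uniformly random hashes."""
    return -math.expm1(-(n * (n - 1)) / (2 * space))

def items_needed(probability: float, space: int = SHA256_SPACE) -> float:
    """Roughly how many hashes are needed to reach the given collision probability."""
    return math.sqrt(2 * space * -math.log1p(-probability))

# Hypothetical scenario: a trillion files for each of 10 billion people.
n_files = 10**12 * 10**10
print(collision_probability(n_files))

# Number of files needed for a 0.01% chance of a collision.
print(f"{items_needed(0.0001):.2e}")
```

Under these assumptions, you would need on the order of 10^36 files before the chance of an accidental collision reaches even 0.01%.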
This should give you an idea of how fascinating, elegant and well designed the Bitcoin protocol is.
Key Requirements
What are the main characteristics to consider when choosing the right hashing algorithm? As previously mentioned, there are many different hashing algorithms. Given the importance of hashing in the design of a blockchain protocol, it is fundamental to choose wisely.
There are 5 requirements an algorithm must meet:
- Being deterministic
- Being collision resistant
- Being computationally fast
- Being a one-way algorithm
- Being characterized by the avalanche effect
Let's analyze these properties one by one.
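A hashing algorithm is deterministic if, taking a document and applying the algorithm multiple times, the resulting hash is always the same. Ideally, a hash function would also be injective. Injectivity is a mathematical property by which two different points in the function's domain (in our case two different digital documents) never map to the same point in the codomain (in our case the resulting hash). Strictly speaking, no hash function can be truly injective, because there are infinitely many possible inputs and only a finite number of fixed-length hashes. What we ask for in practice is the next requirement: collisions must be so hard to find that we can treat each hash as unique.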
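Determinism is easy to verify on your own machine. In this small Python sketch (the sample text is a made-up example), the same input is hashed twice and the results are compared.

```python
import hashlib

document = b"Alice pays Bob 1 BTC"   # hypothetical sample data

first = hashlib.sha256(document).hexdigest()
second = hashlib.sha256(document).hexdigest()
print(first == second)   # True: same input, same hash, every time

# The output length is fixed no matter how large the input is,
# which is why a truly injective hash function cannot exist.
huge = b"x" * 1_000_000
print(len(hashlib.sha256(huge).hexdigest()))   # 64
```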
Ok, that was boring. Let's move on. As I already explained, a hashing algorithm must make collisions practically impossible to find in order to be secure. In the previous section, I showed that SHA-256 is a very robust solution in this respect. Furthermore, the computations required to generate a hash should be fast.
Finally, another important property a hashing algorithm must have is being a one-way algorithm. In simple terms, this means that starting from the input you can obtain the associated hash, but you cannot reverse the process: there is no practical way to obtain the original input data starting from a given hash.
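One way to get a feel for the one-way property: there is no "unhash" function to call, so the only generic approach is to guess inputs and hash each guess. The toy Python sketch below (the function names and the PIN are hypothetical) recovers a 4-digit PIN from its hash only because there are just 10,000 candidates to try. For an input with a realistically large number of possibilities, the same search would never finish.

```python
import hashlib
from typing import Optional

def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a text string as hexadecimal."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def guess_pin(target_hash: str) -> Optional[str]:
    """Try every 4-digit PIN until one hashes to target_hash."""
    for number in range(10_000):
        candidate = f"{number:04d}"
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None   # the original input was not a 4-digit PIN

secret_hash = sha256_hex("4271")   # hypothetical secret PIN
print(guess_pin(secret_hash))      # 4271, found only by trying candidates
```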
If you are the smartest person in the room or if you are a great critical thinker, you may object that I only described 4 out of the 5 requirements I introduced before. Well, good job Sherlock! I left the last one separate because the following property is a very important one and it is called...
The Avalanche Effect
So, what exactly does the "avalanche effect" mean? It means that if you take a document, generate a hash and then modify that document (for example, you change a single bit or a letter), the resulting hash completely changes. On average about half of the output bits flip, so the new hash looks nothing like the previous one.
The "avalanche effect" name refers to the way a hash algorithm like SHA-256 works under the hood. Simply put, a small change causes a series of consecutive changes, one after another, which result in a completely different hash. This principle is very important when it comes to mining so keep it in mind, as we will encounter it again when we discuss how mining works.
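Here is a quick way to watch the avalanche effect happen. This Python sketch (the helper names and sample strings are my own) hashes two inputs that differ by a single character and counts how many of the 256 output bits differ.

```python
import hashlib

def sha256_bits(text: str) -> int:
    """Return the SHA-256 digest of `text` as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")

def differing_bits(a: str, b: str) -> int:
    """Count how many of the 256 output bits differ between two inputs."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")

original = "Blockchain 101"
modified = "Blockchain 102"   # only the last character changes

print(hashlib.sha256(original.encode("utf-8")).hexdigest())
print(hashlib.sha256(modified.encode("utf-8")).hexdigest())
print(differing_bits(original, modified), "of 256 bits differ")
```

You should see two hashes with no visible resemblance, and a bit count somewhere around half of 256.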
If you want to go deep into the nitty-gritty details of the SHA-256 algorithm, I strongly recommend reading this article. (WARNING: you may find this article hard to read if you are a beginner, but I bet you are a plucky person!)
Conclusions
We have seen that hashing is a really important part of how a blockchain works. Also, cryptographic algorithms in general play a key role in the design of a protocol like Bitcoin. Unfortunately, it takes a lot of math to understand cryptography thoroughly. If you are really serious about learning all the underlying details of cryptography, I suggest reading this book: Serious Cryptography: A Practical Introduction to Modern Encryption. However, you won't need it in order to follow along with the upcoming parts of this series about Blockchain.
Here we are, at the end of this very tedious part about hashing algorithms. How do you feel about it?
I know, it was a tough part... however, be relieved, the next one won't be that boring: we will discuss the concept of the Immutable Ledger: what it is, how it works, and how it relates to the blockchain.
Meanwhile, if you still have questions, do not hesitate to contact me using my email address (you can find it in the footer of the website) or via the contact page.
Also, you can hit me up on Twitter.
If you enjoyed my content, please share! 🙂
Good luck in becoming the next most talented and most wanted blockchain master in the entire DeFi world! 😉