Anyone who is interested in Blockchain or Cryptocurrencies must have heard the term Hash Function, but not everyone understands how they work and why they are so important. So, in this article, I will try to explain the basics of Hash Functions and why they are so extensively used in blockchains.
Hash functions take in an input of any length, apply a mathematical function on this input and generate a fixed-length output called the hash digest.
Before diving into their usage in blockchains, we first need to understand that hash functions were not designed specifically for blockchains. Rather, they gained popularity because of their use in blockchains like Bitcoin, Ethereum, etc.
They were originally designed for Information Security. The most basic use case is storing passwords. To log into a website we enter our username and password, but a well-designed website never stores the password itself on its server. The interesting challenge, then, is how to verify whether the entered password is correct, and this is where hash functions come into the picture. When the account is created, we hash the password and store only the hash digest. At login, we hash the entered password again and compare the result with the stored digest instead of the original password.
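The password check described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular website does it: the function names `register` and `verify` are made up for this example, and the random salt and the PBKDF2 iteration count are assumptions that go beyond what the article covers (real systems typically use a salted, deliberately slow hash rather than a single plain hash).

```python
import hashlib
import hmac
import os
from typing import Tuple


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 from the standard library; 100_000 iterations is an
    # illustrative value, not a recommendation.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(password: str) -> Tuple[bytes, bytes]:
    # Only the salt and the digest are stored, never the password itself.
    salt = os.urandom(16)
    return salt, hash_password(password, salt)


def verify(password: str, salt: bytes, stored_digest: bytes) -> bool:
    # Hash the entered password the same way and compare the two digests.
    candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)


salt, stored = register("correct horse")
print(verify("correct horse", salt, stored))
print(verify("wrong guess", salt, stored))
```

The server only ever keeps `salt` and `stored`, so someone who reads the database still does not see the password.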
What are Hash Functions exactly?
A hash function is any deterministic function that takes an input of arbitrary length and produces a fixed-length output. The output of the hash function is referred to as the hash digest. Blockchains rely heavily on hash functions for deriving addresses from cryptographic keys and for hashing blocks of transactions. We can better understand hash functions by understanding their properties.
- Fixed Length Mapping
- Efficiently Computed
- Preimage Resistance
- Collision Resistance
- Avalanche Effect
- Puzzle Friendliness
Fixed Length Mapping
For input of any length, the function will always generate a fixed-length output. This property allows us to hash any file whether it is a text document, image or even a video file and get the output of the same length. There are multiple Hashing Functions out there like SHA-256, Keccak-256, etc.
SHA-256 takes in an input of any length and converts it into 256 bits, which is 32 bytes. So you can literally hash an entire movie into just 32 bytes.
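To see the fixed-length property in practice, here is a small sketch using Python's standard `hashlib` module. The three sample inputs are arbitrary choices for this example. Whatever the input size, the SHA-256 digest is 32 bytes, i.e. 256 bits.

```python
import hashlib

# Three inputs of very different sizes: empty, 5 bytes, and one million bytes.
inputs = [b"", b"hello", b"x" * 1_000_000]

for data in inputs:
    digest = hashlib.sha256(data).digest()
    # Print the input length, the digest length in bytes, and the start of the digest.
    print(len(data), len(digest), digest.hex()[:16])

digest_lengths = [len(hashlib.sha256(data).digest()) for data in inputs]
```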
Hash functions are also deterministic: for a given input, the output will always be the same. So, if I hash the word ‘hello’ using the SHA3-256 hash function, then I will always get the same output.
SHA3-256(‘hello’) = 3338be694f50c5f338814986cdf0686453a888b84f424d792af4b9202398f392
So, mathematically, we can say for a given ‘X’, Hash(X) will always be the same.
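A quick way to check this yourself is to hash the same word twice and compare the results. The sketch below uses `hashlib.sha3_256` from Python's standard library. The helper name `sha3_256_hex` is just a convenience invented for this example.

```python
import hashlib


def sha3_256_hex(text: str) -> str:
    # Encode the text as UTF-8 bytes and return the digest as a hex string.
    return hashlib.sha3_256(text.encode("utf-8")).hexdigest()


first = sha3_256_hex("hello")
second = sha3_256_hex("hello")

print(first)
print(first == second)
```

Note that the result depends on the exact bytes that are hashed, so a different text encoding or a trailing newline would give a different digest.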
Efficient Computation means that the hashing algorithm should be fast enough that you can compute hashes on an ordinary laptop or PC using just your CPU.
Not all hash functions are cryptographic hash functions. Only the functions that exhibit the following cryptographic properties can be called cryptographic hash functions.
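Preimage Resistance means that given the output of the hash function (hash digest), you can not feasibly determine the input. So if I hash a message and send the digest to someone, then even if you get a hold of the hash digest, you will not be able to recover what the original message was. Note that hashing is not encryption: there is no key and nothing to decrypt, and the only general attack is to guess inputs and hash them.
So, mathematically, we can say that given Hash(X), you can not feasibly determine X.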
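Collision Resistance means that it should be practically impossible to find two distinct inputs for which the output of the hash function is the same. Such pairs must exist, because there are infinitely many possible inputs and only a fixed number of possible outputs, but nobody should be able to find one in a realistic amount of time.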
So, mathematically, we can say that it should be infeasible to find 2 distinct inputs X1 and X2 such that Hash(X1) is equal to Hash(X2).
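To get a feel for why the output length matters here, the sketch below deliberately weakens SHA-256 by keeping only the first 2 bytes (16 bits) of the digest. The truncation and the function names are assumptions made purely for this illustration. With only 65,536 possible outputs, a collision turns up almost immediately, whereas the same search against the full 256-bit digest is not feasible.

```python
import hashlib


def truncated_hash(data: bytes, n_bytes: int = 2) -> bytes:
    # Keep only the first n_bytes of the SHA-256 digest (a deliberately weak hash).
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 2):
    # Hash "0", "1", "2", ... and remember each output until one repeats.
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode("ascii")
        tag = truncated_hash(message, n_bytes)
        if tag in seen:
            return seen[tag], message, tag
        seen[tag] = message
        counter += 1


m1, m2, tag = find_collision()
print(m1, m2, tag.hex())
```

The two messages collide on the truncated hash, but their full SHA-256 digests are still different.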
The Avalanche Effect means that for a small change in the input, there will be a significant change in the output of the hash function.
So, if I type the word “blockchains”, I would get a certain hash digest, but if I change it to “blockchain”, the hash digest would change drastically even though I just removed a single character.
SHA3-256(‘blockchains’) = 99cf6497afaa87b8ce79a4a5f4ca90a579773d6770650f0819179309ed846190
SHA3-256(‘blockchain’) = ef7797e13d3a75526946a3bcf00daec9fc9c9c4d51ddc7cc5df888f74dd434d1
So, when I hashed the word blockchain, I got a 256-bit hash digest, written here as 64 hexadecimal characters.
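One way to put a number on "drastically" is to count how many of the 256 output bits differ between the two digests. The following is a minimal sketch with `hashlib.sha3_256`. The helper `differing_bits` is a name invented for this example. For a good hash function we would expect roughly half of the bits to flip.

```python
import hashlib


def sha3_256_bytes(text: str) -> bytes:
    return hashlib.sha3_256(text.encode("utf-8")).digest()


def differing_bits(a: bytes, b: bytes) -> int:
    # XOR the digests byte by byte and count the 1 bits in the result.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


d1 = sha3_256_bytes("blockchains")
d2 = sha3_256_bytes("blockchain")
changed = differing_bits(d1, d2)

print(f"{changed} of {len(d1) * 8} bits differ")
```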
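Puzzle Friendliness basically means that there is no shortcut for producing a digest that looks a certain way: knowing part of the input or part of the output does not help, so the only strategy is to try candidate inputs one by one. For example, even if you get hold of the initial 200 bits of a digest, you can not determine the next 56 bits from it.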
With the introduction of hash functions out of the way, now let’s look at how they are used in leading cryptocurrencies.
Hash Functions and Cryptocurrencies
Hash Functions were not designed for cryptocurrencies, but they are very widely used in leading cryptocurrencies, primarily because of the properties that I mentioned above. These properties ensure secure transactions over the blockchain.
Bitcoin uses SHA-256 and RIPEMD160 whereas Ethereum uses the Keccak-256 hash function. They are primarily used for deriving addresses from public keys and for block hashing.
Block hashing is a core concept of Bitcoin mining. In this process, a block of unconfirmed transactions is fed to a hash function and a hash digest is generated. The miner combines this with an extra input of their own (a nonce) and hashes again, repeating with different nonces until the output starts with a certain number of leading zeroes. The required number changes as the network adjusts the mining difficulty; at the time of writing it was about 20. Because hash functions are preimage resistant and puzzle friendly, there is no shortcut to finding such an output other than trying one nonce after another, which requires massive computational power. This is why mining Bitcoin using the Proof-Of-Work mechanism is very costly and consumes large amounts of electricity.
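The search for leading zeroes can be sketched as a toy example. This is not Bitcoin's actual mining code: real Bitcoin applies SHA-256 twice to a structured block header and compares the result against a numeric target. The `mine` function, the sample block data and the difficulty of 4 hexadecimal zeroes are all assumptions chosen so that the example finishes in well under a second.

```python
import hashlib


def mine(block_data: bytes, zero_hex_digits: int):
    # Try nonces 0, 1, 2, ... until the hex digest starts with enough zeroes.
    prefix = "0" * zero_hex_digits
    nonce = 0
    while True:
        candidate = block_data + str(nonce).encode("ascii")
        digest = hashlib.sha256(candidate).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


block_data = b"alice pays bob 1 coin;"
nonce, digest = mine(block_data, 4)
print(nonce, digest)
```

Each extra hexadecimal zero makes the search about 16 times longer on average, while checking a claimed nonce still takes a single hash. That asymmetry is what makes the result usable as proof that work was done.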