Cryptographic Hash Function Explained: A Beginner’s Guide

Your fingerprints identify you uniquely. They are made of lines that run in a pattern unique to every individual on this planet, so even a slight difference in a fingerprint points to a different person altogether. Hash values can be thought of as fingerprints for files, and it's time to understand them in detail.

Hash values are generated by a mathematical function called a one-way hash function. A one-way hash function takes a variable-length string (a message) and produces a fixed-length value called a hash value. It is designed so that the process is hard to reverse: given a hash value, it is hard to find a string that hashes to it (hence the name one-way). A good hash function also makes it hard to find two strings that produce the same hash value. Widely used hash algorithms produce hash values of 128 bits or more: MD5 produces 128 bits, and SHA-256 produces 256 bits.
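To make the "fixed-length output" point concrete, here is a minimal sketch using Python's standard `hashlib` module. The choice of SHA-256 is our assumption for illustration; the text above does not name a specific algorithm. Inputs of very different lengths all produce a digest of exactly the same size.

```python
import hashlib

# Inputs of very different lengths, from empty to one million bytes
messages = [b"", b"a", b"I cleared CISSP today", b"x" * 1_000_000]

for msg in messages:
    digest = hashlib.sha256(msg).hexdigest()
    # SHA-256 always yields 256 bits = 32 bytes = 64 hex characters
    print(f"{len(msg):>8} bytes in -> {len(digest)} hex chars out: {digest[:16]}...")
```

Whatever the input size, the output is always 64 hexadecimal characters.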

For example, suppose I want to send a message to my friend and make sure my friend can tell whether it was modified along the way. I would send the message and then its hash value. My friend would compute the hash value of the received message and compare it with the hash value I sent. If they match, no modification has taken place. This works only if the hash value itself reaches my friend through a channel an attacker cannot tamper with; otherwise an attacker could alter the message and replace the hash as well. In terms of security, the one-way hash function is one of the best methods to ensure the integrity of data. Many software vendors publish the hash values of their executable files so that, even when you obtain the software from a third party, you can check that it is genuine.
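The sender/receiver exchange above can be sketched in a few lines of Python. This is a minimal sketch assuming SHA-256 as the hash; the message text and the helper names `sha256_hex` and `is_unmodified` are hypothetical, chosen for illustration.

```python
import hashlib
import hmac

def sha256_hex(message: bytes) -> str:
    """Return the SHA-256 hash of a message as a hex string."""
    return hashlib.sha256(message).hexdigest()

# Sender: computes the hash and sends it along with the message
message = b"Meet me at 6 pm"
sent_hash = sha256_hex(message)

def is_unmodified(received_message: bytes, received_hash: str) -> bool:
    """Receiver: recompute the hash and compare it with the one received."""
    # hmac.compare_digest compares in constant time, avoiding timing leaks
    return hmac.compare_digest(sha256_hex(received_message), received_hash)

print(is_unmodified(b"Meet me at 6 pm", sent_hash))  # True
print(is_unmodified(b"Meet me at 7 pm", sent_hash))  # False
```

Changing a single character ("6" to "7") is enough for the check to fail.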

There are two important aspects of the hash function:

1. Irreversible - Given a hash value, it should be computationally infeasible to work backwards to the original message. Quantum computing is not known to change this fundamentally: the best known quantum search (Grover's algorithm) only speeds up such a search quadratically. That's one of the most powerful features of this function. The hashing algorithm is not a secret; it is publicly known. The security of a one-way hash function comes from its “one-wayness”: it is easy to compute in one direction and infeasible to run in the other.
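Because a hash cannot be run backwards, the only generic way to "reverse" one is to guess inputs and hash each guess forward until one matches. The sketch below (the 4-digit PIN and the function names are hypothetical) shows why this only works when the set of possible inputs is tiny: there are just 10,000 PINs to try. For long, unpredictable messages the same search is hopeless.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical target: the hash of an unknown 4-digit PIN
target = sha256_hex(b"4821")

def guess_preimage(target_hash: str):
    """Try every 4-digit PIN, hashing each one forward, until one matches."""
    for n in range(10_000):
        candidate = f"{n:04d}".encode()
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None  # no 4-digit PIN produces this hash

print(guess_preimage(target))  # b'4821'
```

Note that the function never inverts SHA-256; it only evaluates it in the forward direction many times.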

2. A slight change in the input leads to a big change in the output:

To help understand this, study the following example:

Message 1: I cleared CISSP today
Hash Value - 9584bab89a878b8fea64e865cef1f34e (MD5 hash)

Message 2: I cleared CISSP today. [just added a full stop]
Hash Value - e8d62f7bfdf2a2ddd7288b831fca48c9 (MD5 hash)

You can try this out yourself by generating the hash values using the link -

Notice how the hash value changes when just a full stop is added to the message. It's a tiny change to the overall message, but the hash value changes completely, as if the whole message had been rewritten, and therein lies the beauty of this function.
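One way to try this locally is with Python's `hashlib`. This is a minimal sketch assuming the messages are encoded as UTF-8 with no trailing newline. The exact digests depend on the exact bytes hashed (a trailing newline or extra space changes everything), so treat this as a way to observe the effect rather than a byte-for-byte reproduction of the values above.

```python
import hashlib

msg1 = "I cleared CISSP today"
msg2 = "I cleared CISSP today."  # the same message with a full stop added

h1 = hashlib.md5(msg1.encode("utf-8")).hexdigest()
h2 = hashlib.md5(msg2.encode("utf-8")).hexdigest()

# Both digests are 128 bits (32 hex characters), yet look completely unrelated
print(h1)
print(h2)
```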

How does this help? It defeats deduction. An attacker who gets hold of the first message and its hash value cannot use them to work out the second message from its hash, because similar inputs do not produce similar hash values. This property is known as the avalanche effect.
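The avalanche effect can be measured by counting how many output bits flip when the input changes slightly; for a good hash function, roughly half of them should. The sketch below uses SHA-256 rather than MD5 (our choice), and `bit_difference` is a hypothetical helper.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many bits differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

d1 = hashlib.sha256(b"I cleared CISSP today").digest()
d2 = hashlib.sha256(b"I cleared CISSP today.").digest()

diff = bit_difference(d1, d2)
total = len(d1) * 8  # 256 bits for SHA-256
print(f"{diff} of {total} bits differ ({diff / total:.0%})")
```

Adding one character to the input typically flips somewhere around 50% of the 256 output bits.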

But what happens if two different messages produce the same hash value? Because a hash function accepts inputs of any length but produces an output of a fixed length, there are far more possible inputs than possible outputs, so some different inputs must produce the same hash. When two separate inputs produce the same hash output, it is called a collision. A collision can then be exploited in any application that relies on comparing hashes, such as password hashing and file integrity checks.

The odds of a collision are of course very low, especially for functions with very large output sizes. However, as available computing power increases, brute-forcing hash collisions becomes more and more feasible.
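How low are the odds? A common way to estimate them, assuming the hash outputs behave like uniformly random values (an idealisation), is the approximation p ≈ 1 - exp(-k(k-1)/(2N)), where k is the number of hashes computed and N = 2^bits is the number of possible outputs. The function name below is hypothetical.

```python
import math

def collision_probability(num_hashes: int, output_bits: int) -> float:
    """Approximate chance that at least two of num_hashes random hash values collide.

    Uses p ~ 1 - exp(-k(k-1) / (2N)), with N = 2**output_bits possible outputs.
    """
    n_outputs = 2 ** output_bits
    exponent = -num_hashes * (num_hashes - 1) / (2 * n_outputs)
    # -expm1(x) computes 1 - exp(x) accurately even when x is tiny
    return -math.expm1(exponent)

for bits in (32, 64, 128, 256):
    p = collision_probability(10**9, bits)  # one billion hashes
    print(f"{bits:>3}-bit output: p = {p:.3g}")
```

With a billion hashes, a 32-bit output almost certainly collides, while the probability for a 256-bit output is vanishingly small. This is why output size matters so much.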

How does a collision help an attacker? If two messages produce the same hash value, an attacker can replace one message (or file) with the other, and comparing hashes will not reveal that anything has changed. Collisions are also exploited in an attack known as the birthday attack (explained in the next blog post).
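To see the substitution problem in action, here is a toy sketch using a deliberately weak, hypothetical hash: SHA-256 truncated to 24 bits. With such a short output, a collision turns up after a few thousand attempts by remembering every hash seen so far. A real attack would need two *meaningful* messages; here both are simply numbered variants, which keeps the example short.

```python
import hashlib
from itertools import count

TRUNC_BYTES = 3  # keep only 24 bits of SHA-256: deliberately weak, for demo only

def weak_hash(message: bytes) -> bytes:
    """Hypothetical toy hash: SHA-256 truncated to TRUNC_BYTES bytes."""
    return hashlib.sha256(message).digest()[:TRUNC_BYTES]

def find_collision():
    """Hash numbered variants of a message until two share a weak_hash value."""
    seen = {}  # maps hash value -> first message that produced it
    for i in count():
        msg = f"Pay Alice ${i}".encode()
        h = weak_hash(msg)
        if h in seen:
            return seen[h], msg
        seen[h] = msg

original, forged = find_collision()
print(original, forged)
# An integrity check that only compares hashes cannot tell the two apart
print(weak_hash(original) == weak_hash(forged))  # True
```

With a 128-bit or 256-bit output, the same search would be far beyond practical reach.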

Cryptographic hash functions have many information security applications, notably in digital signatures, message authentication codes (MACs), and other forms of authentication. They can also be used as ordinary hash functions, to index data in hash tables, for fingerprinting, to detect duplicate data or uniquely identify files, and as checksums to detect accidental data corruption.
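As one example of these applications, a message authentication code can be built from a hash function using HMAC, which Python exposes in its standard `hmac` module. Unlike a plain hash, the tag also depends on a secret key, so an attacker who modifies the message cannot compute a matching tag without that key. The key and message below are hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared secret key known only to sender and receiver
key = b"shared-secret-key"
message = b"Transfer 100 to account 42"

# Sender computes the MAC tag: a keyed hash of the message
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Receiver recomputes the MAC and compares it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

print(verify(key, message, tag))                        # True
print(verify(key, b"Transfer 900 to account 42", tag))  # False
print(verify(b"wrong-key", message, tag))               # False
```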

There are several different classes of hash functions. Here are some of the most commonly used:

  • Secure Hash Algorithm (SHA-2 and SHA-3)
  • RACE Integrity Primitives Evaluation Message Digest (RIPEMD)
  • Message-Digest Algorithm 5 (MD5), now considered broken for collision resistance

Each of these classes may contain several different algorithms. For example, SHA-2 is a family of hash functions that includes SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256.

All of these hash functions serve the same purpose, but they differ in the way the algorithm creates a digest, or output, from a given input (SHA-3, for example, uses a different internal construction from SHA-2). They also differ in the fixed length of the digest they produce.
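The differing digest lengths are easy to check with Python's `hashlib`. This sketch sticks to algorithms that `hashlib` guarantees on every platform; RIPEMD-160 is left out because its availability depends on the underlying OpenSSL build.

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512", "sha3_256", "sha3_512")

digest_bits = {}
for name in ALGORITHMS:
    h = hashlib.new(name, b"I cleared CISSP today")
    digest_bits[name] = h.digest_size * 8  # digest_size is reported in bytes
    print(f"{name:>9}: {digest_bits[name]:>3}-bit digest")
```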

