Cryptography - Hashing Algorithms

Posted May 18, 2018 Updated Mar 25, 2024

By Grace L

13 min read

[toc]

Hashing Algorithms

Hashing:

verify integrity with hashing.
an algorithm performed on data to produce a number called a hash (checksum).

Hash:

conjunction with a message digest, public key cryptosystems can provide digital signature capability.
Message digests are summaries of a message’s content (not unlike a file checksum) produced by a hashing algorithm.

Hash functions:

take a potentially long message
and generate a unique output value derived (message digest) from the content of the message.
Message digests can be generated by the sender of a message and transmitted to the recipient along with the full message for two reasons.
First
- the recipient use the same hash function to recompute the message digest from the full message,
- compare the computed message digest to the transmitted one to ensure that the message sent by the originator is the same one received by the recipient.
- If the message digests do not match, that means the message was somehow modified while in transit.
Second
- the message digest can be used to implement a digital signature algorithm “Digital Signatures”.

message digest: hash, hash value, hash total, CRC, fingerprint, check-sum, and digital ID.

In most cases, a message digest is 128 bits or larger.
However, a single-digit value can be used to perform the function of parity, a low-level or single-digit checksum value used to provide a single individual point of verification.
In most cases, the longer the message digest, the more reliable its verification of integrity.

Common uses:

store computer passwords.
ensure message integrity.
- use to verify that data is not modified, tampered, or corrupted. verify has maintained integrity.
created twice and compared
If hash is the same, the file has retained integrity.
Important step before moving any installation packages from a test environment to production.

Key point

key point: hash will always be the same for the same data.
one-way
- not reversible,
- extremely difficult or impossible, to derive a message from an ideal hash function.
Variable-length input produces fixed-length output
- hash two characters or two million, the output hash size is the same.
- The input can be of any length.
- The output has a fixed length.
few or no collisions
- two messages will not produce the same hash value.
- Collision:
  - 2 different inputs to a hashing algorithm produce the same output.
- Modern hashing algorithms are designed to make this less likely.
- However, basic logic should tell you that if a given hash has a 160-bit output (like SHA1) and you put in 2160 +1 separate inputs, the last one must have a collision with one of the preceding inputs.
- Don’t be too concerned; 2160 is a very large number:1.4615016373309029182036848327163e+48.

common hashing algorithms in use today:

Message Digest (MD)
- Message Digest 2 (MD2)
- Message Digest 5 (MD5)
Secure Hash Algorithm (SHA)
- SHA-1
- SHA-2 (-224, -256, -384, -512)
- SHA-3
Hashed Message Authentication Code (HMAC)
- a special subset of hashing technology.
- can provide integrity simultaneously with authentication.

Message Digest (MD)

Message Digest 2 (MD2) `128-bit message digest`

developed by Ronald Rivest (the same Rivest of Rivest, Shamir, and Adleman fame) in 1989 to provide a secure hash function for 8-bit processors.
MD2 pads the message so that its length is a multiple of 16 bytes.
It then computes a 16-byte checksum and appends it to the end of the message.
A 128-bit message digest is then generated by using the entire original message along with the appended checksum.

Cryptanalytic attacks exist against the MD2 algorithm.

Nathalie Rogier and Pascal Chauvaud discovered that if the checksum is not appended to the message before digest computation, collisions may occur.
Frederic Mueller later proved that MD2 is not a one-way function. Therefore, it should no longer be used.

Message Digest 4 `128-bit message digest`

In 1990, Rivest enhanced MD2 to support 32-bit processors and increase the level of security.

It first pads the message to ensure that the message length is 64 bits smaller than a multiple of 512 bits.
- example: a 16-bit message, padded with 432 additional bits of data to make it 448 bits, then is 64 bits smaller than a 512-bit message.

MD4 algorithm then processes 512-bit blocks of the message in 3 rounds of computation.

The final output is a 128-bit message digest.

The MD2, MD4, and MD5 algorithms are no longer accepted as suitable hashing functions.

However, the details of the algorithms may still appear on the CISSP exam.

Several mathematicians have published papers documenting flaws in the full version of MD4:

Hans Dobbertin published a paper in 1996 outlining how a modern PC could be used to find collisions for MD4 message digests in less than one minute. Then MD4 is no longer a secure hashing algorithm, the use should be avoided if at all possible.

Message Digest 5 (MD5) `128 bit hash`

1991, Rivest released message digest algorithm, MD5.
the newest version of the algorithm.
MD5 has been in use since 1992.
Experts discovered significant vulnerabilities in MD5 in 2004 and later years.
as processing power of computers increased, it became easier and easier to exploit these vulnerabilities.
Security experts now consider it cracked and discourage its use.

MD5

produces a 128-bit hash
more complex than its predecessors, offers greater security.
It also processes 512-bit blocks of the message.
- MD5 has the same padding requirements as MD4.
- the message length must be 64 bits less than a multiple of 512 bits.
- a common hashing algorithm, produces 128-bit hash.
- not a stream of 1s and 0s.
- MD5 hash is displayed as 32 hexadecimal characters instead of 128 bits.
- Hexadecimal characters are composed of 4 bits and use the numbers 0 through 9 and the characters a through f.
but it uses 4 distinct rounds of computation to produce a digest of the same length as the MD2 and MD4 algorithms (128 bits) (16byte) Hash 值.
- 将可变长度的数据集映射为固定长度的数据集。
- (将消息分割成 512-bit 数据块。填补消息到末尾以确保其长度能除以512)。
- 通过 MD5 算法处理，其结果将是一个128位的散列值。
- 使用MD5后，生成的散列值通常是32位16进制的数字。
- plaintext = message，加密后所生成的 Hash = (message) digest

优点:

MD5 implements additional security features,
but reduce the speed of message digest production significantly. 生成速度快且易于实现?

biggest weakness:

does not have strong collision resistance
容易被暴力攻击和字典攻击(使用彩虹表快速地搜索已知 Hash 对应的原数据).
does not have strong collision resistance
- 没有避免 Hash 碰撞,
- subject to collisions, preventing its use for ensuring message integrity.
- Arjen Lenstra and others demonstrated in 2005, it is possible to create two digital certificates from different public keys that have the same MD5 hash.
- November 2007, researchers published their findings of the ability to have two entirely different Win32 executables with different functionality but the same MD5 hash. This discovery has obvious implications for the development of malware.
如果仍使用 MD5，可以加 salt 进一步保证安全性
no longer recommended
- SHA (1 or 2) are the recommended alternatives.

Hash of Variable Length (HAVAL)

a modification of MD5.
HAVAL uses 1,024-bit blocks
produces hash values of 128, 160, 192, 224, and 256 bits.

Secure Hash Algorithm (SHA)

government standard hash functions

developed by NIST NSA
originally named Keccak and designed by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche.
the Secure Hash Standard (SHS), also known as Federal Information Processing Standard (FIPS) 180-2and180-3.
- Specified official government publication

SHA Secure Hash Algorithm

designed to ensure the integrity of a message.
- SHA 碰撞的几率小于 MD5。碰撞的发生非常罕见。
生成的 Hash 安全性比 MD5 更强, 之外其它都与 MD5 非常类似
- 通常越长的 Hash 越难破解，这是核心思想。
several variations of SHA:
- SHA-0, SHA-1, SHA- 2, and SHA-3:

The cryptographic community generally considers the SHA-2 algorithms secure,but they theoretically suffer from the same weakness as the SHA-1 algorithm.

In 2012, the federal government announced the selection of the Keccak algorithm as the SHA-3 standard. However, the SHA-3 standard remains in draft form and some technical details still require finalization.
Observers expect that, once NIST finalizes SHA-3, SHA-2 will remain an accepted part of NIST’s Secure Hash Standard (SHS) until someone demonstrates an effective practical attack against SHA-2.
SHA-2 and SHA-3: currently approved for use.
SHA-1 has been discontinued.

SHA-1 `160 bit hash, 512-bit blocks`

1993, was designed as the algorithm to be used for secure hashing in the U.S. Digital Signature Standard (DSS).
takes input of virtually any length
- (in reality, there is an upper bound of approximately 2,097,152 terabytes on the algorithm).
similar to the MD5 hash except that it creates 160-bit hashes instead of 128-bit hashes.
if the message length is not a multiple of 512.
- pads the message with additional data
- Until length reaches the next highest multiple of 512.
a more secure hash functions at that time,
- but it has been found vulnerable to a collision attack,
- no longer approved for use by government agencies.

SHA-2 `224,256,384,512 bit hash`

SHA-2 improved SHA-1 to overcome potential weaknesses.

includes four versions.
SHA-224, SHA-256, SHA-384, and SHA-512.
- SHA-256, 256-bit message digest, 512-bit block size.
- SHA-224, 224-bit message digest, 512-bit block size.
- truncated version of the SHA-256.
- SHA-384, 384-bit message digest, 1,024-bit block size.
- truncated version of the SHA-512
- SHA-512, 512-bit message digest, 1,024-bit block size.
similar to SHA-1
- accept input of less than 264 bits and reduces that input to a hash.
- The SHA-2 produces a hash length equal to the number after SHA, so SHA-256 produces a digest of 256 bits.
The SHA-2 series became more common after SHA-1 was shown to be potentially

SHA-3 `224,256,384,512 bit hash`

SHA-3 (Keccak) is an alternative to SHA-2.

The U.S. National Security Agency (NSA) created SHA-1 and SHA-2.
SHA-3 was created outside of the NSA and was selected in a non-NSA public competition.
It can create hashes of the same size as SHA-2 (224 bits, 256 bits, 384 bits, and 512 bits).

Hashed Message Authentication Code (HMAC)

implements a partial digital signature
a fixed-length string of bits similar to other hashing algorithms such as MD5 and SHA-1 (known as HMAC-MD5 and HMAC-SHA1).

HMAC uses a shared secret key

to add some randomness to the result
- HMAC can be combined with any standard message digest generation algorithm, like SHA-2, by using a shared secret key.
only the sender and receiver know the secret key.
- can generate / verify the digital signature/hash.
it guarantees the integrity of a message during transmission,
- If the recipient decrypts the message digest but cannot successfully compare it to a message digest generated from the plaintext message
  - that means the message was altered in transit.
does not provide for nonrepudiation.
- Because HMAC relies on a shared secret key
- it does not provide any nonrepudiation functionality (as previously mentioned).
- However, it operates in a more efficient manner than the digital signature standard described in the following section and may be suitable for applications in which symmetric key cryptography is appropriate.
- it represents a halfway point between unencrypted use of a message digest algorithm and computationally expensive digital signature algorithms based on public key cryptography.

Example:

one server is sending a message to another server using HMAC- MD5.
server
- creating a hash of a message with MD5
- uses a secret key to complete another calculation on the hash.
- then sends the message and the HMAC-MD5 hash to the second server.
The second server
- performs the same calculations
- and compares the received HMAC-MD5 hash with its result.
  - if the two hashes are the same
    - the message retained integrity,
  - if different
    - the message lost integrity.

The HMAC provides both integrity and authenticity of messages.

The MD5 portion of the hash provides integrity just as MD5 does.
However, because only the server and receiver know the secret key,
- if the receiver can calculate the same HMAC-MD5 hash as the sender,
- it knows that the sender used the same key.
If an attacker was trying to impersonate the sender,
- the message wouldn’t pass this authenticity check because the attacker wouldn’t have the secret key.

Internet Protocol security (IPsec) and Transport Layer Security (TLS) often use a version of HMAC such as HMAC-MD5 and HMAC-SHA1.

HMAC example

before:

attacker change the message and the hash
- calculate the hash on the modified message and replace the original hash with the modified hash.
- Hash created on Lisa’s computer
  - D9B93C99B62646ABD06C887039053F56
- Modified hash by attacker
  - 564294439E1617F5628A3E3EB75643FE
- Hash created for modified message on Bart’s computer
  - 564294439E1617F5628A3E3EB75643FE
The calculated hash on the modified message would be the same as the received hash.
This erroneously indicates that the message maintained integrity.

HMAC helps solve this problem.

With HMAC, both Lisa and Bart’s computers would
- know the same secret key
- and use it to create an HMAC-MD5 hash instead of just an MD5 hash.
Lisa is still sending the same message.
- The MD5 hash is D9B93C99B62646ABD06C887039053F56.
after applying the HMAC secret key
- the HMAC-MD5 hash:733C70A54A13744D5C2C9C4BA3B15034.
attacker can modify the message in transit just as before.
- However, the attacker doesn’t know the secret key, can’t calculate the HMAC hash.
Bart’s computer calculates the HMAC-MD5 hash on the received message using the shared secret key.
- It then compares the calculated hash with the hash received from Lisa:
  - HMAC-MD5 hash created on Lisa’s computer:733C70A54A13744D5C2C9C4BA3B15034
  - HMAC-MD5 hash created on Bart’s computer:1B4FF0F6C04434BF97F1E3DDD4B6C137
- hashes are different and the message has lost integrity.
- If the messages weren’t modified, the HMAC-MD5 hash would be the same.

HMAC-MD5 `128 bit hash`

HMAC-SHA-1 `160 bit hash`

13Cryptography

cryptography

This post is licensed under CC BY 4.0 by the author.

Hashing Algorithms

Message Digest (MD)

Message Digest 2 (MD2) 128-bit message digest

Message Digest 4 128-bit message digest

Message Digest 5 (MD5) 128 bit hash

Hash of Variable Length (HAVAL)

Secure Hash Algorithm (SHA)

SHA-1 160 bit hash, 512-bit blocks

SHA-2 224,256,384,512 bit hash

SHA-3 224,256,384,512 bit hash