Cryptographic Hashing

Cryptographic hashing is a method of producing a hash code for a given object. Since the result of a hash function is the same size regardless of the size of the input, calling it "cyrptographic" hashing is somewhat of a misnomer - the original object can't be recovered from its hash code. Cryptographic hashing's main use is for data integrity - as we will see later, if an object's hash code matches what it "should" be, then it is likely that it has not been tampered with.

Contents

  • Cryptographic hash functions
  • How it Works
  • An Application
  • References
  • See Also
  • External Links
  • Cryptographic hash functions

    A cryptographic hash function is a "special" type of hash function that adds some additional properties:

  • One-way property: Given a hash code, it is mathematically infeasible to find an object with that hash code. [1]
  • Weak collision property: Given an object, it is mathematically infeasible to find an object with the same hash code. [1]
  • Strong collision property: It is mathematically infeasible to find two objects with the same hash code. [1]
  • Mathematically infeasible means that there is no efficient way to do it; the only way is by brute force. So, given some specific hash code, you could try to find an object with that hash code, but it would take so long that it wouldn't be worthwhile.

    MD5 was a widely used cryptographic hash function; it has since been replaced by the SHA family. [2]

    How it Works

    NOTE: In this section, the exact details on the algorithm are taken from [3].
    Here is an example of how the hash code of a message is calculated, using the SHA-1 algorithm (since it's easier for me to understand than MD5):

    1. Padding: Add a 1 to the end of the message (in binary), then pad the message with 0's until its length is 64 bits short of being a multiple of 512. Then, add its length (the length before padding) onto the end of the message (this is 2 words, so the length of the padded message is a multiple of 512 bits). Now, the
    2. Functions and constants: In the SHA-1 algorithm, a series of 80 functions, f(t, B, C, D), is used, each operating on 3 words (in these functions, AND means the bitwise AND across the 2 input words, and so forth).
      If t is between 0 and 19 inclusive, then f(t, B, C, D) = (B AND C) OR ((NOT B) AND D)
      If t is between 20 and 39, or 60 and 79 inclusive, then f(t, B, C, D) = B XOR C XOR D
      Otherwise (t is between 40 and 59 inclusive), f(t, B, C, D) = (B AND C) OR (B AND D) OR (C AND D)
      There are also 80 constant words, K(t), used.
      If t is between 0 and 19 inclusive, then K(t) = 5A827999 (in hex)
      If t is between 20 and 39, then K(t) = 6ED9EBA1
      If t is between 40 and 59, then K(t) = 8F1BBCDC
      Otherwise, K(t) = CA62C1D6
    3. Determine the hash code: The following variables are used in calculating the hash code:
      • A buffer, ABCDE, containing 5 words
      • A buffer, H0 H1 H2 H3 H4, containing 5 words
      • An 80-word sequence, W(0) W(1) et cetera
      • A single-word buffer, TEMP
      • A series of 16-word blocks, M(1) ... M(n), containing the original message. This is why we needed to pad the message to a multiple of 512 bits.
      Initially, the H's have been initialized as follows:
      • H0 = 67452301
      • H1 = EFCDAB89
      • H2 = 98BADCFE
      • H3 = 10325476
      • H4 = C3D2E1F0
      Now, we process each M from 1 to n (in order). For each M(i), do the following:
      1. Set W(k) equal to the kth word of M(i), starting from the left
      2. For t from 16 to 79, W(t) = S^1(W(t-3) XOR W(t-8) XOR W(t-14) XOR W(t-16)). S^j is the "circular left shift by j" operation - shift the input j bits to the left, then tack the first j bits onto the right.
      3. Let A = H0, B = H1 and so on.
      4. For t from 0 to 79, do the following:
        • TEMP = S^5(A) + f(t, B, C, D) + E + W(t) + K(t) (the + operator here means "add the words as if they were numbers, and discard the overflow")
        • E = D
        • D = C
        • C = S^30(B)
        • B = A
        • A = TEMP
      5. H0 = H0 + A, H1 = H1 + B and so on
      After all of the M's have been processed, the message digest is H0 H1 H2 H3 H4. Notice that the message digest is 5*32 = 160 bits, no matter how long the input message is.
    This is just one example of a cryptographic hash function. Any hash function can be used, so long as it has the 3 properties given above.

    An Application

    Let's suppose I want to install
    this Mah-Jong game on my computer. If I was extremely paranoid about security, then I might worry about the possibility that the server has been hacked, and the file has been tampered with. Fortunately, the page's author has provded the MD5 hash for the file (mj.zip). All I need to do is run the md5 program on this file (which I can do br dragging and dropping mj.zip onto the shortcut to md5), and check the resulting md5 hash against the one posted on the page from which I downloaded it. Since they match, I can be relatively certain that mj.zip actually does contain my Mah-Jong game, and not a virus that will turn my computer into a smouldering heap of silicon. (Let's ignore the possibility that both the file and its MD5 hash on the page were hacked...)

    References

    [1]: Farmer, William (2007). Overview of Cryptography. Sfwr Eng 4C03 Course Notes.
    [2]: SHA hash functions. (2007, April 4). In Wikipedia, The Free Encyclopedia. Retrieved 22:13, March 26, 2007, from http://en.wikipedia.org/w/index.php?title=SHA_hash_functions&oldid=118024263.
    [3]: Eastlake III, D. and Jones, P. (2001). RFC 3174 US Secure Hash Algorithm 1 (SHA1). Retrieved March 26, 2007 from http://tools.ietf.org/html/rfc3174.

    See Also

    External Links

    Author: James Leroux
    Last revision: April 6, 2007