Now that we have some tools for evaluating hash functions for table lookup, let's look at several common hashes. I usually define "ub4" as unsigned long int; it's supposed to be an unsigned 4-byte integer type.

The **additive hash**, seen in commercial code:

ub4 additive( char *key, size_t len, ub4 prime) { ub4 hash; size_t i; for (hash=(ub4)len, i=0; i<len; ++i) hash = hash+key[i]; return (hash % prime); }This takes 5

The **rotating hash**, also seen in commercial code:

ub4 rotating( char *key, size_t len, ub4 prime) { ub4 hash; size_t i; for (hash=(ub4)len, i=0; i<len; ++i) hash = (hash<<5)^(hash>>27)^key[i]; return (hash % prime); }This is the same as the additive hash, except it has a mixing step (a barrelshift by 5) and the combining step is exclusive-or instead of addition. Speed is 8

char pearson( char *key, size_t len, char tab[256]) { char hash; size_t i; for (i=0, hash=0; i<len; ++i) hash=tab[hash^key[i]]; return (hash); }This preinitializes

**CRC hashing**
^{Knuth6}

ub4 crc( char *key, size_t len, ub4 mask, ub4 tab[256]) { ub4 hash; size_t i; for (hash=(ub4)len, i=0; i<len; ++i) hash = (hash<<8)^tab[(hash>>24)^key[i]]; return (hash & mask); }This takes 9

**Universal hashing**
^{UNI}:

ub4 universal( char *key, size_t len, ub4 mask, ub4 tab[MAXBITS]) { ub4 hash; size_t i; for (hash=0, i=0; i<(len<<3); i+=8) { register char k = key[i>>3]; if (k&0x01) hash ^= tab[i+0]; if (k&0x02) hash ^= tab[i+1]; if (k&0x04) hash ^= tab[i+2]; if (k&0x08) hash ^= tab[i+3]; if (k&0x10) hash ^= tab[i+4]; if (k&0x20) hash ^= tab[i+5]; if (k&0x40) hash ^= tab[i+6]; if (k&0x80) hash ^= tab[i+7]; } return (hash &= mask); }This hash takes 52

for (i=0; i<MAXBYTES; ++i) { for (j=0; j<256; ++j) { int val; for (k=0, val=0; k<8; ++k) { if (j&(1<<k)) val ^= tab[8*i+k]; } tab2[i][j] = val; } }then implement the hash as:

ub4 universal2( char *key, size_t len, ub4 mask, ub4 tab2[MAXBYTES][256]) { ub4 hash; size_t i; for (hash=0, i=0; i<len; ++i) hash ^= tab2[i][key[i]]; return (hash &= mask); }then it takes 10n+3 instructions.

The new hash presented in this article: This takes 6*n*+35
instructions. It produces a full 32-bit result with no funnels or
strong 2-bit characteristics. The 64-bit hash
produces a 64-bit result in 5*n*+41 instructions, also with no
funnels or strong 2-bit characteristics.

**MD4** ^{MD4}, a fast
hash designed for cryptography: MD4 takes 420 instructions to hash a
block of 64 bytes. Then there is the task of fitting bytes into
registers, which required an additional 3*n* instructions per
byte for our new hash. Using the method in the new hash for putting
bytes into registers, MD4 should take 9.5*n*+230 instructions to
hash n bytes into a full 64-byte result. Although MD4 has recently
been shown to be too weak for cryptography ^{Dobbertin}, it is more than strong enough
for table lookup. It has no funnels, and no noticable characteristics
of any size.

The table statistics compares all these hash functions.

- NAME
- is the name of the hash.
- SPEED
- is the speed of the hash, measured in instructions required to produce a hash value for a table with about 1000 values.
- FUNNEL-15
- is the worst funnel when mapping 15-byte keys into a 1-byte result.
- FUNNEL-100
- is the worst funnel when mapping 100-byte keys into a 4-byte result.
- COLLIDE-32
- is the number of collisions found when a
dictionary of 38470 English words
^{ispell}was hashed into a 4-byte value. (0.2 collisions are expected.) - SIZE-1000
- is the smallest reasonable hash table size greater than 1000.
- COLLIDE-1000
- is a chi
^{2}measure^{Knuth}of how well the hash did at mapping the 38470-word dictionary into the SIZE-1000 table. (A chi^{2}measure greater than +3 is significantly worse than a random mapping; less than -3 is significantly better than a random mapping; in between is just random fluctuations.)

NAME | SPEED | FUNNEL-15 | FUNNEL-100 | COLLIDE-32 | SIZE-1000 | COLLIDE-1000 |
---|---|---|---|---|---|---|

Additive | 5n+3 | 15 into 2 | 100 into 2 | 37006 | 1009 | +806.02 |

Rotating | 8n+3 | 4 into 1 | 25 into 1 | 17 | 1009 | +1.24 |

Pearson | 12n+5 | none | none | 0 | 1024 | +1.65 |

CRC | 9n+3 | 15 into 14 | 15 into 14 | 1 | 1024 | +0.16 |

Universal | 10n+3 | 4 into 3 | 50 into 28 | 0 | 1024 | +0.20 |

New 32-bit | 6n+35 | none | none | 0 | 1024 | +0.33 |

New 64-bit | 5n+41 | none | none | 0 | 1024 | +1.42 |

MD4 | 9.6n+230 | none | none | 1 | 1024 | +0.73 |

Funnels are a flaw that can be hit by some reasonable sets of keys, not by every reasonable set of keys. Only the additive and rotating hashes did poorly enough on the English dictionary for their test results to be noticably bad. This set of keys apparently did not exercise the funnels in the CRC or the universal hash very much.

- Dobbertin
- Cryptanalysis of MD4, 3rd Fast Software Encryption Workshop
- Golomb
- Shift Register Sequences
- ispell.txt
- a list of English words that comes with GNU EMACS
- Knuth
- The Art of Computer Programming, chapter 3
- Knuth6
- Knuth, D.
*The Art of Computer Programming*, Volume 3: Sorting and Searching, Chapter 6.4. Addison Wesley, 1973 - UNI
- Carter, L. and Wegman, M. "Universal Hash Functions,"
*J. of Computer and System Sciences*18, 143-154 (1979)

Hash Functions for Hash Table Lookup

The Birthday Paradox

Table of Contents