Now that we have some tools for evaluating hash functions for table lookup, let's look at several common hashes. I usually define "ub4" as unsigned long int; it's supposed to be an unsigned 4-byte integer type.

The additive hash, seen in commercial code:

```ub4 additive( char *key, size_t len, ub4 prime)
{
ub4    hash;
size_t i;
for (hash=(ub4)len, i=0; i<len; ++i) hash = hash+key[i];
return (hash % prime);
}
```
This takes 5n+3 instructions. There is no mixing step. The combining step handles one byte at a time. Bytes are treated commutatively, and every bit affects on average two output bits (one directly, and an average of one more due to carries). Bytes above the low-order byte of hash are affected only by carries. One of the instructions is mod, which can many cycles -- tens (on a MIPS) or hundreds (on a SPARC). It forces the use of prime table lengths, which makes it hard to dynamically increase the table size. Also, the prime can't be much bigger than one byte because the original hash value won't be much bigger than one byte. Knuth described this hash, except he required the key to be an array of words, not an array of bytes.

The rotating hash, also seen in commercial code:

```ub4 rotating( char *key, size_t len, ub4 prime)
{
ub4    hash;
size_t i;
for (hash=(ub4)len, i=0; i<len; ++i)
hash = (hash<<5)^(hash>>27)^key[i];
return (hash % prime);
}
```
This is the same as the additive hash, except it has a mixing step (a barrelshift by 5) and the combining step is exclusive-or instead of addition. Speed is 8n+3 instructions. Every input bit affects exactly one output bit (because of the exclusive-or instead of addition). Because of the mixing step, before the % prime, hash is equally likely to be any 32-bit value. This lets any size prime be used. Most input bytes are not treated commutatively. This hash was also described by Knuth , except he required the key to be an array of words, not an array of bytes.
```char pearson( char *key, size_t len, char tab[256])
{
char   hash;
size_t i;
for (i=0, hash=0; i<len; ++i)
hash=tab[hash^key[i]];
return (hash);
}
```
This preinitializes tab to an arbitrary permutation of 0 .. 255. It takes 6n+2 instructions. Bytes are not commutative. Every input bit affects every output bit, so there are no funnels. The mixing and combining steps are both permutations of the internal state. The result is even a power of two. The only problem with this hash is that the result is a single byte, which is too small for many applications. Larger results can be obtained by hashing twice but initializing hash to different values first, then concatenating the results. That would cause the two results to never be the same (causing a slight bias). It would also make this hash a little slower than most other options.
```ub4 crc( char *key, size_t len, ub4 mask, ub4 tab[256])
{
ub4    hash;
size_t i;
for (hash=(ub4)len, i=0; i<len; ++i)
hash = (hash<<8)^tab[(hash>>24)^key[i]];
}
```
This takes 9n+3 instructions. tab is initialized so that the low-order byte is a permutation. It produces a full 32-bit result. Bytes are not commutative. This would have no funnels if it stopped there, but it usually doesn't. This hash is usually used to simulate a maximal-length Linear Feedback Shift Register (LFSR) Golumb. An LFSR, rather than shifting a byte at a time, shifts a bit at a time, XORing some shift mask with the hash value depending on the bit being shifted out. The values of tab are the accumulated XORs of all possible sequences of 8 bits shifting out. Any input bit followed by the bits in the shift mask form a funnel, and any two funnels XORed together form another funnel. (This is an example of a linear mixing step, and how it spontaneously unmixes old inputs.) For the tests, we used a 32-bit state with a shift mask of 0x04c11db7.
```ub4 universal( char *key, size_t len, ub4 mask, ub4 tab[MAXBITS])
{
ub4    hash;
size_t i;
for (hash=0, i=0; i<(len<<3); i+=8)
{
register char k = key[i>>3];
if (k&0x01) hash ^= tab[i+0];
if (k&0x02) hash ^= tab[i+1];
if (k&0x04) hash ^= tab[i+2];
if (k&0x08) hash ^= tab[i+3];
if (k&0x10) hash ^= tab[i+4];
if (k&0x20) hash ^= tab[i+5];
if (k&0x40) hash ^= tab[i+6];
if (k&0x80) hash ^= tab[i+7];
}
}
```
This hash takes 52n+3 instructions to hash n keys. It produces a full 32-bit result. Bits are not commutative. It requires as many table entries as there are possible input bits, and they are chosen at random. Since every input bit can change only about half the output bits, there are funnels. Changing the table values will change where the funnels are, but it always has funnels. It's also slow, if written as above. But you could define tab2[MAXBYTES][256] like so:
```  for (i=0; i<MAXBYTES; ++i) {
for (j=0; j<256; ++j) {
int val;
for (k=0, val=0; k<8; ++k) {
if (j&(1<<k)) val ^= tab[8*i+k];
}
tab2[i][j] = val;
}
}
```
then implement the hash as:
```ub4 universal2( char *key, size_t len, ub4 mask, ub4 tab2[MAXBYTES][256])
{
ub4    hash;
size_t i;
for (hash=0, i=0; i<len; ++i)
hash ^= tab2[i][key[i]];
}
```
then it takes 10n+3 instructions.

The new hash presented in this article: This takes 6n+35 instructions. It produces a full 32-bit result with no funnels or strong 2-bit characteristics. The 64-bit hash produces a 64-bit result in 5n+41 instructions, also with no funnels or strong 2-bit characteristics.

MD4 MD4, a fast hash designed for cryptography: MD4 takes 420 instructions to hash a block of 64 bytes. Then there is the task of fitting bytes into registers, which required an additional 3n instructions per byte for our new hash. Using the method in the new hash for putting bytes into registers, MD4 should take 9.5n+230 instructions to hash n bytes into a full 64-byte result. Although MD4 has recently been shown to be too weak for cryptography Dobbertin, it is more than strong enough for table lookup. It has no funnels, and no noticable characteristics of any size.

The table statistics compares all these hash functions.

NAME
is the name of the hash.
SPEED
is the speed of the hash, measured in instructions required to produce a hash value for a table with about 1000 values.
FUNNEL-15
is the worst funnel when mapping 15-byte keys into a 1-byte result.
FUNNEL-100
is the worst funnel when mapping 100-byte keys into a 4-byte result.
COLLIDE-32
is the number of collisions found when a dictionary of 38470 English words ispell was hashed into a 4-byte value. (0.2 collisions are expected.)
SIZE-1000
is the smallest reasonable hash table size greater than 1000.
COLLIDE-1000
is a chi2 measure Knuth of how well the hash did at mapping the 38470-word dictionary into the SIZE-1000 table. (A chi2 measure greater than +3 is significantly worse than a random mapping; less than -3 is significantly better than a random mapping; in between is just random fluctuations.)
Comparison of several hash functions
NAME SPEED FUNNEL-15 FUNNEL-100 COLLIDE-32 SIZE-1000 COLLIDE-1000
Additive 5n+3 15 into 2 100 into 2 37006 1009 +806.02
Rotating 8n+3 4 into 1 25 into 1 17 1009 +1.24
Pearson 12n+5 none none 0 1024 +1.65
CRC 9n+3 15 into 14 15 into 14 1 1024 +0.16
Universal 10n+3 4 into 3 50 into 28 0 1024 +0.20
New 32-bit 6n+35 none none 0 1024 +0.33
New 64-bit 5n+41 none none 0 1024 +1.42
MD4 9.6n+230 none none 1 1024 +0.73

Funnels are a flaw that can be hit by some reasonable sets of keys, not by every reasonable set of keys. Only the additive and rotating hashes did poorly enough on the English dictionary for their test results to be noticably bad. This set of keys apparently did not exercise the funnels in the CRC or the universal hash very much.

Dobbertin
Cryptanalysis of MD4, 3rd Fast Software Encryption Workshop
Golomb
Shift Register Sequences
ispell.txt
a list of English words that comes with GNU EMACS
Knuth
The Art of Computer Programming, chapter 3
Knuth6
Knuth, D. The Art of Computer Programming, Volume 3: Sorting and Searching, Chapter 6.4. Addison Wesley, 1973
UNI
Carter, L. and Wegman, M. "Universal Hash Functions," J. of Computer and System Sciences 18, 143-154 (1979)