4-byte Integer Hashing

The hashes on this page (with the possible exception of HashMap.java's) are all public domain. So are the ones on Thomas Wang's page. Thomas recommends citing the author and page when using them.

Thomas Wang has an integer hash using multiplication that's faster than any of mine on my Core 2 Duo using gcc -O3, and it passes my favorite sanity tests well. I've had reports it doesn't do well with integer sequences with a multiple of 34.

uint32_t hash( uint32_t a)
{
    a = (a ^ 61) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2d;
    a = a ^ (a >> 15);
    return a;
}
Thomas has 64-bit integer hashes too. I don't have any of those yet.

Here's a way to do it in 6 shifts:

uint32_t hash( uint32_t a)
{
    a = (a+0x7ed55d16) + (a<<12);
    a = (a^0xc761c23c) ^ (a>>19);
    a = (a+0x165667b1) + (a<<5);
    a = (a+0xd3a2646c) ^ (a<<9);
    a = (a+0xfd7046c5) + (a<<3);
    a = (a^0xb55a4f09) ^ (a>>16);
    return a;
}

Or 7 shifts, if you don't like adding those big magic constants:

uint32_t hash( uint32_t a)
{
    a -= (a<<6);
    a ^= (a>>17);
    a -= (a<<9);
    a ^= (a<<4);
    a -= (a<<3);
    a ^= (a<<10);
    a ^= (a>>15);
    return a;
}

Thomas Wang has a function that does it in 6 shifts (provided you use the low bits, hash & (SIZE-1), rather than the high bits if you can't use the whole value):

uint32_t hashint( uint32_t a)
{
    a += ~(a<<15);
    a ^=  (a>>10);
    a +=  (a<<3);
    a ^=  (a>>6);
    a += ~(a<<11);
    a ^=  (a>>16);
    return a;
}

Here's a 5-shift one where you have to use the high bits, hash >> (32-logSize), because the low bits are hardly mixed at all:

uint32_t hashint( uint32_t a)
{
    a = (a+0x479ab41d) + (a<<8);
    a = (a^0xe4aa10ce) ^ (a>>5);
    a = (a+0x9942f0a6) - (a<<14);
    a = (a^0x5aedd67d) ^ (a>>3);
    a = (a+0x17bea992) + (a<<7);
    return a;
}
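
For concreteness, picking a bucket in a power-of-two table looks like this (SIZE and logSize here are just illustrative names, not anything the hashes above require):

uint32_t bucket_low( uint32_t hashval, uint32_t SIZE)
{
    /* low bits: for hashes that mix the low bits well */
    return hashval & (SIZE-1);
}

uint32_t bucket_high( uint32_t hashval, uint32_t logSize)
{
    /* high bits: for hashes that only mix the high bits well */
    return hashval >> (32-logSize);
}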

Here's one that takes 4 shifts. You need to use the bottom bits, and you need to use at least the bottom 11 bits. It doesn't achieve avalanche at the high or the low end. It does pass my integer sequence tests, and all settings of any set of 4 bits usually map to 16 distinct values in the bottom 11 bits.

uint32_t hashint( uint32_t a)
{
    a = (a^0xdeadbeef) + (a<<4);
    a = a ^ (a>>10);
    a = a + (a<<7);
    a = a ^ (a>>13);
    return a;
}
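
The 4-bit test I mean is roughly this: pick any 4 bit positions, try all 16 settings of those bits on top of some base value, and count how many distinct values show up in the bottom 11 bits. Here's a sketch of that idea, using the hashint() just above (a real test would loop over many bases and choices of positions):

int distinct4( uint32_t base, int b0, int b1, int b2, int b3)
{
    /* how many of the 16 settings of bits b0..b3 give distinct bottom-11-bit hashes */
    char seen[2048] = {0};
    int count = 0;
    for (int s = 0; s < 16; ++s)
    {
        uint32_t a = base;
        if (s & 1) a ^= (uint32_t)1 << b0;
        if (s & 2) a ^= (uint32_t)1 << b1;
        if (s & 4) a ^= (uint32_t)1 << b2;
        if (s & 8) a ^= (uint32_t)1 << b3;
        uint32_t h = hashint(a) & 2047;   /* bottom 11 bits */
        if (!seen[h]) { seen[h] = 1; ++count; }
    }
    return count;   /* 16 means all settings landed in distinct buckets */
}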

And this one isn't too bad, provided you promise to use at least the 17 lowest bits. Passes the integer sequence and 4-bit tests.

uint32_t hashint( uint32_t a)
{
    a = a ^ (a>>4);
    a = (a^0xdeadbeef) + (a<<5);
    a = a ^ (a>>11);
    return a;
}

More Wordy Stuff

Adam Zell points out that this hash is used by HashMap.java:

private static int newHash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
Although this hash leaves something to be desired (h & 2047 uses only 1/8 of the buckets for sequences incremented by 8), it does bring up an interesting point: full avalanche is stronger than what you really need for a hash function. All you really need is that the entropy in your keys be represented in the bits of the hash value that you use. Often you can show this by matching every differing input bit to a distinct bit that it changes in the portion of the hash value that you use.
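
One way to check that claim is to translate the Java function to C (>>> corresponds to >> on a uint32_t) and count how many of the 2048 buckets a sequence incremented by 8 actually lands in. A sketch:

uint32_t newHash( uint32_t h)
{
    /* C version of the Java supplemental hash above */
    h ^= (h >> 20) ^ (h >> 12);
    return h ^ (h >> 7) ^ (h >> 4);
}

int count_buckets( void)
{
    /* hash keys 0, 8, 16, ... and count distinct values of hash & 2047 */
    char used[2048] = {0};
    int count = 0;
    for (uint32_t i = 0; i < 2048; ++i)
    {
        uint32_t b = newHash(i*8) & 2047;
        if (!used[b]) { used[b] = 1; ++count; }
    }
    return count;
}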

One very non-avalanchy example of this is CRC hashing: every input bit affects only some output bits, but the ones it does affect it changes 100% of the time, and every input bit affects a different set of output bits. If the input bits that differ can be matched to distinct bits that you use in the hash value, you're golden. Otherwise you're not.

4-byte integer hash, half avalanche

Full avalanche says that differences in any input bit can cause differences in any output bit. A weaker property is also good enough for integer hashes if you always use the high bits of a hash value: every input bit affects its own position and every higher position. I'll call this half avalanche. (Multiplication is like this, in that every bit affects only itself and higher bits. But multiplication can't cause every bit to affect EVERY higher bit, especially if you measure "affect" by both - and ^.) Half-avalanche is sufficient: if you use the high n bits and hash 2^n keys that cover all possible values of n input bits, all those bit positions will affect all n high bits, so you can reach up to 2^n distinct hash values. It's also sometimes necessary: if you use the high n+1 bits, and the high n input bits only affect their position and greater, and you take the 2^(n+1) keys differing in the high n bits plus one other bit, then the only way to get over 2^n hash values is if that one other input bit affects position n+1 from the top. That has to hold for every n smaller than that bit's own distance from the top, so it has to affect itself and all higher bits.

Actually, that wasn't quite right. Half-avalanche says that an input bit will change its output bit (and all higher output bits) half the time. But if the later output bits are all dedicated to representing other input bits, you want this output bit to be affected 100% of the time by this input bit, not 50% of the time. This doesn't entirely kill the idea though. If every bit affects itself and all higher bits, plus a couple lower bits, and you use just the high-order bits, then the lowest high-order bit you use still contains entropy from several differing input bits. So it might work. Hmm. Better check how this does in practice!

Similarly for low-order bits, it would be enough for every input bit to affect only its own position and all lower bits in the output (plus the next few higher ones). Half-avalanche is easier to achieve for high-order bits than low-order bits because a*=k (for odd k), a+=(a<<k), a-=(a<<k), a^=(a<<k) are all permutations that affect higher bits, but only a^=(a>>k) is a permutation that affects lower bits. (There's also table lookup, but unless you get a lot of parallelism that's going to be slower than shifts.)
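
To see why a += (a<<k) is a permutation, note that it's the same as a *= (1+2^k), and 1+2^k is odd for k >= 1, so it has a multiplicative inverse mod 2^32. Here's a quick check of that, with the inverse computed by Newton's iteration (the function names are mine):

uint32_t odd_inverse( uint32_t x)
{
    /* multiplicative inverse of an odd x, mod 2^32 */
    uint32_t inv = x;              /* any odd x is its own inverse mod 8 */
    for (int i = 0; i < 5; ++i)
        inv *= 2 - x*inv;          /* each step doubles the number of correct bits */
    return inv;
}

uint32_t undo_addshift( uint32_t a, int k)
{
    /* inverts a += (a<<k) for k = 1..31: undo_addshift(a + (a<<k), k) == a */
    return a * odd_inverse(1 + ((uint32_t)1 << k));
}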

Here's a 5-shift function that does half-avalanche in the high bits:

uint32_t half_avalanche( uint32_t a)
{
    a = (a+0x479ab41d) + (a<<8);
    a = (a^0xe4aa10ce) ^ (a>>5);
    a = (a+0x9942f0a6) - (a<<14);
    a = (a^0x5aedd67d) ^ (a>>3);
    a = (a+0x17bea992) + (a<<7);
    return a;
}

Every input bit affects itself and all higher output bits, plus a few lower output bits. I hashed sequences of n consecutive integers into an n-bucket hash table, for n being the powers of 2 from 2^1 to 2^20, starting at 0, incremented by odd numbers 1..15, and it did OK for all of them. Also, for "differ" defined by +, -, ^, or ^~, for nearly-zero or random bases, inputs that differ in any bit or pair of input bits will change each equal or higher output bit position between 1/4 and 3/4 of the time. Here's a table of how the ith input bit (rows) affects the jth output bit (columns) in that hash (single bit differences, differ defined as ^, with a random base):

5146485155524551535050505050495050515050504950505150565044655044
5132465254555551455150535150504650505351505048505052505650446550
5247385050554849514750505151505047505053505050485050525056504465
8560433351485751515045515353505050455051555050504750505350565044
1593584532504858505150504950505050505050515050505047505053505650
5461625351394950505050505050505050505050505050505050495050515056
5168405254403850515150505050505050505050505050505050504850505250
5132815355424850505350504950504950505150505050505050505047505050
10050446455544554505053505050505050505050505050505050505050465050
0100504760555149635149505050505050495050505050505050505050504250
0 01005048645049516150505250505050505050505050505050505050504742
0 0 010051436349585060505053505050505050505050505050505050505048
0 0 0 0100504860505449495050505050505050505050505050505050505050
0 0 0 0 01005045705054515650505250505050505050505050505050505050
0 0 0 0 0 010050377450595056505052505050505050505050505050515050
0 0 0 0 0 0 0100503872506050565050525050505050505050505050505050
0 0 0 0 0 0 0 01005040675061505650505250505050505050505050505050
0 0 0 0 0 0 0 0 010050506250515055505051505050505050505050505051
0 0 0 0 0 0 0 0 0 0100504367505750565050525050505050505051505350
0 0 0 0 0 0 0 0 0 0 01005044665056505650505250505050505050505052
0 0 0 0 0 0 0 0 0 0 0 010050446650575056505052505050505050505050
0 0 0 0 0 0 0 0 0 0 0 0 0100504466505750565050515050505050505050
0 0 0 0 0 0 0 0 0 0 0 0 0 01005044665057505650505250505050455050
0 0 0 0 0 0 0 0 0 0 0 0 0 0 010050446650575056505053505050504050
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0100504466505750565050545055504466
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 01005044665057505650505450565044
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 010050446650575056505054505550
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0100504466505750605050555055
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 01005044665057505250515350
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 010050446650575061505051
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0100504470505750625050
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 01005038755062507350
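
A table like that can be generated with something along these lines (a rough sketch that reuses the half_avalanche() above; the trial count and random number source are arbitrary, and the row/column ordering may not match the table exactly):

uint32_t half_avalanche( uint32_t a);        /* the 5-shift hash above */

void print_avalanche_table( void)
{
    for (int i = 0; i < 32; ++i)             /* input bit (row) */
    {
        for (int j = 0; j < 32; ++j)         /* output bit (column) */
        {
            int flips = 0;
            for (int t = 0; t < 10000; ++t)
            {
                uint32_t base = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
                uint32_t diff = half_avalanche(base) ^ half_avalanche(base ^ ((uint32_t)1 << i));
                flips += (diff >> j) & 1;    /* did output bit j change? */
            }
            printf("%3d", (flips*100 + 5000) / 10000);   /* percentage, rounded */
        }
        printf("\n");
    }
}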

If you use high-order bits for hash values, adding a bit to the hash value to double the size of the hash table will add a low-order bit, so old bucket 0 maps to the new 0,1, old bucket 1 maps to the new 2,3, and so forth. They overlap. It's not as nice as the low-order bits, where the new buckets are all beyond the end of the old table. Also, using the n high-order bits is done by (a>>(32-n)) instead of (a&((1<<n)-1)); note that >> takes 2 cycles while & takes only 1. But, on the plus side, if you use high-order bits for buckets and order keys inside a bucket by the full hash value, and you split the bucket, all the keys in the low bucket precede all the keys in the high bucket (Shalev '03, split-ordered lists). Incrementally splitting the table is still feasible if you split high buckets before low buckets; that way old buckets will be empty by the time new buckets take their place.
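
A sketch of the index arithmetic behind that (logSize is the old table's size in bits; the names are just for illustration):

void doubling_example( uint32_t h, int logSize)
{
    /* high-bits indexing: old bucket i becomes new bucket 2i or 2i+1, so they interleave */
    uint32_t old_hi = h >> (32-logSize);
    uint32_t new_hi = h >> (32-(logSize+1));

    /* low-bits indexing: old bucket i becomes i or i + (1<<logSize), past the old table's end */
    uint32_t old_lo = h & (((uint32_t)1 << logSize) - 1);
    uint32_t new_lo = h & (((uint32_t)1 << (logSize+1)) - 1);

    printf("high: %u -> %u   low: %u -> %u\n", old_hi, new_hi, old_lo, new_lo);
}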

4-byte integer hash, full avalanche

I was able to do it in 6 shifts.

uint32_t hash( uint32_t a)
{
    a = (a+0x7ed55d16) + (a<<12);
    a = (a^0xc761c23c) ^ (a>>19);
    a = (a+0x165667b1) + (a<<5);
    a = (a+0xd3a2646c) ^ (a<<9);
    a = (a+0xfd7046c5) + (a<<3);
    a = (a^0xb55a4f09) ^ (a>>16);
    return a;
}
These magic constants also worked: 0x7fb9b1ee, 0xab35dd63, 0x41ed960d, 0xc7d0125e, 0x071f9f8f, 0x55ab55b9 .

For one or two bit diffs, for "diff" defined as subtraction or xor, for random or nearly-zero bases, every output bit changes with probability between 1/4 and 3/4. I also hashed integer sequences incremented by odd 1..31 times powers of two; low bits did marvelously, high bits did sorta OK. Here's the table for one-bit diffs on random bases with "diff" defined as XOR:

5047505150505049425050504950505050475049585051466250555045635145
5050505050505050504850505050505050505050505350514759505250486351
5450495050514950504945505050475054505146504858495043645159494370
5052505050505149505050475050504950525051485049545051466351585142
4250505050505050495050504750505041505050505050505550504860505250
4843505150505050514950505046505046445051505046504856515244675060
4948435051505050505149505050475049494250525050455048604951446551
5050505350495050505050505050505050505047505450514550475949514664
5050505051505050505050505050505050505050495050505049505053505047
5250505050525050505050505050505052505050504850525050465049555050
5051505050505250495050505050505050515050505048505050504950505350
4750505050505052504950505050505047505050505050485051505049505056
5047505150505050525049505050505050475051505050504850535050475049
5150495050505050505250505050505051504950505050505048515150504750
5051514650525051505052504950505050515146505250495050485150515149
5050505146505250505050515050505050505051465052505049504850534951
5450505050495050505050505150505055505050504950505050505048505050
5054505050504650525050505051505050545050505045505350505050475050
5250535050505046505150505050515053505350505050455052505049504850
5048494950505050545050505050505050455051505049514350545049495045
5055435046505247504750505350505150456551555048635052504763505046
5750565150465052485047505054505057504466515650426450595042654950
5160505651504650524750485050535051605044665155504762505250476250
6050515055505146505250504850505160505150456451555048635052504863
4563515550565150455052505047505045635155504465515550426951594942
4945635056505550514650525050475049456350565045655155504263515951
4949456350565056505147505250504849494563505650446551555048635052
6549504564505650565050475051505065495045645056504465515550426951
4565495045645156505550504750514945654950456451565045655155504264
5044664950456351565056505047505150446649504563515650446551555048
5549446850504565505550565050475055494468505045655055504467515550
5160503973495243705160506051504551605039734952437051605040725160
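
The sequence test mentioned before the table can be sketched like this, reusing the 6-shift hash() above (a single stride only; the sizes and names are illustrative):

uint32_t hash( uint32_t a);                  /* the 6-shift hash above */

void stride_test( void)
{
    /* hash SIZE consecutive keys with stride 5*2^7 into SIZE buckets,
       once with the low bits and once with the high bits, and count buckets hit */
    enum { LOG_SIZE = 12, SIZE = 1 << LOG_SIZE };
    static char low_used[SIZE], high_used[SIZE];
    uint32_t stride = (uint32_t)5 << 7;
    int nlow = 0, nhigh = 0;

    for (uint32_t i = 0; i < SIZE; ++i)
    {
        uint32_t h = hash(i*stride);
        uint32_t lo = h & (SIZE-1);
        uint32_t hi = h >> (32-LOG_SIZE);
        if (!low_used[lo])  { low_used[lo]  = 1; ++nlow;  }
        if (!high_used[hi]) { high_used[hi] = 1; ++nhigh; }
    }
    printf("low bits hit %d of %d buckets, high bits hit %d of %d\n",
           nlow, SIZE, nhigh, SIZE);
}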

If you don't like big magic constants, here's another hash with 7 shifts:

uint32_t hashint( uint32_t a)
{
    a -= (a<<6);
    a ^= (a>>17);
    a -= (a<<9);
    a ^= (a<<4);
    a -= (a<<3);
    a ^= (a<<10);
    a ^= (a>>15);
    return a;
}
I've confirmed this does well with sequences incremented by common amounts whether you use the high or low bits of the hash. And it does avalanche if one or two input bits differ, for a variety of base input values, with "differ" defined as + ^ - or ~^. For one-bit input differences on top of a random base value with "differ" defined as ^, each input bit changes each output bit with probability between .39 and .73. Specifically, this is the percentage of the times the ith input bit (rows) changed the jth output bit (columns):

5050505050505050505150505049505051505050505045505151495055515245
5050505050505050495051505050495050505050505050465051504950555152
4950505050495050504950515050505050505050515150504550515149505551
5549505049505050505049505150515050505051505150505047505049485055
5154495050505050505050505051505050495050515050505050475050504850
5051544950505050505050505050515050544950505150515050504750505048
4450505150505050505050505050505050505150505051505050505048505050
4944505051505050505050505050505050505051505050505050505050485050
5049445050525050505050505050505050445050525051505150505050504850
5450504750505249505050505050505050504750505250505051505050505048
5152505048505051505050505050505050505049505051505050505050515050
5050505050505050505050505050505050505050505050505050505050505050
5050505050505050505050505050505050515050505050505050505050505050
5050505050505050505050505050505050505050505050505050505050505050
5050505050505050505050505050505050505051505050505050505050505050
5050505050505050505050505050505050505150515050505050505050505050
5050505050505050505050505050505050505050505150505050505050505050
5050505050505050505050505050505050505050505050505050505050505050
5050505050505050505050505050505050505050505050505050505050505050
5050505050505050505050505050505050505050505050505050505050505050
5249505050505050505050505050505050505050505050505050505050505050
5052505050505050505050505050505050505050505050505050505050505050
5050524950505050505050505050505050524950505050505050505050505050
4450505149505049505050505050505050505149505051505050505050505050
4943505051495050505050505050505050505051495050505050505050505050
5049415050525050505050505050505050415050525050505150505050505050
4466485848505349505050505050495051485848505349505053495052495049
5245654859485053495049495050504950654859485053495051534950514850
5052456648594850535050504950505049456648594850535050505349505249
6550524466485947505349504950505050524466485947505349505153495051
4568505243684859485153505049505050505243684859485153505051534950
4939735153417347654550554950494950735153417347654550554950515448

The following operations and shifts cause inputs that differ in 1 or 2 bits to differ with probability between 1/4 and 3/4 in each output bit. (A code k in 1..31 means a += a<<k; 33..63 means a -= a<<(k-32); 65..95 means a ^= a<<(k-64); and 97..127 means a ^= a>>(k-96).) I put a * by the line that represents the hash above.

 38 113  41  68  35  74 111 *
 38 113  42  69  35  73 112
 38 114   9 100  35 107  46
 38 114  11  66   8  68 112
 38 114  42  69  35  73 112
 38 114  78  37  71  35 111
 39 113  41  68   2  74 112
 39 114   9 101   2 107  17
 39 114   9 101   2 107  49
 39 114  37  99  39 109  50
 39 115  36  67  38  44 112
 39 115  37  70  35 110  11
 39 115  41  74  36  67 111
 39 116   4 104   6 107  16
 39 116  10 101   8  75 113
 40 113  12  99  39  69 112
 40 113  13  99   6  69 113
 40 113  38 101   2 106  16
 40 113  38 101   2 106  48
 40 114   3 102   8 109  15
 40 114  37  99   7  77 113
 41 113  11 100   7  69 111
 42 114  44  99  38  72 113
 43 115   7 101   3 109  48
 44 114  36 105  38 108  16
 44 114  37 102  35 107  16
 44 114  41 101   2 109  16
 45 113  37 102   3 108  47
 45 113  37 105  35 104  17
 45 113  37 105  35 104  47
 45 113  39  99  37  76 111
 45 113  42 101   2 109  46
 45 113  42 101   2 109  50
 46 113  42 101  35 110  47
 46 113  42 101  35 110  50
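
Reading the starred line back through that encoding gives the 7-shift hash above; here's a little decoder to make the encoding concrete (the function names are mine):

uint32_t apply_op( uint32_t a, int code)
{
    /* codes 1..31: a += a<<k;  33..63: a -= a<<k;
       65..95: a ^= a<<k;  97..127: a ^= a>>k;  where k is the low 5 bits of the code */
    int k = code & 31;
    switch (code >> 5)
    {
    case 0:  return a + (a<<k);
    case 1:  return a - (a<<k);
    case 2:  return a ^ (a<<k);
    default: return a ^ (a>>k);
    }
}

uint32_t decoded_hash( uint32_t a)
{
    /* the starred line 38 113 41 68 35 74 111 decodes to the 7-shift hash above */
    static const int ops[7] = {38, 113, 41, 68, 35, 74, 111};
    for (int i = 0; i < 7; ++i)
        a = apply_op(a, ops[i]);
    return a;
}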

Another hash is Thomas Wang's function,

uint32_t hashint( uint32_t a)
{
    a += ~(a<<15);
    a ^=  (a>>10);
    a +=  (a<<3);
    a ^=  (a>>6);
    a += ~(a<<11);
    a ^=  (a>>16);
    return a;
}
I got the idea of adding a constant from looking at what effect his ~ had. His function passed only half of my tests for full avalanche. If you use it, the low bits are mixed better than the high bits. The low bits do quite well on sequences incremented by a constant amount. For random bases and diff defined as XOR, it did kinda OK; every input bit affects every output bit with probability between .36 and .76, specifically:

5049505050505054494951505050505050505350505045635145655043675045
5050504850505050554950505050505050505054505050456351456550436750
5050505047505050495649505050505049505050554950504563504565504367
5050505050485050504956505050505050494950505550505045635145655043
5050505050504850505049535050505050505049505055505050456351456550
5050505050505048505050505350505050505050495050554950504564504564
5050505050505050475050505051505050505050504950505450505045655046
5050505051505050504750505050515150505050495049505055495050436752
3950505050505050505047505050505239505050505050495050554950504368
4739505050505050505050475050505047395050505050504950505547505042
5052515050505051505050505050505050484050505050495049505057475050
4950504650505050505051505050505050504840505050505050495050574750
5049504943505050505050505050505049505048415050505050504948505847
5150505050465050505051505050505050505050484150505050525049485058
5150504950504550505051475050505050504950504840505050505750495150
5050505049505047505050504750505050505050505049435050505055505049
5050505050495050465050505048505050505050505050484150505050564850
5147505046505050505649495050525051535050545050504563504565504367
4951475050465050504957494950505249515350505450505045635145655043
5449504750504550515049564949505054495053505054505050456351456550
5054495047505046505050495649505050544950535050544950504564504564
5050544951475050455050505053505050505449515350505550495044655046
5050505449504750504550505150534950505054495053505055495050436752
6750505055495047505045505050505367505050554950535050544950504368
4367505050554950475050455050505043675050505549505350505548505042
5043675050505449504750504350505050436750505054495053505056475050
6550436750505055495047505042505065504367505050554950535050574750
4465504367505050544950465050445044655043675050505449505450505747
5045655043675050505551504550504450456550436750505055515055505057
6850446450436751505046655044505068504464504367515050466550565050
4372504769504371505051447050475043725047695043715050514470505350
5036765339745037754750503776533950367653397450377547505037765361