A **delta** is the difference between two inputs, difference
either from subtraction or exclusive-or. A hash causes a delta to
achieve **avalanche** if every output bit differs with probability
about 1/2 when two inputs differ by that delta. A function is said to
achieve **avalanche** (no delta specified) if every one-bit input
delta achieves avalanche.

Functions that do not achieve avalanche make bad hashes and rotten
ciphers. The paper Evaluating Hash
Functions examines these issues. One of the tasks in designing
fast hashes and ciphers is finding functions that achieve avalanche
quickly. Most such functions have "magic constants" -- constants that
seem to be chosen at random. Often these constants *are* chosen
at random, or from the first few digits of pi, or such. Choosing
constants this way means there is no reason to believe they will be
better or worse than any other constants.

This code allows magic constants to be chosen to pass the avalanche test. This guarantees that they will be exceptionally good for something. They may be exceptionally bad at something else, who knows. That is a danger of having a purpose.

This is the code:

standard.h | maketest.txt |

rand.h | rand.c |

recycle.h | recycle.c |

test.h | testb.c |

test.c |

The current code is checking that, for every 2-bit input delta, the function (myff, reverse in mybb, similar to three calls to myf) causes at least 32 bits of the internal state to change with probability 1/4 (for random input pairs are tried).

A typical search regime:

- Use tgone() and ttsan() to make sure that forward(backward(x))==x. You want to fix bugs in your functions before running any tests. Have myf() and myb() describe the function.
- To find constants for NUMTEST and MYLIMIT, compose the function with itself ten times so you know it achieves avalanche, and see how the numbers are distributed. Choose limits that won't eliminate many reasonable functions. Use tgran() to generate the parameters, and use tfsho() for debugging if everything is failing.
- Once your code is debugged, use exhaustive or random search, tbone() building all input/output pairs, ttlim() checking that every input bit affects every output bit, and tpcou() making input pairs that are very similar. You really only pay for hits, so try to have the hits be 1 out of 100 or even scarcer. Pipe the outputs (successes only) to a file. Let it run overnight or for a couple days.
- Use tgfil() to read in the successes from the previous round. Change to ttlea() with NUMTEST=100 to whittle down the number of candidates. Maybe use tpran() to generate the pairs, so you don't get a funny bias due to the way tpcou() clusters pairs. Increase NUMTEST to be more picky, but increasing it slows things down. Repeat.

I consider a hash to pass if it passes 24 configurations: all combinations of (tbtop, tbbot, tbone, tbtwo) x (tpran, tpcou) x (ttlim (ppp=1), ttcou (ppp=2), ttlea (ppp=1)). (See theorem Nofunnel to see where these tests come from.) Then I plug the hash into the self-test in lookup.c -- sometimes it still fails. I haven't identified what that self-test tests that these avalanche tests don't.

Other useful programs: sort.c and distinct.c, they can handle files of arbitrary length. Distinct only works correctly on sorted files.

Internal links:

Finding Characteristics in Block
Ciphers

Evaluating Hash Functions

Calculating the HOMFLY knot polynomial

Table of Contents