## Two new block ciphers

### Proposal

Warning, warning. I'm going to replace both these ciphers because I proved they are inferior to "a^=e; f+=(h>>k); h-=a;" for differential cryptanalysis when the delta is with "-".

Another warning, I'm giving up on designing block ciphers altogether because I'm not as dedicated as other people in the field at tracking down the flaws, and I can't improve significantly (performancewise) on what's already out there.

I am preparing to propose two new block ciphers, one for 256-bit blocks and 32-bit arithmetic, one for 512-bit blocks and 64-bit arithmetic. I have reference implementations for both of them. I have been analyzing the 256-bit cipher only so far because it is smaller, making searches more feasible.

I have two goals. The first is to teach myself enough about cryptography to be able to design and analyze block ciphers. The second goal is to provide faster block ciphers than are currently available without sacrificing security. My main reference point for what is currently available is the cipher Rijmen's SHARK which was presented at the last Fast Software Encryption conference.

The structure of both new ciphers is this:

1. F is a reversible mixing function.
2. FFFF achieves avalanche well.
3. The cipher G(k1,k2,block) is defined to be k1 XOR FFFF FFFF FFFF( k2 XOR block), where k1 and k2 are keys each the same size as the block. This is about 13 instructions per byte on a 32-bit machine or 6.5 instructions per byte on a 64-bit machine. There are references that give a proof that this key schedule is reasonable assuming FFFF FFFF FFFF has no flaws (quite an assumption).
4. If the key is small, say only 40 bits, H(key,block):=G(k1,k2,block) where k1:=G(0,0,key) and k2:=G(0,0,k1). If the key is zero, k1 and k2 will be zero, so that is a bad key. There are no other bad keys.
5. For the 256-bit block cipher, F is defined to be
```#define mix32(a,b,c,d,e,f,g,h) \
{ \
a-=e; f^=h>>8;  h+=a; \
b-=f; g^=a<<8;  a+=b; \
c-=g; h^=b>>11; b+=c; \
d-=h; a^=c<<3;  c+=d; \
e-=a; b^=d>>6;  d+=e; \
f-=b; c^=e<<4;  e+=f; \
g-=c; d^=f>>13; f+=g; \
h-=d; e^=g<<13; g+=h; \
}
```
6. For the 512-bit block cipher, F is defined to be
```#define mix64(a,b,c,d,e,f,g,h) \
{ \
a-=e; f^=h>>9;  h+=a; \
b-=f; g^=a<<9;  a+=b; \
c-=g; h^=b>>23; b+=c; \
d-=h; a^=c<<15; c+=d; \
e-=a; b^=d>>14; d+=e; \
f-=b; c^=e<<20; e+=f; \
g-=c; d^=f>>17; f+=g; \
h-=d; e^=g<<14; g+=h; \
}
```

### Design decisions

The new ciphers are not a theoretical design. They are the result of a series of brute-force computer searches for functions satisfying various criteria.

My first proposed mixing functions looked something like this:

```a = a+b+c+d;
mix(a);
b+=a; c+=a; d+=a;
```
These achieved avalanche as fast as possible; that is, for every input and output bit, inputs differing in only the input bit would differ in the output bit about half the time. However the mixing is rather superficial, for example, d-c before the mix is the same as d-c after the mix.

So the mixing should not be concentrated that way, it should be spread out throughout the internal state. A structure was found which could do this:

```a += b ^= (c<<k1);
b += c ^= (d>>k2);
c += d ^= (a<<k3);
d += a ^= (b>>k4);
```
Measurements showed this did quite well. But experimenting with the operations, shift values, and variables showed it was not optimal.

I attempted to do an exhaustive search of similar functions, with the goal of achieving avalanche as fast as possible. For simplicity and to mix the state uniformly, I chose a structure for a line and rotated the variables through that structure.

There are only a few fast software permutations. They are:

• a += f(b); for any function f
• a ^= f(b); for any function f
• a = tab[a]; where a is 8 (or 16) bits and tab is an array of 28 (or 216) terms which holds a permutation of 0..28-1 (or 0..216-1).
• a += (a<<k); for any constant k > 0
• a ^= (a<<k); for any constant k > 0
• a ^= (a>>k); for any constant k > 0
On 32-bit or 64-bit machines, avalanche can be reached faster with additions and shifts than with table lookup. SHARK is a table-lookup cipher; I have ignored table lookup. Multiplication and division are significantly slower than any of these operations.

There were several possible arrangements of lines: count through all the variables (abcdefgh), or treat them in pairs (ae bf cg dh), or triples (ace bdf), or quadruples (aceg bdfh). Counting through individually always did better, and counting through in any order is isomorphic to counting in increasing order. Functions did better when lines alternated <<,>>,<<,>> than in any other arrangement, especially when functions were applied several times.

Mixtures of "+" and "^" were always much better than pure "^" or pure "+". At first I avoided "-" because it increased the search space but was essentially the same thing as "+". "+" is good at changing many bits when almost all bits are 1. "-" is good at changing many bits when almost all bits are 0. Ciphers combining +,-,^ can do well at both extremes.

The best lines like

```a += b; b ^= (c<<k);
```
achieve avalanche after 6 iterations through all variables. The best lines like
```a += b; b += c; c ^= (d<<k);
```
achieve avalanche after 3 iterations through all variables. No reasonable lines could do avalanche after only 2 iterations through all the variables, so three operations per line seemed optimal. Very few 6-variable ciphers did avalanche in 3 iterations with such lines, more 7-variable ciphers did avalanche, and quite a few 8-variable ciphers did well. 8-variable ciphers won. They fit entirely in registers on reasonable machines. (A Pentium is not a reasonable machine.)

Many 32-bit machines and most 64-bit machines are superscalar, that is, they can issue multiple instructions simultaneously provided they do not depend on one another. Ciphers must encrypt data, and they must decrypt data. The following table lists all line structures for 8-variable ciphers which achieved avalanche in 4 iterations through all variables for some randomly chosen constants and operations.

Line structures and parallelism
Encrypt ParallelismEncrypt StructureDecrypt Structure Decrypt Parallelism
4 : 3a-=b; a^=(b>>k); c+=a;a-=c; c^=(b<<k);c+=b;4 : 3
8 : 3a-=b; a^=(b>>k); c+=h;a-=d; c^=(b<<k);c+=b;4 : 3
8 : 3a-=b; a^=(b>>k); d+=a;a-=d; c^=(b<<k);c+=b;4 : 3
2 : 1a-=b; b^=(c>>k); d+=a;a-=d; c^=(b<<k);d+=c;2 : 1
8 : 3a-=b; b^=(h>>k); e+=a;a-=d; d^=(f<<k);e+=d;2 : 1
4 : 3a-=c; c^=(b>>k); c+=b;a-=b; a^=(b<<k);c+=a;4 : 3
4 : 3a-=d; b^=(a>>k); a+=b;a-=h; h^=(a<<k);a+=f;2 : 1
4 : 3a-=d; d^=(b>>k); d+=b;a-=b; a^=(b<<k);c+=h;8 : 3
2 : 1a-=d; d^=(c>>k); d+=c;a-=b; b^=(c<<k);d+=a;2 : 1
8 : 3a-=e; f^=(h>>k); h+=a;a-=h; c^=(a<<k);h+=d;8 : 3
4 : 3a-=f; b^=(a>>k); a+=b;a-=h; h^=(a<<k);a+=d;2 : 1
16: 3a-=f; e^=(a>>k); a+=b;a-=h; e^=(a<<k);a+=d;2 : 1
4 : 3a-=f; g^=(a>>k); h+=a;a-=h; b^=(a<<k);h+=c;8 : 3
2 : 1a-=f; g^=(h>>k); g+=h;a-=h; a^=(h<<k);g+=b;2 : 1
8 : 3a-=f; g^=(h>>k); h+=a;a-=h; b^=(a<<k);h+=c;4 : 3
1 : 1a-=g; h^=(a>>k); h+=a;a-=h; a^=(h<<k);h+=b;2 : 1
2 : 1a-=h; a^=(h>>k); g+=b;a-=f; g^=(h<<k);g+=h;2 : 1
2 : 1a-=h; a^=(h>>k); h+=b;a-=g; h^=(a<<k);h+=a;1 : 1
2 : 1a-=h; a^=(h>>k); h+=c;a-=f; h^=(a<<k);h+=a;2 : 1
4 : 3a-=h; b^=(a>>k); a+=d;a-=f; h^=(a<<k);a+=d;8 : 3
4 : 3a-=h; b^=(a>>k); h+=c;a-=f; g^=(h<<k);h+=a;8 : 3
2 : 1a-=h; c^=(a>>k); a+=d;a-=f; g^=(a<<k);a+=d;4 : 3
2 : 1a-=h; h^=(a>>k); a+=d;a-=f; b^=(a<<k);a+=b;4 : 3
The structure "a-=e; f^=(h>>k); h+=a;" won because it was significantly better than "a-=f; e^=(a>>k); a+=b;", and was only slightly worse than the next couple candidates (ordered by parallelism). The best candidates ignoring parallelism had very low parallelism. The operation "a^=(a>>k);", although reversible, requires more instructions in reverse, so combining a variable with itself is slower than combining a variable with some other variable even ignoring parallelism. Structures that combined a variable with itself were not considered.

My avalanche program was written and used to find the shift constants for both the 256-bit and 512-bit ciphers. Exhaustive search was possible for the 256-bit cipher, but the 512-bit cipher just tried out shift constants at random. Functions were screened for the amount of avalanche after 1, 2, and 3 iterations through all variables. Functions were run both forward and backward, and before each assignment of each line. The final constants were chosen solely by these avalanche tests.

I currently plan on using 12 iterations of F. This achieves avalanche three times, which is the same as DES. Four rounds of F is roughly equivalent to 1 round of SHARK, and SHARK recommends 6 rounds, which implies I should be recommending 24 rounds of F (or SHARK should be recommending 3 rounds). I repeat, the new block cipher is pre-beta, so do not use it.

### Differential Cryptanalysis

Avalanche is the correct measure for noncryptographic hash functions. It tells you how well things do almost all the time. Cryptography, though, is most interested in the exponentially rare cases where things don't go well at all. The average performance is fairly irrelevant.

Differential cryptanalysis needs pairs of inputs differing by a delta which will reliably produce another delta after a few passes of F. Such a delta is called a characteristic. An iterated characteristic is a characteristic that is mapped to itself. No iterated characteristics have been found so far.

Analysis has been done to the 256-bit structure only so far. This structure has a few characteristics. Specifically, if the input delta is the all zero except for the high bits and the high bits are set as shown, the output delta will be all zeros except the high bits and the high bits will be set as shown, 100 percent of the time.

```  10010000 => 10011000
10001000 => 00011011
00011000 => 10000011
01100100 => 01100110
11110100 => 11111110
11101100 => 01111101
01111100 => 11100101
00100010 => 00000110
10110010 => 10011110
10101010 => 00011101
00111010 => 10000101
01000110 => 01100000
11010110 => 11111000
11001110 => 01111011
01011110 => 11100011
```

No delta appears on both sides. The predecessor of 10010000 and the successor of 10011000 each have a delta that occurs with probability 2-6; the one delta leads to the other with probability 2-11 after FFF. This is the strongest characteristic known.

```    a ^= 0x82000000;
b ^= 0x86000000;
c ^= 0x80000000;
e ^= 0x80000000;
f ^= 0x00821040;
h ^= 0x82104000;
```

It would be good to guarantee there are no iterated characteristics of probability 2-n or greater. I don't know how to do that.

For all n in 0..30, an exhaustive search of all deltas was made where the delta had high bits set and nth bits set. There are 31*256*256 such deltas. They were run forward 211 times and backward 211 times, and there were never collisions between sets of forward and backward deltas.

Can we say that a delta with weight n will have probability no more than 2-4n because of carries from 4 additions? Well, no. The best delta known (shown above) has 15 bits, 10 of which are not high bits, and it produces its output delta after one round with probability 2-5.