A small noncryptographic PRNG

This is a small fast pseudorandom number generator, suitable for large statistical calculations, but not of cryptographic quality. Although there is no guaranteed minimum cycle length, the average cycle length is expected to be about 2126 results, and no seed is expected to hit a cycle (or any other seed's sequence) in less than 264 results.

The fastest small unbiased noncryptographic PRNG that I could find (in C)

typedef unsigned long int  u4;
typedef struct ranctx { u4 a; u4 b; u4 c; u4 d; } ranctx;

#define rot(x,k) (((x)<<(k))|((x)>>(32-(k))))
u4 ranval( ranctx *x ) {
    u4 e = x->a - rot(x->b, 27);
    x->a = x->b ^ rot(x->c, 17);
    x->b = x->c + x->d;
    x->c = x->d + e;
    x->d = e + x->a;
    return x->d;
}

void raninit( ranctx *x, u4 seed ) {
    u4 i;
    x->a = 0xf1ea5eed, x->b = x->c = x->d = seed;
    for (i=0; i<20; ++i) {
        (void)ranval(x);
    }
}

I have not found any test in any configuration that it does not pass. It passes DIEHARD. Sampling just the bottom byte, it passes the run test (both up and down), and the gap test (for gaps up to length 32) for 2 trillion values (chi.c). The frequency test (again on the bottom byte) looked suspicious at 1 trillion values (chi-square=5), so I ran it for 4 trillion values (chi-square=0.42), showing the earlier result was a fluke (freq.c). A test that counts bits per value for five consecutive values (countx.c) passes for at least 16 trillion values, both for normal and for graycoded values. It passes Geronimo Jones' nda test for at least 1 trillion values (4 trillion bytes).

The strongest test for small-state generators I've run across is Bob Blorp2's bitcounting test (countx.c). Near as I can tell, what it is doing is indirectly measuring how much avalanche happens by the time entropy is first recycled. For this generator that's 5 results. I can measure that directly too: for every bit of the internal state, I can start with two copies of a random state that differ in just that bit, then run the generator for 5 results, and XOR their 5th result. Complete avalanche would give an average of 16 bits differing. The program rngav.c measures this: for the PRNG above, at least 8.8 bits are affected on average. I used the shifts (27, 17). Others that achieve 8.8 bits of avalanche include (9,16), (9,24), (10,16), (10,24), (11,16), (11,24), (25,8), (25,16), (26,8), (26,16), (26,17), and (27,16). Avalanche is much better in reverse, but the code is also slower, both because of more read-after-write dependencies. It is possible to not fully achieve avalanche yet still pass all practical tests for randomness because nearly-identical states are very rare among the set of all possible states.

I wrote this PRNG. I place it in the public domain. Same goes for at least the implementation of all those tests linked to above.

A third rotate would have improved avalanche

A third rotate could be introduced into ranval() without increasing the maximum parallel path through the code. This came out significantly slower with the Microsoft Visual Studio compiler, though. (MS VS uses pointer arithmetic for the additions, to trick Intel's compiler into using 3-variable instructions instead of 2-variable. The extra rotate thwarted that, thus the speed hit.) A third rotate could increase minimum avalanche to 13 bits, on average, like so:

u4 ranval( ranctx *x ) {
    u4 e = x->a - rot(x->b, 23);
    x->a = x->b ^ rot(x->c, 16);
    x->b = x->c + rot(x->d, 11);
    x->c = x->d + e;
    x->d = e + x->a;
    return x->d;
}

Shifts amounts achieving 13 bits avalanche include (3,14,24), (3,25,15), (4,15,24), (6,16,28), (7,16,27), (8,14,3), (11,16,23), (12,16,22), (12,17,23), (13,16,22), (15,25,3), (16,9,3), (17,9,3), (17,27,7), (19,7,3), (23,15,11), (23,16,11), (23,17,11), (24,3,16), (24,4,16), (25,14,3), (27,16,6), (27,16,7). I imagine these are significantly more pseudorandom than the PRNG at the top. However, since they run slower and no test I know of can detect nonrandomness in either, I'm recommending the 2-shift version.

I suspect that all you need for a statistically good pseudorandom number generator is

If that's all there is to it, designing statistically good pseudorandom number generators is pretty much a solved problem. You can tell by inspection if a generator has a big enough state and uses a reversible, and nonlinear function, where all the state affects all the state, there isn't some piece that isn't affected by some other piece. rngav.c can very quickly tell you how fast it mixes. How big an internal state is needed? 96 bits is cutting it too close, 128 bits seems good enough. How fast does mixing have to be? 4 out of 32 bits of avalanche before recycling entropy isn't good enough. Is 8.8 bits good enough? Probably. Though I could be wrong.

64-bit variants

If you want to use 8-byte terms instead of 4-byte terms, the proper rotate amounts are (39,11) for the two-rotate version (yielding at least 13.3 bits of avalanche after 5 rounds) or (7,13,37) for the three-rotate version (yielding 18.4 bits of avalanche after 5 rounds). I think I'd got with the three-rotate version, because the ideal is 32 bits of avalanche, and 13.3 isn't even half of that.

I don't think that there's an 8-byte rotate instruction on any 64-bit platform. And you only need 2 terms to get to 128 bits of internal state if you have 64-bit terms. Quite likely 64-bit deserves a whole different approach, not just different constants.

A possibly secure extention of this PRNG

The small-state version isn't suitable for encryption. However, if x->b is replaced with an array and e is used for lookup in the array, it becomes something that is possibly of cryptographic strength:

typedef unsigned long int  u4;
typedef struct ranctx { u4 a; u4 b[256]; u4 c; u4 d;} ranctx;

#define rot(x,k) (((x)<<(k))|((x)>>(32-(k))))
void ranxor( ranctx *x, u4 m[256] ) {
    u4 i;
    for (i=0; i<256; ++i) {
        u4 e = x->a - rot(x->b[i], 27);
        x->a = x->b[e%256] ^ rot(x->c, 17);
        x->b[e%256] = x->c + x->d;
        x->c = x->d + e;
        x->d = e + x->a;
        m[i] ^= x->d;
    }
}

void raninit( ranctx *x, u4 seed[128] ) {
    u4 i;
    u4 junk[256];
    x->a = x->c = x->d = 0xf1ea5eed;
    for (i=0; i<128; ++i) {
        x->b[i] = x->b[i+128] = seed[i];
    }
    for (i=0; i<10; ++i) {
        ranxor(x, junk);
    }
}

The routine ranxor() takes a message m[] consisting of 256 4-byte values and XORs 256 pseudorandom 4-byte values with it.

I strongly recommend against using this for any cryptographic purpose. If it is secure, it is just barely so. I don't think it's even particularly fast. The only value I see in it is as a target for cryptanalysis and possible inspiration for future stream ciphers.

Although lack of bias comes from the same good mixing as the original generator, the security comes more from the pseudorandom lookups than from the mixing. The most blatant weakness is that there is one pseudorandom lookup per result returned, leaving a system with n+256 variables and n equations after seeing n results (counting pseudorandom lookups as separate variables than lookups from known positions). If you can make progress guessing or linking the variables the whole thing should be solvable. Another weakness (and the most curious part of this generator) is that the positions to be updated in x->b[] are selected pseudorandomly rather than in a regular order. That means each pass leaves about 1/e of the positions in x->b[] unchanged. Which means you can reduce the number of independent variables just by guessing which positions don't get changed between rounds.

I still don't know how to break it, but I haven't tried very hard.


Hash functions for 32-bit integers
Some paper airplane designs
Lexicodes: sets of values all n bits apart
Table of Contents