(I feel I'm reinventing the wheel here -- tell me if I am! Also, where can I put the C code for this? About 400 lines.) This is a way to avoid computing probabilities by hand before running tests on RNGs. It lets me make up tests where I don't know how to find the probabilities. I can write new tests by putting bugs in old ones! The method is simple: Let X,Y be counts of independent events gathered from RNGs such that X and Y should have the same probability distribution p_i, and s=sum(x_i)=sum(y_i), and all x_i > c for some small constant c (0..16 are good). Let X be gathered from a control, and Y from the thing we want to test. standard chi^2: sum((y_i-(sp_i))^2/(2sp_i)) is about n-1. new method: sum((y_i- x_i )^2/(2 x_i)) is about n-1. Justification: Thm 1: sum( (y_i-x_i)^2 /(2sp_i)) has the same distribution as sum(((y_i+x_i)-2sp_i)^2/(2sp_i)). Thm 2: sum( (y_i-x_i)^2/(2 x_i)) has a distribution about the same as sum( (y_i-x_i)^2/(2sp_i)) + sum(1/x_i) Hence the new method produces results very close to the standard chi^2 distribution for X+Y. Proof of thm 1: Note that sum(x_i)=sum(y_i)=s=sum(sp_i). sum((y_i-x_i)^2) = sum((y_i+x_i)^2 - 4x_iy_i) = sum((y_i+x_i)^2 - 4(sp_i)^2 - 4x_iy_i + 4(sp_i)^2) = sum((y_i+x_i)^2 - 8(sp_i)^2 + 4(sp_i)^2 - 4x_iy_i + 4(sp_i)^2) = sum((y_i+x_i)^2 - 4sp_i(y_i+x_i) + 4(sp_i)^2 - 4x_iy_i + 4(sp_i)^2) = sum(((y_i+x_i)-2sp_i)^2 - 4x_iy_i + 4(sp_i)^2) = sum(((y_i+x_i)-2sp_i)^2) because X and Y are independent. Handwave at thm 2: Let z,m,d,e be given such that z/(m+d) = z/m + e. Then z+me = zm/(m+d), and e = (zm - zm - zd)/m(m+d) = -zd/m(m+d). Our problem has z=(y_i-x_i)^2, m+d=2x_i, and m=2sp_i. For each m, d has a normal distribution with variance m. sum((z/(m+d)) = sum(z/m) + sum(-zd/m(m+d)). Set f=abs(d), we then have sum(-zf/m(m+f)) + sum(zf/m(m-f)) == sum(-zf/m(m+f) + zf/m(m-f)) (both + and - occur equally often) == sum((-zf(m-f) + zf(m+f))/m(m^2-f^2)) = sum(2zd^2/m(m^2-d^2)) (since d^2 = f^2) == sum(2zd^2/m^3) (== means "is approximately") == sum(2d^2/m^2) (since sum(z/m)==n-1 and z/m==1) == sum(2/m) (since the variance of d is m) == sum(1/x_i) (since 2x_i is about m) Choose c such that x_i>c implies sum(1/x_i) is small compared to n-1. All of this begs the question, what can be used as the control RNG. Most RNGs can be scaled to have bigger or smaller internal states. Big is usually much better than small. So find an RNG that does well when small, scale it up to have a huge internal state, then only test small RNGs against it. - Bob Jenkins