Lexicodes are built greedily: iterate through the integers in increasing order and accept any integer whose Hamming distance from every previously accepted integer is at least the specified distance.

### Tables of Lexicodes

An (n,k,d) lexicode can encode up to k data bits in n bits total by including n-k additional check bits. It can detect up to d-1 wrong bits and correct up to (d-1)/2 wrong bits (rounded down). The values for a lexicode of minimum distance d all differ by at least d bits. Below are the n-k check bits for the codewords for the powers of two for odd d=3..33 for small values of k.

• Hamming distance 3, to infinity and beyond!
• Hamming distance 5, up to (2149,2122,5)
• Hamming distance 7, up to (305,275,7) (includes the Golay code)
• Hamming distance 9, up to (108,78,9)
• Hamming distance 11, up to (69,38,11)
• Hamming distance 13, up to (60,26,13)
• Hamming distance 15, up to (62,25,15)
• Hamming distance 17, up to (52,14,17)
• Hamming distance 19, up to (54,14,19)
• Hamming distance 21, up to (47,8,21)
• Hamming distance 23, up to (41,3,23)
• Hamming distance 25, up to (52,6,25)
• Hamming distance 27, up to (48,3,27)
• Hamming distance 29, up to (56,4,29)
• Hamming distance 31, up to (31,1,31)
• Hamming distance 33, up to (69,6,33)

n is the number of bits, k is the dimension (k bits, or 2^k values), and d is the Hamming distance. The tables list (n,k,d), the hex value for the n-k check bits for the codewords for the powers of two, and the binary for that same value, since the patterns are easier to see in the binary. The codewords for the powers of two form the basis of a lexicode. You have to tack an identity matrix on the left (representing all the data bits) to get a full basis, and to get the basis of the official lexicodes you have to scatter the check bits among the data bits. (Least significant bits are on the right. The data bit for the row is always the rightmost position to the left of all other used bits. The first row using a given check bit places the column for that check bit immediately to the left of the data bit for the previous row. The check bits themselves are already correctly ordered in the tables.)

Here is the C code I used to find these codes. Here's another that takes already-generated codes as arguments, so I can restart whenever Windows dies on me.

(A more thorough overview of lexicodes than this is here.)

Two equal-length bit arrays have a Hamming distance of d if they differ in d bits, or equivalently, if the XOR of the arrays contains d 1s. Two positive integers have a Hamming distance of d if their binary representations have a Hamming distance of d. Let H(a,b) be the Hamming distance between a and b.
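As a small illustration, H(a,b) can be computed by XORing the two integers and counting the 1 bits. This is a Python sketch, not the C code mentioned above; the name `hamming_distance` is just a label chosen here.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    # XOR leaves a 1 exactly where the two values disagree.
    return bin(a ^ b).count("1")
```

For example, `hamming_distance(0b0000111, 0b0011001)` counts the 1s in `0b0011110`.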

A set of values has a minimum Hamming distance of d if the minimum Hamming distance between any two distinct values in the set is d. So the problem we want to solve is: find a set of 2^k n-bit numbers with minimum Hamming distance d.

The Hamming code constructs sets of values with minimum Hamming distance d=3.

Notice that the Hamming code for 15 is the XOR of the Hamming codes for 1, 2, 4, 8, and that 1+2+4+8=15.

```
       3 2 1 c 0 b a
   1:  0 0 0 0 1 1 1
   2:  0 0 1 1 0 0 1
   4:  0 1 0 1 0 1 0
^  8:  1 0 0 1 0 1 1
--------------------
  15:  1 1 1 1 1 1 1
```
The Hamming codes for powers of 2 are a basis for the whole set of Hamming codes. (That means the Hamming codes for all other numbers can be derived by XORing together the codes for the powers of two.) (You can stack the codewords for the powers of two on top of one another to get a generator matrix for Hamming codes. Multiply the binary representation of some data by it, with XOR in place of addition, and you get the matching codeword as a result.)
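To make the basis idea concrete, here is a Python sketch that XORs together the rows from the table above for each bit set in the data. The names `BASIS` and `encode` are made up for this example.

```python
# Rows for 1, 2, 4, 8 copied from the table above (least significant bit on the right).
BASIS = [0b0000111, 0b0011001, 0b0101010, 0b1001011]

def encode(data: int, basis=BASIS) -> int:
    """XOR together the basis rows selected by the 1 bits of data."""
    code = 0
    for i, row in enumerate(basis):
        if (data >> i) & 1:
            code ^= row
    return code
```

With these rows, `encode(15)` is `0b1111111`, matching the bottom line of the table.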

Method: It is easy to find sets of numbers with a Hamming distance of d. Start with the set {0}, then count upward, adding any number to the set that has Hamming distance at least d from every element already in the set.
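The method can be sketched directly in Python. This is a brute-force illustration that scans every n-bit integer, so it is only practical for small n. The name `greedy_code` and its parameters are chosen here and are not taken from the C program.

```python
def greedy_code(n_bits: int, d: int) -> list[int]:
    """Greedily collect n_bits-bit integers that are pairwise at least d apart."""
    code = [0]
    for candidate in range(1, 1 << n_bits):
        # Accept the candidate only if it is far enough from everything accepted so far.
        if all(bin(candidate ^ c).count("1") >= d for c in code):
            code.append(candidate)
    return code
```

For n_bits=7 and d=3 this yields 16 values beginning 0, 7, 25, 30, 42.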

This method automatically produces a set that is closed under XOR (that is, a linear set) whose basis is the representations of the powers of two. These are known as the lexicodes, a set of binary linear codes that keep values in lexicographic order. The Hamming codes are lexicodes.
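The closure claim can be checked by brute force on a small case. The Python sketch below rebuilds the greedy set, tests that the XOR of any two members is again a member, and pulls out the entries at indices 1, 2, 4, 8, ... as the basis. All function names are illustrative, and a check on one small code is a sanity test, not a proof.

```python
def greedy_code(n_bits: int, d: int) -> list[int]:
    """Greedy set of n_bits-bit integers with pairwise distance at least d."""
    code = [0]
    for candidate in range(1, 1 << n_bits):
        if all(bin(candidate ^ c).count("1") >= d for c in code):
            code.append(candidate)
    return code

def is_closed_under_xor(code: list[int]) -> bool:
    members = set(code)
    return all((a ^ b) in members for a in code for b in code)

def power_of_two_rows(code: list[int]) -> list[int]:
    """Entries at indices 1, 2, 4, ...: the representations of the powers of two."""
    k = len(code).bit_length() - 1  # assumes len(code) is a power of two
    return [code[1 << i] for i in range(k)]
```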

This algorithm also automatically separates check bits from data bits; every bit that is the high bit of the representation of any power of two is a data bit and will be zero in the representations of all other powers of two.
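Here is a small Python sketch of that separation, using the four distance-3 rows 7, 25, 42, 75 from the Hamming table. It reports which bit positions are data bits (the high bit of each row) and which are check bits, and confirms that no row touches another row's data bit. The function names are hypothetical.

```python
def split_bits(rows: list[int]) -> tuple[list[int], list[int]]:
    """Return (data bit positions, check bit positions) for a list of basis rows."""
    n = max(rows).bit_length()
    data_positions = [row.bit_length() - 1 for row in rows]
    check_positions = [p for p in range(n) if p not in data_positions]
    return data_positions, check_positions

def data_bits_are_private(rows: list[int]) -> bool:
    """True if each row's high bit is zero in every other row."""
    return all(
        ((other >> (row.bit_length() - 1)) & 1) == 0
        for row in rows
        for other in rows
        if other != row
    )
```

For those four rows the data bits sit at positions 2, 4, 5, 6 and the check bits at positions 0, 1, 3, which is the `3 2 1 c 0 b a` column order of the table.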

Note that if you rearrange the bit positions for all values in a set with Hamming distance d, you'll get another set of values with Hamming distance d. If you put the check bits all on the left you'll still preserve lexicographic order.

Those assertions simplify the search: rather than searching for the full representation of every integer, you can search for just the check bits for the representations of powers of two.
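One way such a check-bit search might look, as a hedged Python sketch: keep the check bits as a separate integer, and for each new data bit take the smallest check value that keeps every codeword at least d away. This assumes the check bits are ordered as in the tables; it is not the C program referred to above, and `find_check_bits` is a name invented here. It is exponential in k, so it only suits small cases.

```python
def popcount(x: int) -> int:
    return bin(x).count("1")

def find_check_bits(d: int, k: int) -> list[int]:
    """Check bits for the first k powers of two, found by brute force."""
    checks = []   # checks[i] = check bits for data value 1 << i
    table = [0]   # table[x] = check bits for data value x (XOR of selected rows)
    for _ in range(k):
        c = 0
        # The new codeword has one new data bit plus check bits c. Its distance to
        # the codeword for data x is popcount(x) + 1 + popcount(c ^ table[x]).
        while not all(
            popcount(x) + 1 + popcount(c ^ table[x]) >= d
            for x in range(len(table))
        ):
            c += 1
        checks.append(c)
        table += [c ^ t for t in table]  # extend to data values using the new bit
    return checks
```

For d=3 the first four values come out as 3, 5, 6, 7, which are the `c b a` columns of the Hamming table earlier on this page.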

Will the representations of powers of 2 always have only d bits set? No. If we limited the search to check bits with d-1 bits set, would we get the same set of codes? No, and sometimes they wouldn't be as compact. Is there a way to get sets that are as compact yet only have d bits set? Yes, but some powers of two would set data bits owned by previous powers of two. Are lexicodes the smallest possible codes? No, usually not. The smallest possible linear codes? No, usually not.

One more thing: given the basis for a set with Hamming distance d (where d is odd), you can always extend that to a basis for a set with even Hamming distance d+1 by adding one extra check bit to every value which is the XOR of all other bits in that value. I don't know why. I don't know how to extend this to d+2.
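The parity extension is easy to try out. The Python sketch below appends one bit equal to the XOR of all the other bits and measures the minimum distance before and after for the distance-3 rows 7, 25, 42, 75. One way to see the effect: every extended value has an even number of 1s, so a value with an odd count d gains one, and no count goes down. Function names are made up for the example.

```python
def popcount(x: int) -> int:
    return bin(x).count("1")

def extend_with_parity(word: int) -> int:
    """Append one low bit that is the XOR of all bits of word."""
    return (word << 1) | (popcount(word) & 1)

def min_distance(basis: list[int]) -> int:
    """Minimum distance of the linear set spanned by basis (smallest nonzero weight)."""
    words = [0]
    for row in basis:
        words += [w ^ row for w in words]
    return min(popcount(w) for w in words if w)
```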

These codes can be produced by grouping the data bits into bytes (that is, sets of 8 bits), then placing the check bits for all 256 combinations of those 8 bits in a 256-term array (one array per byte). The check bits for all data bytes can be XORed together to produce the final set of check bits.
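A sketch of that table-driven scheme in Python, assuming the check bits for each data bit are given as a list of integers (`check_rows`, a hypothetical name, with entry i belonging to data bit i). Each group of 8 data bits gets a 256-entry table, and the check bits for a whole value are the XOR of one lookup per byte.

```python
def build_byte_tables(check_rows: list[int]) -> list[list[int]]:
    """One 256-entry table per data byte; entry v holds the check bits for byte value v."""
    tables = []
    for start in range(0, len(check_rows), 8):
        rows = check_rows[start:start + 8]
        table = [0] * 256
        for value in range(1, 256):
            low = value & -value              # lowest set bit of value
            i = low.bit_length() - 1
            row = rows[i] if i < len(rows) else 0
            table[value] = table[value ^ low] ^ row
        tables.append(table)
    return tables

def check_bits(data: int, tables: list[list[int]]) -> int:
    """XOR together one table lookup per data byte."""
    result = 0
    for table in tables:
        result ^= table[data & 0xFF]
        data >>= 8
    return result
```

The tables are built once; after that each byte of data costs one lookup and one XOR.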

As with Hamming codes, you can substitute bytes or words for bits. If the original code took up n bits, the new code takes up n words, and for every bit position i, the ith bits of those n words form an instance of the original n-bit code. Such codes do well with burst errors (since the bits of each code are spread over many words), and they can be computed quickly in software. They could be computed quicker still if you use representations of powers of 2 that have only d bits set (which is always the case for some linear combination of these codes).
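Here is a Python sketch of the word-wise version, using the (7,4,3) rows 7, 25, 42, 75: four data words go in, seven words come out, and output word p is the XOR of the data words whose row has bit p set. `encode_words`, `bit_slice`, and `encode_bits` are illustrative names, and the word width is whatever the inputs happen to use.

```python
BASIS = [0b0000111, 0b0011001, 0b0101010, 0b1001011]  # rows for data bits 0..3
N = 7                                                  # bits (or words) per codeword

def encode_bits(data: int, basis=BASIS) -> int:
    """Ordinary bit-level encoding: XOR the rows selected by data."""
    code = 0
    for i, row in enumerate(basis):
        if (data >> i) & 1:
            code ^= row
    return code

def encode_words(data_words: list[int], basis=BASIS, n=N) -> list[int]:
    """Word-level encoding: out[p] XORs every data word whose row has bit p set."""
    out = [0] * n
    for word, row in zip(data_words, basis):
        for pos in range(n):
            if (row >> pos) & 1:
                out[pos] ^= word
    return out

def bit_slice(words: list[int], i: int) -> int:
    """Collect bit i of every word into one integer (word 0 becomes bit 0)."""
    return sum(((w >> i) & 1) << pos for pos, w in enumerate(words))
```

Slicing bit i out of the seven output words gives the same value as encoding bit i of the four data words at the bit level. Rows with fewer bits set mean fewer XORs in the inner loop.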

A set with Hamming distance d can be used as an error correction code that can correct (d-1)/2 bit errors. I don't know of an efficient way to do that. An inefficient way is to map a value to the value in the set the smallest Hamming distance away from it. (That requires comparing to every value in the set.)
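The inefficient way looks like this as a Python sketch. It lists every codeword spanned by a basis and returns the one closest to the received value, so the cost grows with 2^k. The names `all_codewords` and `nearest_codeword` are invented for the example, and ties go to whichever codeword is listed first.

```python
def all_codewords(basis: list[int]) -> list[int]:
    """Every XOR combination of the basis rows."""
    words = [0]
    for row in basis:
        words += [w ^ row for w in words]
    return words

def nearest_codeword(value: int, code: list[int]) -> int:
    """Brute force: compare value against every codeword and keep the closest."""
    return min(code, key=lambda c: bin(value ^ c).count("1"))
```

With the distance-3 rows 7, 25, 42, 75, flipping any single bit of a codeword and calling `nearest_codeword` gives back the original codeword.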

For most sets with Hamming distance over 3, there exist values that have Hamming distance greater than (d-1)/2 from every element in the set. (The exceptions are perfect codes, such as the (23,12,7) Golay code and the single-data-bit codes like (31,1,31).) Those values have more errors than the number of errors that can be corrected. For high d, most values fit that description. This is equivalent to a close-packing of balls in high dimension covering only a tiny fraction of the total space.
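The ball-packing picture can be put in numbers. Each of the 2^k codewords owns a ball of radius (d-1)/2 (rounded down), the balls do not overlap, and the sketch below computes what fraction of all n-bit values falls inside some ball. It is written in Python with exact fractions; `covered_fraction` is a name chosen here.

```python
from fractions import Fraction
from math import comb

def covered_fraction(n: int, k: int, d: int) -> Fraction:
    """Fraction of n-bit values within (d-1)//2 bit flips of some codeword."""
    t = (d - 1) // 2
    ball = sum(comb(n, i) for i in range(t + 1))  # values within t flips of one codeword
    return Fraction((1 << k) * ball, 1 << n)
```

For (7,4,3) and (23,12,7) the result is exactly 1, meaning every value is correctable; for the larger-distance entries in the list above, such as (69,6,33), it is a small fraction.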