Immortal Memory (April 2015)

Memory in Immortals

We'd like to live forever. However, our heads are only so big, and they can only fit so many memories. It's a good bet that by the age of 50 we're already forgetting things at least as fast as we learn them.

If we live forever but only have finite memory, how much can we remember? Are we doomed to forget everything? Suppose we could choose the strategy for what to remember, rather than just having the past evaporate on us outside of our control. Here's a way to do it.

Cut in Two

The strategy here is to have phases, with a fixed storage for each phase. When a phase overflows, take 2 parts, throw away half, and save the 1 remaining part to the next phase. If that causes the next phase to overflow, recurse. You can have an infinite number of phases if the next phase is always smaller so the sum of them all is finite.

However practically there are limits to how many phases there are. For example if you are storing data eventually you get down to one byte for a phase, then one bit, then half a bit, and you can't save anything meaningful.

The way around that is to have an ultimate phase, that isn't organized by time, but instead is just "everything before". When the penultimate phase overflows, you still take two parts and toss half, but the one remaining part, you need to reconsider the whole of that plus the ultimate phase to go from n+1 parts back to n. The ultimate phase is smaller than all phases combined, so it isn't as hard. And the ultimate phase probably has its own internal structure to make the revisions easier than a complete replacement.

Cleaning the attic

Cleaning your attic is really the identical problem to this. What do you keep, what do you throw out, and how do you organize it so you spend as little time cleaning the attic as possible? These strategies suggest putting things in boxes and labeling boxes by year and phase, and knowing how many boxes in each phase are allowed.

For example, if you can store 64 boxes, have 32 boxes for the first phase, 16 for the second, 8 for the third, 4 for the fourth, and 4 for everything else. If everything is full and you try to add another box, you get 32+1 in the first phase which becomes 31 plus a second-phase box. (You should combine the two oldest boxes in the firt phase, or the boxes you know you care the least about, rather than the most recent box.) Then you go from 16+1 in the second phase which becomes 15 plus a third-phase box. Then you go from 8+1 in the third phase to 7 plus a fourth-phase box. Then you go from 4+1 in the fourth phase to 3, plus a final box. And you have to consider all 4+1 final boxes and reduce that to just 4 again. The next time you have to add a box, you're going from 31 to 32 in the first phase and don't have to reorganize anything. It'll be 16 more boxes before you have to deal with the final phase again.

If you want to minimize attic-cleaning sessions, when a phase overflows, eliminate that phase entirely rather than just eliminating two boxes from it. That's more work per session but fewer sessions.

Journaling Filesystems

This comes up in computer storage frequently. A standard solution is called a "journaling filesystem" or "log structured file system". It has the behavior that when you modify something, you really append a change record to the latest bucket saying "modify it this way", rather than finding the actual thing and modifying it.

As data comes in, it is written to disk in time-based buckets. After n buckets have been filled (each representing a time range), they are sorted by subject alphabetically and rewritten into buckets segmented in alphabetical order, and these buckets are in a second phase. Note that these buckets also have a time range: the range from the oldest to the youngest bucket just combined.

When the next phase gets full, all buckets for a particular alphabetic range are combined, sorted alphabetically, and written to the next phase in buckets with smaller alphabetic ranges. Note that they still have time ranges too, spanning from the oldest to the youngest data that was combined.

Eventually there is a final phase whose time range goes back to the beginning of time. When you combine with that phase, you don't write to the next phase, you rewrite that final phase instead.

As this goes along, the earliest phases are split very finely by time, while the later ranges are split very finely alphabetically. Typically this system doesn't have to worry about discarding data when going from one phase to another: part of the data coming in all the time is modifications saying "delete this old data", so when you combine that with old data, both the old data and the "delete this old data" instruction gets deleted. This is called "garbage collection".

Suppose you want to see the current state of something. There's a series of updates that have come in over time. You can look up alphabetically which bucket in each range contains what you are looking for, and combine all those buckets. You don't need to combine the whole buckets, you only have to look up in each bucket the thing you're looking for. The number of lookups you have to do is the number of phases that the computer system keeps.

Bob Predicts the Future

Table of Contents