How and when should data decay?
December 21st, 2007
There’s a relatively new concept out there called “data decay” (there isn’t even a Wikipedia entry for it yet!) that deals with how data gradually becomes incorrect and out of date over time, and how that should be handled. Here’s one example from Two Sides to Data Decay:
Approximately 10 years ago, I lived in an apartment just outside of Boston. At the time, the ZIP code for the address of the building in which I lived was 02146. A few years after I moved, the U.S. Postal Service decided to split the area covered by that ZIP code into two parts. The southern section kept the 02146 code, while the northern area – where my former apartment building is – was assigned a new ZIP code: 02446
This example is interesting because it raises two separate problems:
- After he moved, the address that everyone had on file for him became incorrect.
- After the ZIP codes were reassigned, his old address, correctly zoned while he lived there, became incorrectly zoned from then on. Any processing of the original ZIP code done after the reassignment goes wrong in one of two ways: it either treats the new ZIP code’s area as still part of the original, or it produces skewed comparisons between new and old data for the original ZIP code, because the old data covered a larger area.
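One way to picture the second problem is to store each ZIP assignment with the dates it was in effect, so a lookup can ask "which ZIP covered this area on this date?" rather than assuming the current boundaries always applied. The following is only a rough sketch: the area labels, the `ZipAssignment` record, the function names, and especially `SPLIT_DATE` are all made up for illustration (the article only says the split happened "a few years" after the move).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Set

# Hypothetical date for the split; the source does not give the real one.
SPLIT_DATE = date(2000, 1, 1)


@dataclass(frozen=True)
class ZipAssignment:
    area: str                  # illustrative label: "north" or "south"
    zip_code: str
    valid_from: date           # first day the assignment applies
    valid_to: Optional[date]   # exclusive end; None means still in effect


# Before the split both areas shared 02146; afterwards the north became 02446.
ASSIGNMENTS = [
    ZipAssignment("north", "02146", date.min, SPLIT_DATE),
    ZipAssignment("north", "02446", SPLIT_DATE, None),
    ZipAssignment("south", "02146", date.min, None),
]


def _in_effect(a: ZipAssignment, on: date) -> bool:
    """True if the assignment applied on the given date."""
    return a.valid_from <= on and (a.valid_to is None or on < a.valid_to)


def zip_for(area: str, on: date) -> Optional[str]:
    """Return the ZIP code that covered `area` on the given date."""
    for a in ASSIGNMENTS:
        if a.area == area and _in_effect(a, on):
            return a.zip_code
    return None


def areas_for(zip_code: str, on: date) -> Set[str]:
    """Return the areas that `zip_code` covered on the given date."""
    return {a.area for a in ASSIGNMENTS
            if a.zip_code == zip_code and _in_effect(a, on)}
```

With records like these, `areas_for("02146", ...)` returns both areas for a date before the split and only the southern one afterwards, which is exactly why old and new figures for 02146 are not directly comparable.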
It’s actually a surprisingly complex situation, and I urge you to read the whole article. Bruce Schneier approaches data decay from a different perspective in this Educause podcast. He considers when computers should forget data, and whether it is relevant or safe for them to remember things forever:
All processes today produce data. It stays around. It festers. How we deal with it. How we recycle it, reuse it, dispose of it. What the regulations are concerning it are central to the Information Age… Twenty, thirty, fifty years from now we’re going to be cleaning up massive data problems just like we’re cleaning up massive pollution problems today… Some people have written about the fact that computers should be programmed to forget things. That remembering stuff forever isn’t necessarily goodness.
I’ve posted about backing up before, and I go to great lengths to keep all my data safe and secure, but I think Bruce has a point here. I don’t know if I want everything about me stored in some database forever. But at the same time, how should data like that decay gracefully without simply vanishing or becoming useless?




