Research Archive

Observations on Retrieval Consistency

The concept of retrieval consistency is under-discussed in the small literature on personal archives. By retrieval consistency I mean the property that an entry, once located through one path of access, can be located again through that same path, and ideally through several other paths, without relying on accident.

Retrieval consistency is a property not of the entry itself but of the relationship between the entry and the various conventions (identifiers, titles, tags, dates, links) through which the entry can be reached. It is, in this sense, an emergent property of the archive as a whole.

Sources of inconsistency

The most common source of retrieval inconsistency, in my own experience, is the silent revision of metadata. An entry that was originally tagged with one term, and is later re-tagged with a different term, will no longer respond to a query under the original term. A reader who relied on the earlier query has been silently disappointed. The inconvenience is small, but the cumulative effect across many such revisions is corrosive.

A second source is the revision of a title without preservation of the prior one. If the prior title is not kept as an alias, the entry becomes effectively unreachable through searches that used it.

A third source, less common but more striking, is the reuse of retired identifiers. If an entry is removed and its identifier is later assigned to a different entry, every prior reference to the old identifier now points to the wrong place. This failure mode is severe enough that I do not remove entries at all; entries deemed obsolete are marked as such but retained in place.
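The retain-and-mark approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Archive class, its method names, and the use of integer identifiers drawn from a counter are all hypothetical, not a description of any particular archive.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Archive:
    """Hypothetical in-memory archive whose identifiers are never reassigned."""
    entries: dict = field(default_factory=dict)  # identifier -> entry text
    obsolete: set = field(default_factory=set)   # identifiers marked obsolete
    _next_id: count = field(default_factory=lambda: count(1))

    def add(self, text):
        # Identifiers come from a counter that only moves forward,
        # so a number that has been handed out is never issued again.
        identifier = next(self._next_id)
        self.entries[identifier] = text
        return identifier

    def mark_obsolete(self, identifier):
        # The entry stays in place; only its status changes.
        if identifier not in self.entries:
            raise KeyError(identifier)
        self.obsolete.add(identifier)

    def get(self, identifier):
        # Obsolete entries remain retrievable under their old identifier.
        return self.entries[identifier]

    def is_obsolete(self, identifier):
        return identifier in self.obsolete
```

Because there is no delete operation, an old reference can at worst lead to an entry flagged as obsolete, never to an unrelated entry.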

A practical convention

A simple convention I have adopted is that the original tags, title, and identifier of an entry are recorded once, at creation, and are never altered subsequently. Revisions to title or tagging are expressed by appending new aliases or new tags, never by replacing existing ones. The cost of this convention is that the archive accumulates a small amount of historical metadata that is no longer the canonical view of the entry. The benefit is that older references to the entry continue to work.
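A minimal sketch of such an append-only convention follows, in Python. The names Entry and Index, the string identifier "E-001", and the choice to treat the most recently appended title as the current one are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical entry whose metadata only ever grows."""
    identifier: str
    titles: list = field(default_factory=list)  # titles[0] is the original; later items are aliases
    tags: list = field(default_factory=list)    # original tags first, later additions appended

    def add_title(self, title):
        # A title revision appends; earlier titles remain searchable.
        self.titles.append(title)

    def add_tag(self, tag):
        # Re-tagging appends; existing tags are never replaced.
        if tag not in self.tags:
            self.tags.append(tag)

    @property
    def current_title(self):
        # Assumption: the latest appended title is the one displayed.
        return self.titles[-1]


class Index:
    """Hypothetical lookup layer over a set of entries."""

    def __init__(self):
        self.entries = {}

    def create(self, identifier, title, tags):
        # Identifier, title and tags are recorded once, at creation.
        if identifier in self.entries:
            raise ValueError(f"identifier already in use: {identifier}")
        entry = Entry(identifier, [title], list(tags))
        self.entries[identifier] = entry
        return entry

    def by_title(self, title):
        # Matches the original title and every later alias.
        return [e.identifier for e in self.entries.values() if title in e.titles]

    def by_tag(self, tag):
        return [e.identifier for e in self.entries.values() if tag in e.tags]


index = Index()
entry = index.create("E-001", "Notes on filing", ["filing"])
entry.add_title("Notes on retrieval")
entry.add_tag("retrieval")
print(index.by_title("Notes on filing"))  # lookup by the original title
print(index.by_tag("filing"))             # lookup by the original tag
```

Both lookups at the end use the original title and tag, and both still find the entry after the revision. The linear scans are chosen for brevity; a real index would presumably keep a mapping from each title and tag to identifiers.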

Whether this trade is worth making depends on the rate at which external references to the material in the archive accumulate, which in turn depends on the archive's age and the breadth of its correspondence. For a small archive in active use, the trade appears, to me, clearly worthwhile.
