A Short Observation on Link Rot and Personal Archives
Link rot, the gradual decay of external references as the material they point to is moved or removed, is a familiar problem for anyone maintaining a long-running corpus of written work. For personal archives the exposure is asymmetric: the archive's own URLs can be made stable through custodial discipline, but the external URLs it cites cannot.
The asymmetry is awkward. An entry in the archive that cites a substantial external work — a paper, a specification, a memorandum — inherits the fragility of the cited resource. If the cited URL disappears, the citation becomes a half-citation: the substance of the reference remains, but the reader's ability to verify it is diminished.
The scale of the problem
Empirical work on the persistence of cited URLs suggests discouraging numbers. A representative study of legal scholarship [1] found that roughly half of the URLs cited in articles published over a decade earlier had become inaccessible. The literature on academic citation more generally reports comparable figures, though with significant variation by domain [2].
For an archive whose explicit purpose is stable citation, the implications are uncomfortable. The archive's own conventions (opaque identifiers, custodial care of URL persistence) address the forward problem: they ensure that the archive's own entries remain findable. They do nothing about the backward problem: keeping the references the archive itself depends on findable.
Practical responses
Three responses suggest themselves, none of them complete.
Citation through canonical identifiers where available. For published work, citing through DOI or ISBN rather than through a publisher URL gives the citation a chance of resolving even after publisher reorganisations. The cost is small; the benefit is modest but real.
Local snapshotting of cited material. When a citation depends on a URL that is unlikely to remain stable, a local snapshot of the cited content, kept in the archive itself, preserves the substance of the reference. This raises questions of attribution and permission that I will not address in full here. The convention I follow is to snapshot only public-domain or openly licensed material, and to record the access date alongside the citation.
Acceptance of partial decay. The third response is to accept that some references will decay and that this is, ultimately, a property of the broader ecosystem rather than a problem the archive itself can fully solve. A reference that has decayed to a half-citation is still more useful than no reference at all.
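The first two responses can be made concrete with a small record format. What follows is a minimal sketch in Python, not a description of any existing tooling: the `Citation` fields, the `store_snapshot` helper, and the file-naming scheme are all names invented for illustration. The only external fact it relies on is that a DOI resolves through `https://doi.org/`.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Optional


@dataclass
class Citation:
    """One cited work. All field names are illustrative assumptions."""
    title: str
    url: str                        # the URL as originally cited
    accessed: str                   # ISO date on which the URL was read
    doi: Optional[str] = None       # canonical identifier, if one exists
    snapshot: Optional[str] = None  # file name of the local copy, if any

    def preferred_link(self) -> str:
        # Prefer the DOI resolver over the publisher URL when a DOI is known.
        return f"https://doi.org/{self.doi}" if self.doi else self.url


def store_snapshot(cite: Citation, content: bytes, archive_dir: Path) -> Citation:
    """Save `content` under a hash-derived name and record it on the citation."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(content).hexdigest()
    name = f"{digest[:16]}.snapshot"  # hypothetical naming scheme
    (archive_dir / name).write_bytes(content)
    cite.snapshot = name
    # A sidecar file keeps the cited URL and access date next to the saved bytes.
    (archive_dir / f"{name}.json").write_text(json.dumps(asdict(cite), indent=2))
    return cite
```

Naming the snapshot after a hash of its content means the same material saved twice lands in the same file, and the sidecar keeps the access date attached to the copy rather than only to the citing entry. Whether a given work may be snapshotted at all remains the permission question set aside above; the sketch does not check it.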
The Internet Archive question
The Internet Archive's Wayback Machine is the most prominent external response to link rot, and it deserves comment. For citation recovery it is invaluable; an externally archived snapshot at the time of citation is, in effect, the same response as local snapshotting, at lower personal effort.
For the archive's own records I have nonetheless tended toward local snapshotting where the cited material is short, and toward relying on the Wayback Machine where it is long or where the licence makes redistribution awkward. The choice is operational rather than principled.
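The operational rule can be written down in a few lines. This is a sketch under stated assumptions: the size threshold and the function names are hypothetical, chosen only to illustrate the short-versus-long distinction. The Wayback Machine link uses the public `https://web.archive.org/web/<timestamp>/<url>` form, which points at the capture nearest the given timestamp if one exists; the code does not verify that a capture is actually there.

```python
from datetime import date

# Hypothetical cut-off for what counts as "short" cited material.
SHORT_LIMIT_BYTES = 50_000


def wayback_url(url: str, cited_on: date) -> str:
    """Build a Wayback Machine link for the capture nearest the citation date."""
    # The timestamp is YYYYMMDDhhmmss; midnight is used since only the date is known.
    return f"https://web.archive.org/web/{cited_on:%Y%m%d}000000/{url}"


def choose_strategy(size_bytes: int, redistributable: bool) -> str:
    """Return "local" for short, redistributable material, else "wayback"."""
    if redistributable and size_bytes <= SHORT_LIMIT_BYTES:
        return "local"
    return "wayback"
```

A short public-domain memorandum would come out as `"local"`, while a long report, or anything under a restrictive licence, would come out as `"wayback"` and be cited through the link that `wayback_url` builds.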
References
[1] Zittrain, J., Albert, K., & Lessig, L. (2014). Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations. Legal Information Management 14.
[2] Klein, M., Van de Sompel, H., Sanderson, R., et al. (2014). Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLOS ONE 9(12).
[3] Bush, V. (1945). As We May Think. The Atlantic Monthly.