On Metadata Conventions for Technical References
Each entry in the archive carries the same small set of metadata fields. The list is short, has been stable for some years, and is generally what I would suggest, with minor adjustments, to a colleague setting up a similar archive of their own.
The fields
Title. A short noun phrase describing the entry's subject. The title is meant to be retrievable from memory when the entry is only half-recalled; that is, it serves as a recognition handle for the entry rather than a precise statement of its content. Titles are not changed after entry, for reasons discussed in an earlier note.
Identifier. An opaque string, generated at random on entry, unchanged thereafter. The identifier is the canonical reference. Discussed at greater length elsewhere.
Publication date. The date on which the entry was first considered stable enough for archival listing. This date may, in some cases, lag the date on which the underlying observation was made — if so, the underlying date appears in the body of the entry.
Modification date. Present only when the entry has been substantively revised. Cosmetic changes do not generate a modification date.
Tags. A small list of terms drawn from a controlled vocabulary. Tagging is conservative: an entry is tagged with the smallest set of terms that will make it discoverable through the archive's topical indexes.
Author. Where the entry is authored by someone other than myself, an author field is recorded. Where the entry records a joint observation, the joint authors are listed in alphabetical order. The default, in the absence of an explicit author field, is that the entry is authored by me.
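As an illustration only, the six fields could be represented roughly as follows. This is a minimal sketch in Python, not a description of how the archive is actually implemented. The names Entry, new_identifier, CONTROLLED_VOCABULARY and DEFAULT_AUTHOR are hypothetical, as are the vocabulary terms, the 16-character identifier length, and the use of plain string sorting for alphabetical author order.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field, replace
from datetime import date
from typing import Optional

# Hypothetical controlled vocabulary; the archive's real terms are not given here.
CONTROLLED_VOCABULARY = frozenset({"metadata", "archival", "identifiers", "tagging"})

# Hypothetical placeholder standing in for the archive's owner.
DEFAULT_AUTHOR = "archive owner"


def new_identifier() -> str:
    """Return an opaque random string (the length is an assumption)."""
    return secrets.token_hex(8)  # 8 random bytes -> 16 hex characters


@dataclass(frozen=True)
class Entry:
    title: str                            # short noun phrase, fixed after entry
    published: date                       # date the entry was first listed as stable
    tags: tuple[str, ...] = ()            # smallest discoverable set of vocabulary terms
    authors: tuple[str, ...] = ()         # empty means the default author
    modified: Optional[date] = None       # set only after a substantive revision
    identifier: str = field(default_factory=new_identifier)

    def __post_init__(self) -> None:
        # Reject tags that are not in the controlled vocabulary.
        unknown = set(self.tags) - CONTROLLED_VOCABULARY
        if unknown:
            raise ValueError(f"tags outside the controlled vocabulary: {sorted(unknown)}")
        # Joint authors are stored in alphabetical order (case-insensitive sort).
        ordered = tuple(sorted(self.authors, key=str.casefold))
        object.__setattr__(self, "authors", ordered)

    def effective_authors(self) -> tuple[str, ...]:
        """Return the recorded authors, or the default when none are recorded."""
        return self.authors or (DEFAULT_AUTHOR,)

    def revised(self, on: date, substantive: bool) -> Entry:
        """Return the entry after a revision; cosmetic changes leave it untouched."""
        if not substantive:
            return self
        # Title and identifier carry over unchanged; only the modification date is set.
        return replace(self, modified=on)
```

Making the record immutable mirrors the rule that titles and identifiers do not change after entry: a substantive revision produces a new record that differs only in its modification date, and a cosmetic one produces nothing new at all.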
Fields not used
The archive does not record an abstract field, on the grounds that, for an archive of this scale, the title and the entry's opening paragraph already do the work of an abstract. It does not record a topical hierarchy, only a flat tag set; topical hierarchies introduce re-classification problems that flat tag sets avoid. It does not record a status or workflow field; entries are either in the archive or they are not, and the distinction between draft and published is managed outside the archive's metadata.
It does not record reader-oriented metadata — citation count, reader view count, social signals — because the archive is not, in this sense, a published work, and the corresponding metadata would be either misleading or absent.
References
- Dublin Core Metadata Initiative (2020). DCMI Metadata Terms. Available from dublincore.org.
- Lavoie, B. (2014). The OAIS reference model: introductory guide. Digital Preservation Coalition Technology Watch Reports.
- Berners-Lee, T. (1998). Cool URIs don't change. W3C Style.