Data documentation & metadata
To be usable, a dataset (or other research data object) needs to be well documented.
Documentation
It is good research practice to ensure that all data you generate or collect in the course of your research are easy to understand and analyse.
Producing good documentation and metadata ('data about data') provides context for your data, records its provenance, and makes it easier for you to find and use your data in the long term and for others to discover it on the internet.
Documentation and metadata requirements should be identified at the start of your project and considered throughout the lifecycle of your data. This is the essence of good 'data curation'.
Where a research protocol is used, much of the documentation needed will already exist. If instrumentation is used, calibration and other details need to be captured for the data to remain useful. Lab notebooks are perhaps the most rigorous form of documentation. Is it possible to put them into digital format?
For qualitative data or small-scale surveys, the documentation might exist only in your head. Take the time to write it down while it is fresh in your mind.
This may include writing methodology reports, creating codebooks with full variable and value labels, documenting decisions about software, tracking changes between versions of the dataset, and recording assumptions made during analysis.
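As an illustration of what "full variable and value labels" can mean in practice, here is a minimal sketch of a plain-text codebook generator. It assumes a simple survey where each variable has a short name, a descriptive label and, optionally, numeric codes mapped to value labels; the `Variable` class, the `render_codebook` function and the example variables are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One codebook entry: a variable name, its full label, and any value labels."""
    name: str
    label: str
    value_labels: dict[int, str] = field(default_factory=dict)


def render_codebook(variables: list[Variable]) -> str:
    """Return a plain-text codebook listing every variable and its value labels."""
    lines = []
    for var in variables:
        lines.append(f"{var.name}: {var.label}")
        # Value labels are listed in code order so the mapping is easy to scan.
        for code in sorted(var.value_labels):
            lines.append(f"    {code} = {var.value_labels[code]}")
    return "\n".join(lines)


# Hypothetical survey variables, for illustration only.
codebook = [
    Variable("q1_age", "Respondent age in completed years"),
    Variable("q2_employ", "Current employment status",
             {1: "Employed full-time", 2: "Employed part-time", 3: "Not employed"}),
]
print(render_codebook(codebook))
```

Keeping the codebook in a structured form like this means the same entries could later be exported to whatever format a repository or analysis package expects.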
Have you created a "readme.txt" file to describe the contents of files in a folder? Such a simple act can be invaluable at a later date.
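A readme.txt does not need special tools. The sketch below, assuming a flat folder of data files and a hand-written dictionary of file descriptions, writes a readme.txt that lists each file and flags any file that has not yet been described. The `write_readme` function and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def write_readme(folder: Path, descriptions: dict[str, str], summary: str) -> Path:
    """Write folder/readme.txt listing each file in the folder with a description.

    Files with no entry in `descriptions` are flagged so the gap is visible.
    """
    lines = [summary, "", "Files:"]
    for path in sorted(folder.iterdir()):
        if not path.is_file() or path.name == "readme.txt":
            continue  # skip subfolders and the readme itself
        note = descriptions.get(path.name, "TODO: describe this file")
        lines.append(f"- {path.name}: {note}")
    readme = folder / "readme.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


# Usage with a temporary folder holding two hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "survey_2011.csv").write_text("id,q1_age\n1,34\n", encoding="utf-8")
    (folder / "analysis.py").write_text("# analysis script\n", encoding="utf-8")
    readme = write_readme(folder, {"survey_2011.csv": "Raw pilot survey responses"},
                          "Pilot survey data and analysis script.")
    readme_text = readme.read_text(encoding="utf-8")

print(readme_text)
```

Flagging undescribed files, rather than silently omitting them, makes it obvious which parts of the folder still need documenting.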
Ultimately, the amount of effort you put into documenting your data will depend on how long the data are intended to remain in use and how broadly you intend to share them.
Ideally, these decisions are made at the outset, so that you avoid having to carry out a rescue mission on the data (sometimes known as 'digital archaeology') when a key member of staff leaves or when renewed interest in a topic suddenly puts a dataset in demand.
Metadata
Metadata are usually standards-based and serve specific purposes in data processing and in machine-to-machine interoperability. Three broad categories of metadata are:
- Descriptive - common fields such as title, author, abstract and keywords, which help users to discover online sources through searching and browsing.
- Administrative - preservation, rights management, and technical metadata about formats.
- Structural - how different components of a set of associated data relate to one another, such as tables in a database.
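To make the three categories concrete, here is a minimal sketch of a metadata record for one dataset, grouped by category and serialised to JSON so another system could read it. The field names, values and the `missing_descriptive_fields` check are hypothetical and are not taken from any particular metadata standard; in practice a community standard would define the field names.

```python
import json

# A hypothetical metadata record for one dataset, grouped into the three broad categories.
# Field names and values are illustrative only, not drawn from a specific standard.
record = {
    "descriptive": {  # helps users discover the data by searching and browsing
        "title": "Pilot household survey",
        "author": "A. Researcher",
        "abstract": "Responses to a pilot household survey.",
        "keywords": ["survey", "households", "pilot"],
    },
    "administrative": {  # preservation, rights management, technical format details
        "rights": "CC BY 4.0",  # hypothetical licence choice
        "file_format": "text/csv",
        "preservation_note": "Keep the CSV master copy; regenerate derived files from it.",
    },
    "structural": {  # how the components of the dataset relate to one another
        "tables": {
            "households": {"primary_key": "household_id"},
            "members": {"primary_key": "member_id", "foreign_key": "household_id"},
        },
    },
}

REQUIRED_DESCRIPTIVE = ("title", "author", "abstract", "keywords")


def missing_descriptive_fields(rec: dict) -> list[str]:
    """Return the descriptive fields that are absent or empty in a record."""
    desc = rec.get("descriptive", {})
    return [name for name in REQUIRED_DESCRIPTIVE if not desc.get(name)]


print(missing_descriptive_fields(record))  # → []
print(json.dumps(record["descriptive"], indent=2))
```

Serialising to a plain, machine-readable format such as JSON is one way the same record can serve both human readers and automated processing.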
© University of Edinburgh, 2011. Used with permission.