Metadata
Metadata refers to data that describes other data. In other words, it's information about the content, context, quality, and other characteristics of a dataset. Metadata can include details such as the dataset's title, author, date created, variable definitions, and data format.
Metadata is crucial for achieving FAIR data, which stands for Findable, Accessible, Interoperable, and Reusable. Without appropriate metadata, it can be difficult or impossible to find, understand, or effectively use a dataset. For example, if a researcher wants to locate data on a particular topic, they may rely on metadata to search for and identify relevant datasets. Similarly, metadata can help ensure that data are properly documented, formatted, and described, facilitating their use by other researchers. Metadata plays a critical role in enhancing the discoverability, usability, and overall value of research data.
Examples of research metadata
- A project readme containing the information below. Often in a
readme.txt
. Find an example template here or use the information below:- Creator (PI): name and affiliation of PI
- Title: project title
- Funding sources: names of funders, incl. grant numbers and related acknowledgements
- Data collector/producer: who is responsible for data collection + date and location of data production
- Description: project description, incl. relevant publications
- Sample and sampling procedures: target population and methods to sample it (or link to document describing this), retention rates for longitudinal studies
- Coverage: topics, time period and location covered
- Source: if relevant, citations to original source from which data were obtained
- Metadata for a specific data file, containing, for example, file description, data format, relationship with other files, date of creation and versioning information, etc. This can be a
readme.txt
or other filetypes, such asnameofdatafile.json
ornameofdatafile.xml
- A codebook (data dictionary), which specifies what all variables in your dataset mean.
- Question wording or meaning
- Variable text: question text or item number
- Respondent: who was asked the question?
- Meaning of codes: interpretation of the codes assigned to each variable
- Missing data codes, e.g.,
999
- Summary statistics for both valid and missing cases
- Imputation and editing: identify data that have been estimated or extensively edited
- Constructed and weight variables: how were they constructed
- Location in the data file: field or column location, if relevant
- Variable groupings: if you categorize variables into conceptual groupings
- Metadata in systems, such as a data repository. This type of metadata is often enforced and interoperable so that you don't have to manually create this type of metadata. The OSF plataform provides their own metadata profile.
Metadata standards
Metadata standards refer to the frameworks that provide guidelines for the metadata fields, defining the formatting of metadata fields to make them machine-readable and interoperable. An extensive range of metadata standards is available, varying across different disciplines. For the social sciences, the most widely known metadata standards are Dublin Core and Data Documentation Initiative (DDI). Dublin Core consists of basic elements for describing networked resources, such as Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, and Identifier, among others (check this metadata file generator to see all the elements). On the other hand, DDI is commonly used in social, behavioral, economic, and health sciences, including CESSDA (Consortium of European Social Science Data Archives). Researchers may not always need to work directly with these standards, but it is important to understand that different repositories may adopt different standards. More metadata standards can be found here.