FAIR Vocabularies in Population Research: report of the IUSSP-CODATA Working Group on FAIR Vocabularies

Document record

Date

January 2023

Relations

This document is related to:
info:eu-repo/semantics/altIdentifier/doi/10.5281/zenodo.7818156

Collection

Open archives

Licenses

http://creativecommons.org/licenses/by/ , info:eu-repo/semantics/OpenAccess



Cite this document

George Alter et al., « FAIR Vocabularies in Population Research: report of the IUSSP-CODATA Working Group on FAIR Vocabularies », HALSHS : archive ouverte en Sciences de l’Homme et de la Société - notices sans texte intégral, ID : 10.5281/zenodo.7818156



Abstract

This report describes the role of controlled vocabularies in the documentation and dissemination of demographic data in the light of the FAIR principles that all data should be “Findable, Accessible, Interoperable, and Reusable” by both humans and machines (Wilkinson et al., 2016). Population research is an empirically focused field with a long tradition of widely shared, easily accessible data collections. The FAIR Principles point to ways that this tradition can be enhanced by taking advantage of emerging standards and technologies.

Our work builds on the “Ten Simple Rules for making a vocabulary FAIR” (Cox et al., 2021), prepared by a group formed at a workshop convened by CODATA and DDI to describe how a FAIR vocabulary will work with international standards for documenting and sharing social science data.

Controlled vocabularies play a central role in data sharing by associating data with concepts and by defining which categories or codes may be applied. FAIR vocabularies specify globally accessible persistent identifiers to distinguish data items that are the same from those that are different. Consider the most basic variable in demographic analysis: age. The Organization for Economic Cooperation and Development (OECD) has a list of 643 age categories, while the UN Population Division copes with more than 1100 age groups. If the meanings of variables in a dataset are only available through human-readable documentation, like a PDF, harmonizing data from two providers will remain a tedious manual process. However, if the age categories are linked to persistent identifiers in machine-actionable metadata, software can be programmed to harmonize age groupings. If these operations are performed across dozens of variables in hundreds of data sources, enormous amounts of human time will be saved.

Construction of the infrastructure for FAIR data has begun. Demographic concepts are already included in vocabularies developed by other disciplines, like medicine, with definitions that conflict with usage in population research. Therefore, there is a need for a FAIR vocabulary of demographic concepts endorsed by an authoritative institution in the field of population science.

IUSSP has a long history of working with the UN and other agencies to define demographic concepts (International Union for the Scientific Study of Population, 1954; Vincent, 1953). Those efforts currently exist in electronic forms (Demopædia and Demovoc) that provide a base for a multilingual FAIR Vocabulary of Demography. We argue that a FAIR Vocabulary of Demography will have important benefits for the population research community represented by IUSSP, and we conclude with recommendations for IUSSP and other important organizations.

In addition to summarizing the activities of the Working Group, this report is intended to serve as an introduction to the standards and infrastructure used to share social science data. Most demographers have never heard of URIs, SDMX, or DDI, even though they use services from the UN, ILO, OECD, CESSDA, IPUMS, and other organizations that depend on these standards. Understanding key features of the international data infrastructure will help IUSSP leadership to influence its development.
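
The age-harmonization scenario in the abstract can be illustrated with a minimal Python sketch. It is not taken from the report: the `https://example.org/age/` identifiers, the provider code lists, the record counts, and the `harmonize` function are all hypothetical, and the real OECD and UN code lists are not reproduced here. The sketch only assumes that each provider's local age code is linked to a persistent identifier whose age bounds are machine-readable.

```python
from collections import defaultdict

# Hypothetical base URI for persistent age-group identifiers (an assumption).
BASE = "https://example.org/age/"

# Hypothetical shared vocabulary: identifier -> (lowest age, highest age), inclusive.
AGE_VOCAB = {
    BASE + "Y0T4": (0, 4),
    BASE + "Y5T9": (5, 9),
    BASE + "Y0T9": (0, 9),
    BASE + "Y10T19": (10, 19),
}

# Hypothetical provider code lists, each local code linked to an identifier.
PROVIDER_A = {"A01": BASE + "Y0T4", "A02": BASE + "Y5T9", "A03": BASE + "Y10T19"}
PROVIDER_B = {"0-9": BASE + "Y0T9", "10-19": BASE + "Y10T19"}

# Invented population counts keyed by each provider's local code.
A_RECORDS = [("A01", 100), ("A02", 120), ("A03", 250)]
B_RECORDS = [("0-9", 210), ("10-19", 240)]

# Target grouping that both datasets should be expressed in.
TARGET = [BASE + "Y0T9", BASE + "Y10T19"]


def harmonize(records, code_to_uri, target_uris):
    """Sum counts into target age groups using the identifiers' age bounds.

    Each source group must fall entirely inside one target group;
    otherwise the record cannot be reassigned without extra assumptions.
    """
    totals = defaultdict(int)
    for code, count in records:
        low, high = AGE_VOCAB[code_to_uri[code]]
        for target in target_uris:
            t_low, t_high = AGE_VOCAB[target]
            if t_low <= low and high <= t_high:
                totals[target] += count
                break
        else:
            raise ValueError(f"age group {code!r} does not fit any target group")
    return dict(totals)


harmonized_a = harmonize(A_RECORDS, PROVIDER_A, TARGET)
harmonized_b = harmonize(B_RECORDS, PROVIDER_B, TARGET)
```

After the call, both `harmonized_a` and `harmonized_b` are keyed by the same two identifiers, so the datasets can be compared directly. The sketch deliberately raises an error when a source group straddles two target groups, since splitting such a group would need a modelling decision that metadata alone cannot supply.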
