January 2023
HALSHS: open archive in the Humanities and Social Sciences - records without full text
This document is linked to:
info:eu-repo/semantics/altIdentifier/doi/10.5281/zenodo.7818156
http://creativecommons.org/licenses/by/ , info:eu-repo/semantics/OpenAccess
George Alter et al., "FAIR Vocabularies in Population Research: report of the IUSSP-CODATA Working Group on FAIR Vocabularies", HALSHS: open archive in the Humanities and Social Sciences - records without full text, ID: 10.5281/zenodo.7818156
This report describes the role of controlled vocabularies in the documentation and dissemination of demographic data in the light of the FAIR principles that all data should be “Findable, Accessible, Interoperable, and Reusable” by both humans and machines (Wilkinson et al., 2016). Population research is an empirically focused field with a long tradition of widely shared, easily accessible data collections. The FAIR Principles point to ways that this tradition can be enhanced by taking advantage of emerging standards and technologies. Our work builds on the “Ten Simple Rules for making a vocabulary FAIR” (Cox et al., 2021), prepared by a group formed at a workshop convened by CODATA and DDI to describe how a FAIR vocabulary will work with international standards for documenting and sharing social science data.

Controlled vocabularies play a central role in data sharing by associating data with concepts and by defining which categories or codes may be applied. FAIR vocabularies specify globally accessible persistent identifiers to distinguish data items that are the same from those that are different. Consider the most basic variable in demographic analysis: age. The Organization for Economic Cooperation and Development (OECD) has a list of 643 age categories, while the UN Population Division copes with more than 1100 age groups. If the meanings of variables in a dataset are only available through human-readable documentation, like a PDF, harmonizing data from two providers will remain a tedious manual process. However, if the age categories are linked to persistent identifiers in machine-actionable metadata, software can be programmed to harmonize age groupings. If these operations are performed across dozens of variables in hundreds of data sources, enormous amounts of human time will be saved.

Construction of the infrastructure for FAIR data has begun. Demographic concepts are already included in vocabularies developed by other disciplines, like medicine, with definitions that conflict with usage in population research. Therefore, there is a need for a FAIR vocabulary of demographic concepts endorsed by an authoritative institution in the field of population science.

IUSSP has a long history of working with the UN and other agencies to define demographic concepts (International Union for the Scientific Study of Population, 1954; Vincent, 1953). Those efforts currently exist in electronic forms (Demopædia and Demovoc) that provide a base for a multilingual FAIR Vocabulary of Demography. We argue that a FAIR Vocabulary of Demography will have important benefits for the population research community represented by IUSSP, and we conclude with recommendations for IUSSP and other important organizations.

In addition to summarizing the activities of the Working Group, this report is intended to serve as an introduction to the standards and infrastructure used to share social science data. Most demographers have never heard of URIs, SDMX, or DDI, even though they use services from the UN, ILO, OECD, CESSDA, IPUMS, and other organizations that depend on these standards. Understanding key features of the international data infrastructure will help IUSSP leadership to influence its development.
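
The age example in the abstract can be illustrated with a minimal Python sketch. It is not taken from the report: the `example.org` identifiers, the provider code lists, and the counts are all hypothetical placeholders, chosen only to show how two sources with different local age codes could be aligned once each code is linked to a shared persistent identifier.

```python
# Hypothetical persistent identifiers for age groups.
# example.org is a placeholder, not a real vocabulary service.
AGE_0_14 = "https://example.org/vocab/age/Y0T14"
AGE_15_64 = "https://example.org/vocab/age/Y15T64"
AGE_65_PLUS = "https://example.org/vocab/age/Y_GE65"

# Machine-actionable metadata (invented for illustration): each provider
# publishes a mapping from its local codes to the shared identifiers.
PROVIDER_A_CODES = {"0-14": AGE_0_14, "15-64": AGE_15_64, "65+": AGE_65_PLUS}
PROVIDER_B_CODES = {"CHILD": AGE_0_14, "WORKING": AGE_15_64, "ELDERLY": AGE_65_PLUS}


def harmonize(rows, code_map):
    """Re-key (local_code, value) rows by persistent identifier."""
    result = {}
    for code, value in rows:
        uri = code_map[code]  # raises KeyError for an undocumented code
        result[uri] = result.get(uri, 0) + value
    return result


def align(source_a, source_b):
    """Pair the values of two harmonized sources on their shared identifiers."""
    shared = sorted(source_a.keys() & source_b.keys())
    return {uri: (source_a[uri], source_b[uri]) for uri in shared}


# Illustrative counts only; not real data.
provider_a = [("0-14", 120), ("15-64", 640), ("65+", 240)]
provider_b = [("CHILD", 118), ("WORKING", 655), ("ELDERLY", 227)]

aligned = align(
    harmonize(provider_a, PROVIDER_A_CODES),
    harmonize(provider_b, PROVIDER_B_CODES),
)

for uri, (value_a, value_b) in aligned.items():
    print(uri, value_a, value_b)
```

The sketch never compares the local labels ("0-14" versus "CHILD"); it matches categories only through their identifiers, which is the step that would otherwise require a person to read each provider's documentation.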