The ISIDORE project provides unified access to research data in the humanities and social sciences.
To provide this access, ISIDORE performs targeted harvesting of data produced by the scientific community.
After this data "capture" phase, ISIDORE applies various processing steps that enrich the collected data.
How are these enrichments made?
ISIDORE uses several vocabularies to enrich the data: the HAL vocabulary, the OpenEdition index of thematic categories, the RAMEAU vocabulary, the PACTOLS, GEMET and GéoEthno thesauri, and finally the GeoNames geographical vocabulary. Several kinds of enrichment are performed. One is the "categorization" of data, which links data to a discipline or scientific theme. Another is the addition of terms drawn from the various vocabularies.
To perform these enrichments, ISIDORE analyzes both the metadata and the full text of resources in order to link them to vocabularies. Elements found in the full text or metadata are compared with vocabulary entries by an algorithm based on a morphological analysis of the terms.
If a match is found between a term from the resource and an entry in one of the vocabularies, the resource is linked to that vocabulary entry.
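The matching step can be illustrated with a minimal sketch. ISIDORE's actual morphological analyzer is not described here, so the code below assumes a deliberately crude suffix-stripping stemmer. The `SUFFIXES` list and all function names are hypothetical. The sketch links a resource to a vocabulary entry when the entry's stemmed words appear consecutively in the resource's stemmed text.

```python
import re
import unicodedata

# Hypothetical suffix list for a crude French stemmer (illustration only, not
# ISIDORE's analyzer). A longer suffix is listed before any shorter one it ends with.
SUFFIXES = ("ations", "ation", "ismes", "isme", "ments", "ment", "ques", "que", "es", "s", "e")

def normalize(word: str) -> str:
    """Lowercase a word and strip diacritics, e.g. 'Géographie' -> 'geographie'."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def stem(word: str) -> str:
    """Remove the first matching suffix, keeping a stem of at least 3 characters."""
    w = normalize(word)
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

def stems(text: str) -> tuple[str, ...]:
    """Tokenize on word characters and stem each token."""
    return tuple(stem(tok) for tok in re.findall(r"\w+", text))

def link_resource(text: str, vocabulary: dict[str, str]) -> set[str]:
    """Return IDs of vocabulary entries whose stemmed words occur consecutively in text."""
    text_stems = stems(text)
    linked: set[str] = set()
    for entry_id, label in vocabulary.items():
        label_stems = stems(label)
        if not label_stems:
            continue
        n = len(label_stems)
        # Slide a window of the entry's length over the stemmed text.
        if any(text_stems[i : i + n] == label_stems for i in range(len(text_stems) - n + 1)):
            linked.add(entry_id)
    return linked

vocabulary = {"c1": "archéologie", "c2": "géographie urbaine", "c3": "sociologie"}
text = "Nouvelles approches en Archéologie et en géographies urbaines"
print(sorted(link_resource(text, vocabulary)))  # → ['c1', 'c2']
```

Because both sides go through the same normalization, "géographies urbaines" in the text still matches the entry "géographie urbaine". A real system would use a proper morphological analyzer rather than a fixed suffix list.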
HAL disciplines (MORESS)
The scientific disciplines available in HAL for the humanities and social sciences (SHS) come from a European vocabulary of the European University Association (EUA), built under the MORESS project (Mapping of Research in European Social Sciences and Humanities).
It is a simple nomenclature that researchers can read easily, intended to improve access to information on research in the humanities and social sciences. This vocabulary is available at halshs.archives-ouvertes.fr/browse/domain
The OpenEdition index is composed of thematic categories covering all of the arts, humanities and social sciences. It was developed in the context of OpenEdition's electronic publishing and scientific communication platforms. It was first used and enriched for announcements of scientific events on Calenda, the calendar of the arts, humanities and social sciences. It is now used by journals, book collections and research notebooks on Revues.org and Hypothèses. It was designed to represent the wealth of topics and objects of humanities and social science research in a single hierarchical index. It is divided into four main categories: societies, mind and language, periods, and spaces. The OpenEdition index is available here: http://www.openedition.org/6554
GEMET, the GEneral Multilingual Environmental Thesaurus, has been developed as an indexing, retrieval and control tool for the European Topic Centre on Catalogue of Data Sources (ETC/CDS) and the European Environment Agency (EEA).
GEMET was conceived as a “general” thesaurus, intended to define a common general language: a core of general terminology for the environment.
GeoEthno is a geographical thesaurus designed for the geographic indexing of documents in ethnology. It is currently being developed at the Eric de Dampierre library of the Laboratory of Ethnology and Comparative Sociology. It is used to index and query the library's database and, more broadly, the databases of the ethnology network.
Its coverage is uneven and not exhaustive. The thesaurus was built from the corpus of geographical keywords accumulated since the library was computerized in 1985, which then comprised approximately 2,000 words. This corpus was cleaned and enriched through work on atlases and reference lists. Initially a simple list of keywords, it was organized into a structured list through the creation of a DTD (Document Type Definition). It now contains about 15,000 terms. It is built around two components. The first is a list of country and territory names (ISO 3166-1:1997, Codes for the representation of names of countries and their subdivisions, Part 1: Country codes, French list). The second is the breakdown into macro-geographical regions used by the UN Statistics Division (the "M49" classification).
The GeoNames geographical database is available as a free download under a Creative Commons license. It contains over 10 million geographical names and over 8 million unique features, of which 2.8 million are populated places, along with 5.5 million alternate names. The data can also be accessed through web services.
GeoNames integrates geographical data such as place names in various languages. All latitude/longitude coordinates are in WGS84 (World Geodetic System 1984). Users can manually edit, correct and add new locations through a user-friendly wiki interface.
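The downloadable GeoNames dump is a tab-separated text file. The sketch below assumes the column layout documented in the dump's readme: geonameid, name, asciiname, alternatenames, latitude, longitude, feature class, feature code, country code, and so on. It reads the WGS84 coordinates as decimal degrees. The `Place` class and the sample row are invented for illustration and are not a real GeoNames record.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO

@dataclass
class Place:
    geonameid: int
    name: str
    alternatenames: list[str]
    latitude: float   # WGS84, decimal degrees
    longitude: float  # WGS84, decimal degrees
    feature_class: str
    country_code: str

def parse_geonames(stream: TextIO) -> Iterator[Place]:
    """Yield one Place per tab-separated row of a GeoNames dump file."""
    # QUOTE_NONE: fields are not quoted in the dump, and names may contain '"'.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield Place(
            geonameid=int(row[0]),
            name=row[1],
            alternatenames=[n for n in row[3].split(",") if n],
            latitude=float(row[4]),
            longitude=float(row[5]),
            feature_class=row[6],
            country_code=row[8],
        )

# Invented 19-column sample row (hypothetical place, not real GeoNames data).
SAMPLE_ROW = "\t".join([
    "1", "Exampleville", "Exampleville", "Exempleville,Beispielstadt",
    "48.85", "2.35", "P", "PPL", "FR", "", "", "", "", "",
    "1000", "", "35", "Europe/Paris", "2024-01-01",
])

for place in parse_geonames(io.StringIO(SAMPLE_ROW + "\n")):
    print(place.name, place.latitude, place.longitude, place.alternatenames)
```

Reading the file as a stream avoids loading a multi-million-row dump into memory at once. The multilingual alternate names can then be matched against text in the same way as other vocabulary labels.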
The PACTOLS (an acronym of Peoples, Anthroponyms, Chronology, Place-names, Works, Places and Topics) is a thesaurus specializing in archaeology and the sciences of Antiquity. Archaeology is understood here as extending from Prehistory to World War II. It includes all the sciences needed to study and conserve its objects: human paleontology, the natural sciences, physics, chemistry, etc. The other area covered by the PACTOLS is the sciences of Antiquity, from the invention of writing to the year 1000, in all their aspects. The thesaurus management software has two parts. The professional part is used to manage multilingual thesauri and is written in Java in client-server mode. The OPAC part lets users consult the thesaurus over the internet and is written in JSP.
The PACTOLS form a poly-hierarchical thesaurus made up of six micro-thesauri. It is multilingual (a French base translated into English, German, Spanish and Italian), evolving and autonomous.
Evolving: the thesaurus is dynamic and is constantly enriched and updated as work progresses. Its terminology therefore reflects the evolution of research and of the centers participating in the network. It is managed semantically by the FRANTIQ network (GDS 3378 of the CNRS InSHS). When new teams join the FRANTIQ federation, their themes are added to the thesaurus. New subjects are validated by researchers who are experts in the field.
Autonomous: the FRANTIQ computer scientist created management and retrieval software (LGRD) using free tools. The thesaurus is autonomous and can be imported into many applications, such as the archives of the House of Archaeology and Ethnology and the journal AdlFI ("Archéologie de la France - Informations"). The application is interoperable because it can export to SKOS, thanks to funding from the TGE Adonis.
The thesaurus is available in its OPAC form, and both the software and the PACTOLS are released under a Creative Commons license. Readers who prefer paper can download the PACTOLS hierarchical lists from the FRANTIQ website after registering.
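In a poly-hierarchical thesaurus such as the PACTOLS, a concept may have more than one generic (broader) concept, so it can sit on several branches of the hierarchy at once. The sketch below models this with a plain mapping from each concept to its list of broader concepts. The concept names are hypothetical and are not taken from the PACTOLS.

```python
from collections import deque

# Hypothetical thesaurus fragment: concept -> list of broader (generic) concepts.
# "amphora" has two parents, which is what makes the hierarchy poly-hierarchical.
BROADER: dict[str, list[str]] = {
    "roman amphora": ["amphora"],
    "amphora": ["ceramic vessel", "container"],
    "ceramic vessel": ["ceramics"],
    "container": [],
    "ceramics": [],
}

def ancestors(concept: str, broader: dict[str, list[str]]) -> set[str]:
    """Return every concept reachable by following broader links (breadth-first)."""
    seen: set[str] = set()
    queue = deque(broader.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # the same ancestor may be reached via several paths
            seen.add(parent)
            queue.extend(broader.get(parent, []))
    return seen

def paths_to_top(concept: str, broader: dict[str, list[str]]) -> list[list[str]]:
    """List each path from the concept up to a top term (assumes no cycles)."""
    parents = broader.get(concept, [])
    if not parents:
        return [[concept]]
    return [[concept] + path for p in parents for path in paths_to_top(p, broader)]

for path in paths_to_top("roman amphora", BROADER):
    print(" > ".join(path))
# roman amphora > amphora > ceramic vessel > ceramics
# roman amphora > amphora > container
```

A document indexed with the narrow concept can then also be retrieved through either of its generic branches.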
The management and retrieval software (LGRD) is OpenTheso, which complies with ISO 25964-1:2011.
The current version of OpenTheso consists of two parts: the professional part and the OPAC part described above.
The RAMEAU vocabulary (Répertoire d'autorité-matière encyclopédique et alphabétique unifié, i.e. unified encyclopedic and alphabetical subject authority directory, Bibliothèque Nationale de France) is a subject indexing language. It covers all areas of knowledge and applies to all types of documents on all types of media. In France, this documentary language is used by the Bibliothèque Nationale de France, by university libraries, and by numerous public and research libraries.
The core of the RAMEAU authority file consists of common nouns (approximately 100,000) and geographical names (approximately 50,000). It is a controlled hierarchical vocabulary that links subjects through semantic relations (broader, narrower and related terms).
For more information: rameau.bnf.fr
The RAMEAU concepts were converted to RDF using the SKOS language as part of the European TELplus project. Each concept is identified by a permanent URI and carries preferred or alternative labels and various notes. It also carries semantic links to other RAMEAU concepts (broader concepts, related concepts) and to external repositories (LCSH, DNB). This vocabulary is kept up to date on data.bnf.fr.
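The sketch below illustrates what such a SKOS record looks like in practice. It parses a small RDF/XML fragment with Python's standard library and builds a label index in which both preferred and alternative labels point to the concept's URI. This index is what lets a variant form found in a document resolve to a single concept. The URIs and labels are placeholders, not real RAMEAU or LCSH identifiers. The use of `skos:exactMatch` for the external link is an assumption, since data.bnf.fr may express alignments with other SKOS mapping properties.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical SKOS record in RDF/XML; every URI and label here is a placeholder.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="http://example.org/rameau/c1">
    <skos:prefLabel xml:lang="fr">Urbanisme</skos:prefLabel>
    <skos:altLabel xml:lang="fr">Aménagement urbain</skos:altLabel>
    <skos:broader rdf:resource="http://example.org/rameau/c0"/>
    <skos:related rdf:resource="http://example.org/rameau/c2"/>
    <skos:exactMatch rdf:resource="http://example.org/lcsh/x1"/>
  </skos:Concept>
</rdf:RDF>"""

def _resources(concept: ET.Element, prop: str) -> list[str]:
    """URIs given by rdf:resource on every skos:<prop> child of a concept."""
    return [e.get(f"{{{RDF}}}resource") for e in concept.findall(f"{{{SKOS}}}{prop}")]

def parse_concepts(xml_text: str) -> dict[str, dict[str, list[str]]]:
    """Map each skos:Concept URI to its labels and semantic links."""
    root = ET.fromstring(xml_text)
    concepts = {}
    for c in root.findall(f"{{{SKOS}}}Concept"):
        uri = c.get(f"{{{RDF}}}about")
        concepts[uri] = {
            "prefLabel": [e.text for e in c.findall(f"{{{SKOS}}}prefLabel")],
            "altLabel": [e.text for e in c.findall(f"{{{SKOS}}}altLabel")],
            "broader": _resources(c, "broader"),
            "related": _resources(c, "related"),
            "exactMatch": _resources(c, "exactMatch"),
        }
    return concepts

def label_index(concepts: dict[str, dict[str, list[str]]]) -> dict[str, str]:
    """Map every preferred or alternative label (case-folded) to its concept URI."""
    index = {}
    for uri, data in concepts.items():
        for label in data["prefLabel"] + data["altLabel"]:
            index[label.casefold()] = uri
    return index

concepts = parse_concepts(SAMPLE)
print(label_index(concepts)["aménagement urbain"])  # → http://example.org/rameau/c1
```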
For more information: data.bnf.fr/semanticweb
Library of Congress Subject Headings (LCSH) has been actively maintained since 1898 to catalog materials held at the Library of Congress. By virtue of cooperative cataloging, other libraries around the United States also use LCSH to provide subject access to their collections. In addition, LCSH is used internationally, often in translation. LCSH in this service includes all Library of Congress Subject Headings, free-floating subdivisions (topical and form), Genre/Form headings, Children's (AC) headings, and validation strings for which authority records have been created. The content includes a few name headings (personal and corporate), such as William Shakespeare, Jesus Christ, and Harvard University. It also includes geographic headings that are added to LCSH as needed to establish subdivisions, provide a pattern for subdivision practice, or provide reference structure for other terms. This content goes beyond the print edition of LCSH (the "red books") by including validation strings.
The data are derived from the bibliographic and authority catalogs of the National Library of Spain.
The data for certain entities (authors, subjects, works, etc.) are enriched with links to their equivalents in other data sources. For authors, links are provided, where available, to the Library of Congress, the German National Library, the National Library of France, Sudoc, the National Library of Sweden, VIAF and ISNI. Subject, geographical and genre/form authority records are linked to their Library of Congress equivalents.
ISIDORE uses only the National Library of Spain subject authorities that are available in SKOS.
The ArchiRès thesaurus was developed by the documentation network of the schools of architecture. It is a working tool for indexers (documentalists, librarians) and for users of architecture documentation centers. It enables comprehensive and consistent indexing of incoming information.
It also makes it possible to retrieve references in the ArchiRès portal. The portal gives access to a shared bibliographic search catalog for the libraries of the national schools of architecture and landscape (écoles nationales supérieures d'architecture et de paysage) under the Ministry of Culture. The catalog currently holds 400,000 bibliographic records and covers 200 indexed specialist journal titles.
Initially a simple list of keywords, it was organized into a structured thesaurus from 2006 onwards. The ArchiRès thesaurus is a living, evolving tool that has been continuously enriched over the years. This work is carried out by a committee of documentalists from the architecture schools network who are responsible for updating it. Its terminology therefore reflects the evolution of the various fields that make up architectural education.
It currently comprises 2,290 preferred terms and 1,366 non-preferred terms, for a total of 3,656 terms. A bilingual English version is planned in the near future.
The thesaurus is managed with GINCO (Gestion Informatisée de Nomenclatures Collaboratives et Ouverte), an application developed by the Ministry of Culture. GINCO supports the continuous design and management of authority lists and thesauri, based on the most recent norms and standards: the ISO 25964 standard and the SKOS language.