30 October 2023
This document is related to:
info:eu-repo/grantAgreement//101004746/EU/Grant Agreement number: 101004746 — Polifonia — H2020-SC6-TRANSFORMATIONS-2018-2019-2020 / H2020-SC6- TRANSFORMATIONS-2020/Polifonia: a digital harmoniser for musical heritage knowledge
info:eu-repo/semantics/OpenAccess
Enrico Daga et al., "D2.6: Ontology of licencing, ownership and conditions of use (V1.0)", HAL SHS (Sciences de l’Homme et de la Société), ID: 10670/1.a8e4cc...
In research workflows under the Open Science paradigm (which stands for reproducibility of research, open access to knowledge, and the societal responsibility of research), licences play an increasing role. With digitisation and automatic information processing, licences also become important for guiding the actions of machines, for example in supporting the exploration and selection of resources and in auditing their fair reuse. In the context of Polifonia we deal primarily with licences that come with content provided in the public sphere by cultural heritage institutions. But we also deal with other source material, for instance information scraped from websites, and we produce and re-use software which also comes with a licence, such as the resources catalogued by the musoW registry of musical resources on the Web.
There are various issues when it comes to licences:
- there is a large variety of licences and copyright statements used in the domain of musical content;
- information about licences is not always added to metadata, or is not added in a standardised way, and is often 'hidden' in plain text on websites;
- licences regulating the access to and use of a web service (e.g., a repository) and licences regulating the access to and use of content provided via that web service (e.g., datasets in a repository) are entangled;
- there may be several, sometimes contradictory, pieces of licence information available for a given data collection.
In this deliverable, we focus on the problem of extracting licence information from Web resources. More specifically, we look into the coverage of licence metadata in data registries such as musoW, a catalogue in which all main data components used by Polifonia are registered, next to a large number of other sources. We set up pipelines to check for licence information and, where possible, to enrich it by text-mining the original websites/sources to which the catalogue refers. We do so with the aid of Large Language Models (LLMs).
LLMs are receiving increasing attention in numerous applications, including knowledge extraction, but little work has been done so far on extracting and linking licence information with their help. Because Semantic Web principles are our core technology, we are particularly committed to designing workflows in which licence information can be turned into structured data (best expressed as so-called semantic artefacts), in the form of ontologies and knowledge graphs. As a result, we develop iterative workflows in which LLM use is combined with querying structured information as encoded in ontologies and knowledge graphs.
We start from the source material we use in Polifonia, beginning with an overview of the Polifonia datasets in order to define our problem space (Chapter 2). We devote an entire chapter to related work (Chapter 3), through which we map out the possible solution space. Here, we briefly summarise the current discourse among those who further detail rules for FAIR implementation (Section 3.1), and we describe the current state of the art in knowledge representation for licences and terms of use (Section 3.2). We give an overview of the prominent approaches to licence expressions on the Web: MPEG, CC-REL, and ODRL. The latter we address further in Section 3.3. We also touch upon the problem of reasoning with licences on the Web (Section 3.4). We close this chapter with a description of the use of Large Language Models, as we apply them later in our workflows (Section 3.5).
Chapter 4 presents a specific workflow for extracting licence information from web resources with a Large Language Model (LLM). Here, we concentrate on the musoW resource, which contains many relevant resources, including all Polifonia data components, which are also registered in the Polifonia Research Ecosystem. The workflow leads to an enrichment of the original licence information available in the musoW catalogue. Chapter 5 deals with the task of Knowledge Graph Construction.
It presents our design for formalising the extracted information and aligning it with existing knowledge graphs, and in particular how best to integrate the data into a Licences Knowledge Graph, a core output of this deliverable. As one result, we also gain deeper insight into which licences are used in the wild, that is, in the practices of many musicologists and music documentalists. The chapter is followed by an Evaluation (Chapter 6), complemented by the Polifonia FAIR Section (Chapter 7), a checklist of whether we comply with the FAIR guidelines agreed upon in Polifonia. We conclude the deliverable with Chapter 8.
The implications of the work done in this deliverable (D2.6) for the further development of the Polifonia Ecosystem, in particular how these results influence the way we treat licences in the Polifonia Research Ecosystem (and the related framework), will be discussed and reported in the Final Data Management Plan (D7.3).
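To make the idea of turning licence statements 'hidden' in plain text into structured data more concrete, the following is a minimal Python sketch. It is not the workflow of the deliverable: a simple regular expression for Creative Commons URLs stands in for the LLM-based extraction step, and the function names, the example resource IRI and the choice of `dcterms:license` as the linking predicate are all assumptions made for illustration.

```python
import re

# Assumption: dcterms:license is used as the linking predicate; the terms of the
# deliverable's own ontology are not given in this abstract.
DCTERMS_LICENSE = "http://purl.org/dc/terms/license"

# Matches Creative Commons licence or public-domain URLs,
# e.g. https://creativecommons.org/licenses/by-sa/4.0/
CC_URL = re.compile(
    r"https?://creativecommons\.org/(?:licenses|publicdomain)/[a-z-]+/\d\.\d/?"
)


def find_licence_urls(text):
    """Return the distinct CC licence URLs in text, in order of appearance."""
    found = []
    for match in CC_URL.finditer(text):
        # Normalise the trailing slash and the scheme so duplicates collapse.
        url = match.group(0).rstrip("/") + "/"
        url = url.replace("http://", "https://", 1)
        if url not in found:
            found.append(url)
    return found


def to_ntriples(resource_iri, licence_urls):
    """Serialise one N-Triples statement per licence found for the resource."""
    return [
        f"<{resource_iri}> <{DCTERMS_LICENSE}> <{url}> ."
        for url in licence_urls
    ]


# Hypothetical page text and resource IRI, for illustration only.
page_text = (
    "All recordings are available under "
    "https://creativecommons.org/licenses/by/4.0/. "
    "Footer: content licensed under "
    "http://creativecommons.org/licenses/by-nc/4.0"
)
licences = find_licence_urls(page_text)
for triple in to_ntriples("https://example.org/resource/1", licences):
    print(triple)
if len(licences) > 1:
    # Several distinct licences for one resource may contradict each other.
    print("Warning: more than one licence statement found; needs review.")
```

The example page deliberately carries two different licence statements, which mirrors the issue of contradictory licence information noted above; a sketch like this can only flag such cases for review, not resolve them.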