Labeled Entities from Social Media Data Related to Avian Influenza Disease

Document record

Date

20 October 2021

Cite this document

Camille Schaeffer et al., "Labeled Entities from Social Media Data Related to Avian Influenza Disease", Recherche Data Gouv, ID: 10.15454/GR5EFS


Abstract

This dataset contains spatial entities (i.e., locations) and thematic entities (i.e., diseases, symptoms, viruses) concerning avian influenza, drawn from English-language social media text. It was created from three corpora:
- The first corpus is composed of 10 transcriptions of YouTube videos and 70 tweets, annotated manually by an annotator.
- The second corpus is composed of the same textual data as the first corpus, but annotated automatically with Named Entity Recognition (NER) tools. These two corpora were created to evaluate the NER tools and then apply them to a larger corpus.
- The third corpus is composed of 100 transcriptions of YouTube videos, annotated automatically with NER tools.
The aim of the annotation task was to recognize spatial information, such as the names of cities, and epidemiological information, such as the names of diseases. An annotation guideline was created to keep the annotation consistent and to help the annotators. This dataset can be used to train or evaluate natural language processing approaches such as specialized entity recognition.
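
The comparison between the manually annotated corpus and its automatically annotated counterpart could be scored along the following lines. This is only a minimal sketch: the record does not describe the file format or the evaluation protocol, so the entity layout (document id, start offset, end offset, label), the label names, the function name `evaluate`, and the exact-match criterion are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical entity layout: (document id, start offset, end offset, label).
# The dataset's real schema is not given in this record.
Entity = Tuple[str, int, int, str]


def evaluate(gold: Set[Entity], predicted: Set[Entity]) -> Dict[str, float]:
    """Exact-match precision, recall and F1 of predicted entities against gold ones."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy annotations invented for this example, not taken from the dataset.
gold = {
    ("tweet_1", 0, 15, "DISEASE"),
    ("tweet_1", 30, 36, "LOCATION"),
    ("video_1", 12, 16, "VIRUS"),
}
predicted = {
    ("tweet_1", 0, 15, "DISEASE"),
    ("tweet_1", 30, 36, "LOCATION"),
    ("video_1", 40, 45, "LOCATION"),
}

scores = evaluate(gold, predicted)
print(scores)
```

Exact matching on span and label is the strictest choice; a relaxed variant that accepts overlapping spans may be more appropriate for noisy transcriptions.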
