Avian Influenza events from different digital surveillance tools

Document record

Date

4 January 2023




Cite this document

Arınık Nejat et al., "Avian Influenza events from different digital surveillance tools", Recherche Data Gouv, ID: 10.57745/Y3XROX



Abstract

This dataverse contains all the input files needed to extract a normalized epidemiological event dataset from raw event data and then to evaluate the resulting normalized event dataset. Our raw event data correspond to Avian Influenza events affecting bird species from 2019 to 2021, collected by three sources: PADI-web, ProMED and EMPRES-i. In our context, we define an epidemiological event as the detection of the virus at a specific date and time and in a specific location. On the one hand, Indicator-Based Surveillance (IBS) refers to structured data collected through official routine surveillance systems. EMPRES-i is an example of this kind of surveillance, and we use the EMPRES-i data as ground truth in our evaluation. On the other hand, Event-Based Surveillance (EBS) refers to unstructured data gathered from intelligence sources of any nature, which can be either official (e.g. veterinary reports) or unofficial (e.g. news articles). Existing EBS tools fall into three categories: 1) moderated (i.e. human-curated), 2) partially moderated and 3) fully automated. PADI-web is fully automated and relies only on news articles, whereas ProMED is a human-curated system that relies on a worldwide network of experts who detect epidemiological information from official and unofficial sources. The datasets in this repository are used in our work to evaluate and compare a set of EBS tools of different natures in order to identify their strengths and weaknesses in Epidemic Intelligence. We invite interested readers to read our paper: N. Arınık & R. Interdonato & M. Roche & M. Teisseire, "An Evaluation Framework for Comparing Epidemic Intelligence Systems," in IEEE Access, vol. 11, pp. 31880-31901, 2023, doi: 10.1109/ACCESS.2023.3262462.
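To make the event definition and the use of EMPRES-i as ground truth more concrete, here is a minimal Python sketch. It is not the matching procedure used in the paper: the `Event` record, the `is_match` rule, the 7-day tolerance and the placeholder identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical minimal event: a detection at a place and date."""
    geonames_id: int  # location, as a GeoNames identifier (placeholder values below)
    day: date         # detection date
    disease: str


def is_match(candidate, reference, max_days=7):
    """Assumed rule: same place and disease, dates at most max_days apart."""
    return (
        candidate.geonames_id == reference.geonames_id
        and candidate.disease == reference.disease
        and abs((candidate.day - reference.day).days) <= max_days
    )


def recall(candidates, ground_truth, max_days=7):
    """Fraction of ground-truth events matched by at least one candidate."""
    if not ground_truth:
        return 0.0
    found = sum(
        any(is_match(c, g, max_days) for c in candidates) for g in ground_truth
    )
    return found / len(ground_truth)


# Made-up events, not taken from the dataset.
ground_truth = [
    Event(1, date(2021, 1, 10), "avian influenza"),
    Event(2, date(2021, 2, 1), "avian influenza"),
]
candidates = [
    Event(1, date(2021, 1, 12), "avian influenza"),  # 2 days off: matches
    Event(2, date(2021, 3, 1), "avian influenza"),   # 28 days off: no match
]
example_recall = recall(candidates, ground_truth)
```

With these made-up events, one of the two ground-truth events is recovered, so `example_recall` is 0.5. A real comparison would need a more careful notion of spatial and temporal proximity than exact identifier equality and a fixed day window.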
Briefly, in our work we evaluate a given set of EBS tools in terms of four aspects: 1) spatial analysis (how the events are geographically distributed), 2) temporal analysis (how the events are temporally distributed), 3) thematic entity analysis (what thematic entities are extracted from the events and how they are related to the spatio-temporal analysis) and 4) news outlet analysis (which news sources play a key role in disseminating epidemiological information). For each aspect, we also propose an appropriate visualization for end-users.

Our code for obtaining a normalized event dataset from raw event data is publicly available at https://github.com/arinik9/epidnews2event (it uses the files "raw_event_data.zip" and "eval_event_data.zip"). Our code for evaluating normalized event datasets is publicly available at https://github.com/arinik9/compebs (it uses the files "normalized_events.zip" and "eval_event_matching.zip").

Note that the structure of an event dataset (e.g. "normalized_events/events/padiweb/events.csv" in "normalized_events.zip") is as follows:

- id: Event identifier.
- article_id: Article/report identifiers of a given event in the considered EBS/IBS system. Note that multiple articles can report the same event.
- url: URLs of the news articles reporting the considered event. Note that multiple articles can report the same event. Available only for PADI-web and ProMED.
- source: Name of the news outlet reporting the considered news article.
- geonames_id: GeoNames identifier for the spatial entity.
- geoname_json: Raw GeoNames geocoding result for the spatial entity.
- loc_name: Place name of the spatial entity.
- loc_country_code: Associated country code for the spatial entity.
- continent: Associated continent information for the spatial entity.
- lat: Latitude of the spatial entity.
- lng: Longitude of the spatial entity.
- hierarchy_data: All GeoNames identifiers higher up in the hierarchy of the spatial entity.
- published_at: Article/report publication date.
- disease: Disease information with associated hierarchy.
- host: Host information with associated hierarchy.
- day_no: Day value of the publication date.
- week_no: Week value of the publication date.
- biweek_no: Bi-week value of the publication date.
- month_no: Month value of the publication date.
- year: Year value of the publication date.
- season: Season value of the publication date.
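As an illustration of working with this structure, the following Python sketch checks that a file's header contains the columns listed above and counts events per country code. It assumes a comma-delimited CSV with a header row, which is not stated in the description; the helper names and the sample rows are hypothetical and do not come from the dataset.

```python
import csv
import io
from collections import Counter

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "id", "article_id", "url", "source", "geonames_id", "geoname_json",
    "loc_name", "loc_country_code", "continent", "lat", "lng",
    "hierarchy_data", "published_at", "disease", "host", "day_no",
    "week_no", "biweek_no", "month_no", "year", "season",
]


def read_events(handle, delimiter=","):
    """Return (rows, missing), where missing lists expected columns absent from the header.

    The comma delimiter is an assumption about the file format.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    return list(reader), missing


def events_per_country(rows):
    """Count events by their loc_country_code value."""
    return Counter(row["loc_country_code"] for row in rows)


def sample_csv():
    """Build a tiny in-memory CSV with made-up values (not real data)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=EXPECTED_COLUMNS, restval="")
    writer.writeheader()
    writer.writerow({"id": "1", "loc_country_code": "FR", "year": "2020"})
    writer.writerow({"id": "2", "loc_country_code": "FR", "year": "2021"})
    writer.writerow({"id": "3", "loc_country_code": "DE", "year": "2021"})
    buffer.seek(0)
    return buffer


rows, missing = read_events(sample_csv())
counts = events_per_country(rows)
```

To try this on a real file, replace `sample_csv()` with something like `open(path, newline="", encoding="utf-8")`, where `path` points to an extracted `events.csv`; the encoding is also an assumption.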
