Arsevska E., Roche M., Hendrikx P., Chavernac D., Falala S., Lancelot R., Dufour B., 2016. Identification of terms for detecting early signals of emerging infectious disease outbreaks on the web. Computers and Electronics in Agriculture, 123: 104-115. Doi: 10.1016/j.compag.2016.02.010
ESA platform: www.plateforme-esa.fr
Mathieu Roche
Land, Environment, Remote Sensing and Spatial Information (UMR TETIS)
Renaud Lancelot
Control of Emerging and Exotic Animal Diseases (UMR CMAEE)
Montpellier, France
05/2016
Animal disease surveillance, especially early detection of emerging disease outbreaks worldwide, is one of the means of preventing diseases from being introduced into France. Against this backdrop, CIRAD, ANSES and the French Directorate General for Food have created a web-based automatic disease surveillance system within the national epidemiological surveillance platform for animal health. The system, which has been under development since 2013, retrieves textual data, extracts relevant information from it and presents that information as spatiotemporal series and maps. Five exotic animal diseases are currently being monitored, and others could easily be integrated.
Since 2013, CIRAD, ANSES and the French Directorate General for Food have been developing an automatic system for monitoring information published on the internet, as part of the international disease surveillance work of the national epidemiological surveillance platform for animal health (ESA). Every day, the system retrieves epidemiological reports from unofficial sources, including electronic media. It then automatically extracts information from them (names of diseases or symptoms, places, dates and species affected) and presents it in summarised, aggregate form as maps and spatiotemporal series.
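As an illustration of the final aggregation step, the sketch below counts reports per place, disease and ISO week, which is one simple way to build a spatiotemporal series. The records, the function name `weekly_series` and the choice of weekly bins are assumptions made for this example; the source does not describe how the system aggregates its data.

```python
from collections import Counter
from datetime import date

# Hypothetical extracted records: (place, report date, disease).
# These are invented values for illustration, not data from the system.
records = [
    ("Latvia", date(2016, 3, 8), "African swine fever"),
    ("Latvia", date(2016, 3, 10), "African swine fever"),
    ("Estonia", date(2016, 3, 22), "African swine fever"),
]


def weekly_series(records):
    """Count reports per (place, disease, ISO year, ISO week)."""
    series = Counter()
    for place, day, disease in records:
        iso_year, iso_week, _ = day.isocalendar()
        series[(place, disease, iso_year, iso_week)] += 1
    return series


series = weekly_series(records)
for key, count in sorted(series.items()):
    print(key, count)
```

Each key of the resulting counter is one cell of the series, so the same structure could feed either a time plot per place or a map per week.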
The surveillance system is a reactive, interactive and accurate tool. It supplements official sources from the World Organisation for Animal Health (OIE) and the Food and Agriculture Organization of the United Nations (FAO).
Reports are retrieved by running queries on Google News using different combinations of keywords: names of diseases, hosts and symptoms. These keywords are either defined by experts or selected from the terms proposed by BioTex, a tool developed as part of the Semantic Indexing of French Biomedical Data Resources (SIFR) project (www.lirmm.fr/sifr).
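The keyword combinations described above can be sketched as follows. This is a minimal illustration only: the function `build_queries`, the quoting convention and the sample keywords are assumptions, and the source does not specify which combinations the system actually uses or how the queries are submitted.

```python
from itertools import product


def build_queries(diseases, hosts, symptoms):
    """Return one query string per keyword combination (illustrative only)."""
    # Disease names on their own are the most direct queries.
    queries = [f'"{disease}"' for disease in diseases]
    # Host + symptom pairs may catch reports that do not name a disease.
    for host, symptom in product(hosts, symptoms):
        queries.append(f'"{host}" "{symptom}"')
    return queries


# Hypothetical keyword lists; in the system these come from experts or BioTex.
queries = build_queries(
    diseases=["African swine fever"],
    hosts=["pig", "wild boar"],
    symptoms=["fever", "sudden death"],
)
for query in queries:
    print(query)
```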
This tool extracts terminology from textual material automatically, in two filtering steps. First, it extracts candidate terms that match predefined syntactic structures (noun-adjective, adjective-noun, noun-preposition-noun, etc.). Then, after this linguistic filtering, it applies statistical filtering, which measures the association between the words that make up a term. In addition, recent research conducted at CIRAD weights the results according to the data source.
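The two filtering steps can be illustrated with a toy example. The sketch below is not BioTex: the part-of-speech tags are written by hand rather than produced by a tagger, and the Dice coefficient stands in as a simple, assumed association measure because the source does not name the measure used. Only two-word candidates are scored, to keep the example short.

```python
from collections import Counter

# Hand-tagged toy sentences as (word, part-of-speech) pairs.
# A real pipeline would obtain these tags from a tagger.
TAGGED = [
    [("avian", "ADJ"), ("influenza", "NOUN"), ("in", "PREP"), ("poultry", "NOUN")],
    [("avian", "ADJ"), ("influenza", "NOUN"), ("outbreak", "NOUN")],
    [("severe", "ADJ"), ("outbreak", "NOUN"), ("in", "PREP"), ("poultry", "NOUN")],
]

# Step 1 (linguistic filter): syntactic patterns a candidate term must match.
PATTERNS = {("NOUN", "ADJ"), ("ADJ", "NOUN"), ("NOUN", "PREP", "NOUN")}


def candidates(sentence):
    """Yield word tuples whose tag sequence matches one of PATTERNS."""
    for n in (2, 3):
        for i in range(len(sentence) - n + 1):
            window = sentence[i:i + n]
            if tuple(tag for _, tag in window) in PATTERNS:
                yield tuple(word for word, _ in window)


term_counts = Counter(t for s in TAGGED for t in candidates(s))
word_counts = Counter(word for s in TAGGED for word, _ in s)


def dice(term):
    """Step 2 (statistical filter): Dice coefficient of a two-word term."""
    x, y = term
    return 2 * term_counts[term] / (word_counts[x] + word_counts[y])


ranked = sorted(
    ((term, dice(term)) for term in term_counts if len(term) == 2),
    key=lambda pair: pair[1],
    reverse=True,
)
for term, score in ranked:
    print(" ".join(term), round(score, 3))
```

In this toy corpus "avian influenza" ranks first because its two words always occur together, whereas "severe outbreak" scores lower because "outbreak" also appears outside that term.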
Each article retrieved on the basis of keywords selected by experts is preprocessed and standardised (removal of tags, language recognition, etc.) before being stored in a database. A web interface is used to adjust the settings of the retrieval process and to browse the retrieved articles.
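A minimal sketch of the two preprocessing operations named above, tag removal and language recognition, might look like this. It relies only on the Python standard library; the stop-word lists and the function names `strip_tags` and `guess_language` are assumptions for illustration, and the stop-word vote is a deliberately naive stand-in for whatever language recogniser the system uses.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the text of an HTML fragment with whitespace normalised."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


# Tiny, hypothetical stop-word lists; a real recogniser would be more robust.
STOPWORDS = {
    "en": {"the", "of", "and", "in", "is"},
    "fr": {"le", "la", "les", "des", "et", "est"},
}


def guess_language(text):
    """Guess the language by counting stop words; 'unknown' if none match."""
    words = re.findall(r"[a-zà-ÿ]+", text.lower())
    scores = {lang: sum(w in stop for w in words) for lang, stop in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


clean = strip_tags("<p>Outbreak of <b>bluetongue</b> in the region</p>")
print(clean, guess_language(clean))
```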
Finally, information is extracted from the retrieved reports by identifying key elements: names of diseases, places, dates, number of animals affected and species. This extraction relies on specialised dictionaries and on rules derived automatically using data mining methods.
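To make the dictionary-and-rule approach concrete, here is a small sketch that looks up diseases, species and places in word lists and uses two regular expressions for dates and animal counts. Everything in it is assumed for illustration: the word lists, the hand-written patterns and the sample sentence are invented, whereas the system's own rules are learned automatically.

```python
import re

# Hypothetical dictionaries; the system's own dictionaries are far larger.
DISEASES = ["african swine fever", "avian influenza", "bluetongue",
            "foot and mouth disease", "schmallenberg"]
SPECIES = ["pigs", "cattle", "sheep", "poultry", "wild boar"]
PLACES = ["Latvia", "Estonia", "Nigeria"]

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Hand-written rules standing in for automatically learned ones.
DATE_RE = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")
COUNT_RE = re.compile(r"\b(\d[\d,]*) (?:animals|" + "|".join(SPECIES) + r")\b")


def extract(text):
    """Return the key elements found in one report."""
    lower = text.lower()
    return {
        "diseases": [d for d in DISEASES if d in lower],
        "species": [s for s in SPECIES if re.search(rf"\b{re.escape(s)}\b", lower)],
        "places": [p for p in PLACES if re.search(rf"\b{re.escape(p)}\b", text)],
        "dates": DATE_RE.findall(text),
        "counts": [int(n.replace(",", "")) for n in COUNT_RE.findall(text)],
    }


# Invented sentence, used only to exercise the rules.
report = "On 12 March 2016, African swine fever killed 1,250 pigs in Latvia."
result = extract(report)
print(result)
```

Place names are the hardest element to capture with a fixed list like this, since the same name can refer to several locations or appear in unrelated contexts.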
The first results, obtained on a corpus of 357 reports, show accuracy rates of around 70% for spatial information and at least 80% for the other types of information.
The information extracted from reports in this way will shortly be compared with official OIE data, in order to highlight how the two systems complement each other and the added value of web-based disease surveillance for earlier detection of emerging animal diseases.
Currently, this system is used to monitor five exotic animal diseases that are a potential threat to animal health in France and in the countries of the South: African swine fever, avian influenza, bluetongue disease, foot and mouth disease and Schmallenberg virus.
This system is generic and can therefore be adapted to other diseases. It will be used by the ESA platform for France, and by the Caribbean animal health network, CaribVet.