Science at work 4 July 2022
Detecting the emergence of the next disease X
On 31 December 2019, health officials in Wuhan, China, reported a cluster of 27 cases of a "pneumonia of unknown cause". That same day, PADI-Web and HealthMap identified several online articles referring to a "mystery disease". ProMED, for its part, had detected the same type of vocabulary in online media the day before China's official notification.
These three artificial intelligence systems are classed as "EBS" (Event-Based Surveillance). Each day, they review hundreds of thousands of online articles to monitor the emergence or spread of certain diseases. CIRAD's Mathieu Roche, one of the authors of the study published in Transboundary and Emerging Diseases, compares his work as a data miner to that of a gold prospector: "We rummage through vast amounts of data, so we need to be able to sort useful information from useless information efficiently. Our surveillance systems act as a sieve to sort the flakes of gold from the grit. Our aim is to find the nuggets, which in our case are the weak signals of the emergence of a disease".
A vocabulary centring on "mystery" and "pneumonia"
Certain surveillance systems focus on existing diseases such as Ebola or African swine fever. However, for new diseases, researchers use "syndromic" RSS feeds. "The aim is no longer to target a specific disease", Mathieu Roche explains. "We look more for keywords relating to symptoms, mystery phenomena or signs of concern."
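The keyword "sieve" described above can be illustrated with a minimal Python sketch. It is not the code used by PADI-Web, HealthMap or ProMED: the term list and the function names matched_terms and sieve are hypothetical, chosen only to show how a syndromic filter might look for symptom or "mystery" wording rather than a named disease.

```python
import re

# Hypothetical syndromic vocabulary (an assumption for illustration);
# the keyword lists actually used by the surveillance systems are not given here.
SYNDROMIC_TERMS = [
    "mystery disease",
    "mysterious illness",
    "unknown cause",
    "unexplained",
    "pneumonia",
    "outbreak",
]

# One case-insensitive pattern that matches any of the terms.
PATTERN = re.compile("|".join(re.escape(t) for t in SYNDROMIC_TERMS), re.IGNORECASE)


def matched_terms(text):
    """Return the sorted, de-duplicated syndromic terms found in the text."""
    return sorted({m.group(0).lower() for m in PATTERN.finditer(text)})


def sieve(headlines, min_terms=1):
    """Keep headlines that mention at least min_terms distinct syndromic terms."""
    kept = []
    for headline in headlines:
        terms = matched_terms(headline)
        if len(terms) >= min_terms:
            kept.append((headline, terms))
    return kept


headlines = [
    "Officials report pneumonia of unknown cause in Wuhan",
    "Local football team wins regional cup",
    "Mystery disease sickens dozens",
]

for headline, terms in sieve(headlines):
    print(headline, "->", terms)
```

Raising min_terms makes the filter stricter: a headline then has to combine several signs of concern before it is kept.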
For Covid-19, the multidisciplinary work done by Mathieu Roche and Renaud Lancelot's teams pinpointed a vocabulary centring on "mystery" and "pneumonia" as the disease emerged. "Prior to formal identification of the virus, we find articles about a 'mystery disease' or 'pneumonia of unknown cause'", Mathieu Roche adds. "Subsequently, once the medical profession gets involved, more technical terms are used."
"Knowing more about the vocabulary used depending on the stage of disease evolution would enable us to improve our surveillance systems", Mathieu Roche says. "The more we are able to pinpoint a specific vocabulary, the more precise identification will be. It's as if we were using a finer sieve."
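The idea that vocabulary shifts as a disease moves from "mystery" to formal identification could be sketched as a simple labelling step. This is a rough illustration under stated assumptions, not the researchers' method: the function vocabulary_stage is hypothetical, and the technical term list is invented for the example, since the study's actual vocabulary is not reproduced here.

```python
# Lay terms echo the wording quoted above; the technical terms are
# illustrative assumptions, not taken from the study.
LAY_TERMS = ("mystery", "unknown cause", "unexplained")
TECHNICAL_TERMS = ("coronavirus", "sars-cov-2", "genome", "pcr")


def vocabulary_stage(text):
    """Label a text 'early', 'technical' or 'none' by which term list occurs more often."""
    lowered = text.lower()
    lay = sum(lowered.count(term) for term in LAY_TERMS)
    technical = sum(lowered.count(term) for term in TECHNICAL_TERMS)
    if lay == 0 and technical == 0:
        return "none"
    # Ties are treated as 'early', so lay wording keeps an article in the early bucket.
    return "early" if lay >= technical else "technical"


print(vocabulary_stage("Mystery pneumonia of unknown cause reported"))
print(vocabulary_stage("Scientists sequence genome of novel coronavirus"))
```

A real system would need curated, multilingual term lists and proper tokenisation; plain substring counting is used here only to keep the sketch short.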
The researchers hope that their retrospective analysis will serve to build even more effective surveillance systems in future.
This research was conducted as part of the EU MOOD project, which set out to harmonize health surveillance in Europe. MOOD uses "model diseases" (see box below), classed according to how they are transmitted. Covid-19 currently serves as a model for the surveillance of as yet unknown diseases, referred to as "disease X".
Diseases monitored by the EU MOOD project
• Unknown pathogens (disease X), which are a challenge for any epidemic surveillance system;
Sarah Valentin, Alizé Mercier, Renaud Lancelot, Mathieu Roche, Elena Arsevska. Monitoring online media reports for the early detection of unknown diseases: insights from a retrospective study of COVID-19 emergence. Transboundary and Emerging Diseases.