Searching for infectious diseases with open source

ECDC developed epitweetr to find public health signals in the Twitter noise

Published on: 01/06/2021

The value of, and need for, epidemiological threat detection require little argument in 2021. The Covid-19 pandemic still has a profound impact on our lives and will likely continue to do so for quite some time. In hindsight, then, the timing looks inspired: the Epidemic Intelligence team at the European Centre for Disease Prevention and Control (ECDC) started developing a prototype of epitweetr, a tool designed to automatically monitor trends on Twitter, in July 2019.

Laura Espinosa, Scientific Officer for Epidemic Intelligence at ECDC and maintainer of the tool, told OSOR that the initial prototype covered two known diseases as a proof-of-concept study, so it was not designed to detect Covid-19 through Twitter data. Covid-19 was, however, added to the prototype in January 2021.

Epitweetr was published in October 2020 under the European Union Public Licence (EUPL) 1.2, an open source license drafted by the European Commission. When active, the tool monitors trends in tweets by time, place and topic, with the aim of detecting public health threats early. Espinosa clarified that the ECDC does not use the tool to monitor symptoms; instead, it searches for specific diseases or health topics within the ECDC's mandate. If a topic is mentioned more than usual at a specific time and place, the tool is designed to detect this overrepresentation and notify the relevant person(s) in the ECDC epidemic intelligence team, who check the “signal” for significance and validity. Although it was designed to detect signals of infectious diseases, the tool can be used to monitor the emergence of any topic on Twitter simply by modifying its keywords.
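To make the keyword idea concrete, the following Python sketch assigns tweets to monitored topics by matching keywords. The topic names, keyword lists and the functions `compile_topics` and `match_topics` are hypothetical illustrations, not epitweetr's actual configuration or code; the point is only that editing the keyword list is enough to change what is being monitored.

```python
# Hypothetical sketch: assigning tweets to monitored topics by keyword.
# The topics and keywords below are illustrative assumptions, not
# epitweetr's real configuration.
from __future__ import annotations

import re

# Hypothetical topic -> keyword mapping; editing it changes what is monitored.
TOPICS: dict[str, list[str]] = {
    "measles": ["measles", "rubeola"],
    "dengue": ["dengue"],
}


def compile_topics(topics: dict[str, list[str]]) -> dict[str, re.Pattern]:
    """Build one case-insensitive, whole-word regular expression per topic."""
    return {
        topic: re.compile(
            r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
            re.IGNORECASE,
        )
        for topic, keywords in topics.items()
    }


def match_topics(text: str, patterns: dict[str, re.Pattern]) -> set[str]:
    """Return every topic whose keywords appear in the tweet text."""
    return {topic for topic, pattern in patterns.items() if pattern.search(text)}


patterns = compile_topics(TOPICS)
print(sorted(match_topics("Measles outbreak reported near the school", patterns)))
# → ['measles']
```

Whole-word matching avoids counting tweets where a keyword only appears inside a longer, unrelated word.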

Screenshot of the epitweetr dashboard, source: ECDC

Epitweetr collects tweets and their metadata through the Twitter API and then extracts each tweet's geolocation using a machine learning algorithm. Tweets are aggregated by topic, time and geolocation so that an algorithm can determine whether the number of tweets for a given topic, time and geolocation exceeds the expected number. If it does, the tool sends email alerts to investigators, who follow up manually to check them. The tool also comes with a dashboard built with Shiny, an open source R package for interactive web applications, which helps users visualise data, adjust customisable settings and check the status of epitweetr tasks.
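The aggregate-then-compare step can be illustrated with a minimal Python sketch. It assumes each tweet has already been assigned a topic and a country (standing in for epitweetr's machine learning geolocation), counts tweets per topic, country and day, and flags a day whose count exceeds the mean plus `k` standard deviations of the preceding `window` days. That threshold rule and all names here (`Tweet`, `daily_counts`, `is_signal`, `find_alerts`) are assumptions chosen for illustration; epitweetr's own detection algorithm may differ.

```python
# Hypothetical sketch of aggregating tweets and flagging unusual counts.
# The rule (mean + k * sample standard deviation of the previous `window`
# days) is a simple stand-in, not necessarily epitweetr's algorithm.
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean, stdev


@dataclass(frozen=True)
class Tweet:
    topic: str
    day: date
    country: str  # assumed to come from an upstream geolocation step


def daily_counts(tweets):
    """Aggregate tweets into counts per (topic, country, day)."""
    return Counter((t.topic, t.country, t.day) for t in tweets)


def is_signal(counts, topic, country, day, window=7, k=2.0):
    """True if the count on `day` exceeds mean + k*sd of the prior `window` days."""
    # Missing keys in a Counter read as 0, so days without tweets count as 0.
    history = [counts[(topic, country, day - timedelta(days=i))]
               for i in range(1, window + 1)]
    threshold = mean(history) + k * stdev(history)
    return counts[(topic, country, day)] > threshold


def find_alerts(counts, day, window=7, k=2.0):
    """List the (topic, country) pairs whose count on `day` is unusually high."""
    pairs = {(topic, country) for topic, country, _ in counts}
    return sorted(p for p in pairs if is_signal(counts, *p, day, window, k))


# Usage: a steady baseline for two topics, then a spike for one of them.
start = date(2021, 5, 1)
tweets = []
for i in range(7):
    tweets += [Tweet("measles", start + timedelta(days=i), "IT")] * (3 + i % 2)
tweets += [Tweet("measles", start + timedelta(days=7), "IT")] * 15
for i in range(8):
    tweets += [Tweet("dengue", start + timedelta(days=i), "ES")] * 2

counts = daily_counts(tweets)
print(find_alerts(counts, start + timedelta(days=7)))
# → [('measles', 'IT')]
```

Comparing each topic and place against its own recent history means that a country which routinely tweets a lot about a disease does not trigger alerts merely because of its volume; only a departure from its usual level does.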

Community was an important aspect throughout the development. The ECDC strove to involve potential users from the beginning and worked with a multidisciplinary group of experts in fields such as epidemiology, machine learning and statistics to improve the quality of each aspect of the tool. Using open source made collaboration between these experts easy. A number of organisations also tested the tool throughout its development. Epitweetr is hosted on GitHub, where the community can report issues, fill out surveys and contribute to the development. Espinosa told OSOR that continuous evaluation and improvement, as well as having a maintainer in charge of organising the development, were important success factors. The ECDC team looked for existing open source tools that could be reused, but found a gap that epitweetr could fill.

The ECDC is an EU agency and performs tasks for the European Union as a whole and for its Member States in areas including surveillance and epidemic intelligence. It is therefore an important part of this community, and collaboration with Member States and other health agencies is part of its mission. Espinosa revealed that two other institutions, Italy's national public health institute, the Istituto Superiore di Sanità, and the World Health Organization (WHO) Regional Office for the Eastern Mediterranean, are already using or testing epitweetr, and added that “other institutions are investigating how to integrate epitweetr in existing tools and processes”.

The next major version of epitweetr is planned for November 2021 and will bring an improved geolocation algorithm, automated analysis of detected signals, and better data storage and management.