Making the mountains of COVID-19 research publications accessible

Published on 25/06/2020

WHAT IS COLIBRI RESEARCH ABOUT?

The COVID-19 pandemic has produced mountains of information and research publications about the disease, so how do you find the information you need? That’s where LIST’s Colibri research comes in. Its Principal Investigator, Mohammad Ghoniem, outlines what the project is about.

“What we aim to provide is support for researchers so that they can sift through massive publication collections concerning COVID-19, because everybody now is trying to help with the pandemic,” began Mohammad.

Researchers around the world are racing against the clock to find treatments, vaccines and diagnostic methods for COVID-19. They can only succeed if they share and remain abreast of the latest developments concerning the disease. A major hurdle lies in the need to digest an abundance of scientific literature, which grows by the day. For example, the CORD-19 challenge dataset contains about 59,000 articles related to coronaviruses. The World Health Organization (WHO) maintains a fast-growing, hand-curated publication database with about 24,000 entries to date relating to COVID-19. Exploring such large amounts of information is infeasible without appropriate visual text analytics software, which combines the power of text mining algorithms with the flexibility of interactive visualizations.

Mohammad explained the different types of publications currently available. “These appear on the one hand, as early stage publications, which are published online in different databases even if they still have to undergo scrutiny and scientific review, but they try to share results as soon as possible,” he began, “so if you are facing, say, a few thousand new papers that you have to read, this is not tractable by any person, so these people need effective visualization software to support them, and that’s exactly what we do, we provide the Papyrus tool which we built previously before the pandemic for different types of use, and this turns out to also be handy with COVID-19.”

One of the COLIBRI objectives is to make this tool freely available to researchers around the world and provide access to the relevant papers within the tool. An open access version of the Papyrus tool is preloaded with the latest WHO COVID-19 data set and usable online at Colibri.list.lu. Unlike search engines or faceted search, Papyrus provides a user-friendly overview of the contents and full drill-down capabilities into topics of interest.

“We don’t aggregate the papers ourselves; some people do that already for us, so we just download these previously aggregated sources such as WHO, one of these sources, but also scientific publishers provide complimentary databases, and other sources, so there are numerous sources. The challenge is always the same; the sheer volume of publications makes it difficult to exploit this data,” explained Mohammad.

Within a couple of clicks, the users of Papyrus are able to home in on a few articles, usually 2 to 10, addressing a specific question. The COLIBRI project aims to build on this in two ways. First, it will further improve the biological relevance of results by leveraging biological annotations from the BioKB platform, developed by the Bioinformatics Core team of LCSB, as well as pattern mining algorithms developed by the Centre Borelli in Paris. Second, BioKB will benefit from text visualizations built into Papyrus and multilayer network visualizations developed jointly by the LaBRI laboratory in Bordeaux and LIST during the FNR INTER BLIZAAR project.

Mohammad explained who the end user of Colibri may be. “It could be, for example, bio-medical researchers, and we have such people at LCSB, one of the project partners, but we could also imagine people who are interested in public health, so decision makers who, for example, have to manage hospital infrastructure or have to recommend best practices, for example how to manage teams of nurses and doctors during the COVID-19 pandemic, and typically aspects like anxiety that has been observed in these teams because they are the first line responders and they live with the pandemic more than us.”

The COLIBRI project is planned for worldwide use. “That’s why we put it on a public-facing website, colibri.list.lu, and there is a guest login, so the plan is to advertise this platform on relevant websites so that we can attract researchers from around the world and maybe get feedback from them,” concluded Mohammad.

WHAT IS LIST’S SPECIFIC CONTRIBUTION?

LIST has a solid track record in visualization, in particular visual text analytics and network visualization. The Colibri project will reuse and extend two software assets developed at LIST: "Papyrus", which supports the visual exploration of large document collections, and "BLIZAAR", which supports the exploratory analysis of biological network data.

WHAT IS YOUR ROLE WITHIN COLIBRI?

I am coordinating the activities of Colibri together with Venkata Satagopam from LCSB. In particular, I will oversee research and development activities pertaining to text and network visualization, and also foster synergies between all project partners.

ABOUT MOHAMMAD GHONIEM:

Dr Mohammad Ghoniem is a senior research and technology associate at the Luxembourg Institute of Science and Technology. He received his doctorate in computer science from the University of Nantes, France, in 2005. His main research interests include information visualization, visual analytics and explainable artificial intelligence. Over the past 20 years, he has designed, built and evaluated data visualization and analytics software for a variety of application domains, such as network forensics, fraud detection in financial transactions, and, since 2015, systems biology and microbiology. He was the principal investigator of the Hydviga project, which resulted in "iCoVeR", a visual analytics software tool supporting the exploratory analysis of massive multivariate data concerning the bacterial populations involved in the biomethanation process. He was also the national Luxembourg coordinator of the ANR/FNR PRCI BLIZAAR project (2016-2019), dedicated to the visualization of multilayer dynamic networks, with one use case in plant systems biology and a second one in digital cultural heritage.

Dr Mohammad Ghoniem is also leading the data visualization work package in the Goodyear-LIST collaboration, supporting data-driven decision making in the automotive sector.