Creating graphs is a key way of digesting knowledge and obtaining a snapshot of how the subject concerned has evolved. This is the focus of LIST’s Historical Knowledge Graph for COVID-19 project, or HKG4COVID for short.
The project’s Principal Investigator Cédric Pruski explains: “The project deals with the evolution we have on the Coronavirus described in scientific literature. So we have a lot of articles that contain knowledge and we extract knowledge from these texts and build graphs”.
Knowledge graphs (KGs) or, more generally speaking, knowledge bases provide the basis for representing domain knowledge in a machine-interpretable manner. This allows computers to consume this knowledge for various purposes, for example, information retrieval, knowledge discovery, and decision support.
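To make "machine-interpretable" concrete, here is a minimal sketch of a knowledge graph stored as subject-predicate-object triples. It is an illustration only, not the project's actual data model: the facts, the predicate names and the helper functions `build_index` and `objects` are all assumptions chosen for the example.

```python
from collections import defaultdict

# Toy facts for illustration only; the predicate names are hypothetical.
TRIPLES = [
    ("SARS-CoV-2", "causes", "COVID-19"),
    ("COVID-19", "has_symptom", "fever"),
    ("COVID-19", "has_symptom", "cough"),
]


def build_index(triples):
    """Index triples by (subject, predicate) so lookups avoid a full scan."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index


def objects(index, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return list(index.get((subject, predicate), []))


if __name__ == "__main__":
    index = build_index(TRIPLES)
    print(objects(index, "COVID-19", "has_symptom"))
```

A program can answer a question such as "which symptoms are linked to COVID-19?" by following edges rather than by reading text, which is what enables the retrieval and decision-support uses mentioned above.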
“So that’s what several research groups all over the world are currently doing. You have a lot of approaches that mine information contained in the literature - extract some relations and concepts from that - then build a graph from it. In HKG4COVID we will add the temporal dimension to those graphs”, explained Cédric.
There are countless COVID-19 research projects around the world that also produce graphs, but they tend to capture only the most recent state of knowledge, meaning that the history of the disease (i.e. how our knowledge about the disease has evolved) is rarely preserved. As a result, crucial information about COVID-19 is lost. In particular, past knowledge described with outdated terminology or concepts can be extremely difficult to retrieve, as is the case for scientific literature indexed with an outdated ontology. The HKG4COVID project will construct historical knowledge graphs about COVID-19 based on existing knowledge graphs and millions of scientific articles.
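One way to picture a graph with a temporal dimension is to attach a validity interval to every edge, so that older terminology stays queryable instead of being overwritten. The sketch below is a simplified assumption, not the HKG4COVID design: the class `TemporalEdge`, the identifier `virus:1`, the predicate `has_label` and the helper functions are hypothetical, and the dates are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TemporalEdge:
    """An edge that only holds during [valid_from, valid_to)."""
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the edge is still valid

    def holds_on(self, day: date) -> bool:
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


# Hypothetical concept identifier; dates are illustrative.
EDGES = [
    TemporalEdge("virus:1", "has_label", "2019-nCoV", date(2020, 1, 7), date(2020, 2, 11)),
    TemporalEdge("virus:1", "has_label", "SARS-CoV-2", date(2020, 2, 11)),
]


def resolve_label(edges, label):
    """Map any label, current or outdated, to the concepts it has named."""
    return {e.subject for e in edges if e.predicate == "has_label" and e.obj == label}


def labels_on(edges, concept, day):
    """Return the labels that were valid for `concept` on a given day."""
    return [
        e.obj
        for e in edges
        if e.subject == concept and e.predicate == "has_label" and e.holds_on(day)
    ]


if __name__ == "__main__":
    print(resolve_label(EDGES, "2019-nCoV"))
    print(labels_on(EDGES, "virus:1", date(2021, 1, 1)))
```

Because the outdated label is kept alongside its validity interval, an article indexed under the older name can still be traced to the same concept as one indexed under the current name.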
“The graphs that we are building can be huge!” Cédric stated. “There are billions of entities there including concept terms, synonyms, relations between terms and so on, because we have millions of scientific articles that have been published over the past years”.
Talking about the potential end users of HKG4COVID, Cédric said, “it depends on how far we go in this project but basically it would be researchers and the research communities.”
The HKG4COVID project does not sound too far removed from another LIST project, Colibri, which also deals with data, and Cédric pointed out the connection: “Colibri is dealing with the visualisation of data, we are just one step before, so we are building the graph and then we can inject this graph into Colibri, then we have both – the creation, construction of that historical graph, and the visualisation”.
But LIST is not alone in the project and partners with the LCSB arm of the University of Luxembourg. “We partner with LCSB as they already have this kind of knowledge base representing concept terms from literature they offer, which is also publicly available. So we start from their knowledge base and enrich our graph, dealing with the COVID-19 evolution”.
Of course, HKG4COVID isn’t the only project researching this kind of COVID-19 information. “There are a lot of knowledge base projects focusing on Coronavirus all over the world so the idea would be also to connect these graphs with specific semantic relationships as well as evolutionary relationships to make this knowledge even bigger,” Cédric added.
He concluded by saying that the HKG4COVID project “focuses mainly on the medical part but there are several other dimensions in Coronavirus. We have the socio-economic impact, environmental effects – because we don’t know yet if the climate has an influence on the disease - and so on. So if we manage to connect all the different data sets that exist all over the world with a specific aspect of the disease, that would also be a very big contribution”.
“The idea of the project and LIST’s contribution is to add the temporal dimension of the disease to graphs. Currently all the graphs that exist just represent the current state of knowledge. We do not have a global understanding of how that knowledge has evolved over time, so this is what we will add.
“The research contribution for HKG4COVID is 100% LIST, as LCSB is there just to provide the data that they have collected. LCSB also provides support for validation. They have the medical knowledge, so we show them what we have extracted and they tell us whether our findings make sense from a medical perspective, and then we can add that to the data set”.
“I’m the Principal Investigator of the project ensuring that the project is well-managed - my main role - but I also have to ensure that the collaboration with LCSB is going well, because that’s key for future projects. But I also participate in research tasks, so proposing ideas, validating them and so on - basic research I would say”.
Cédric Pruski is a Senior Researcher at LIST. His interests are Artificial Intelligence and knowledge representation and reasoning. He received a “Habilitation à Diriger des Recherches” from University Paris-Saclay and a PhD in computer science from both University of Paris-Sud and University of Luxembourg. He is the co-author of more than 70 scientific articles and has co-supervised four doctoral candidates. Cédric Pruski has successfully coordinated national and international research projects that generated many publications in major conferences and peer-reviewed journals in the fields of Artificial Intelligence and knowledge representation, as well as four PhD defences.