Evaluating Cross-lingual Semantic Annotation for Medical Forms



Y.C. Lin, V. Christen, A. Gross, T. Kirsten, S.D. Cardoso, C. Pruski, M. Da Silveira, and E. Rahm


In Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies, Vol. 5: HEALTHINF, pp. 145-155, ISBN: 978-989-758-398-8, 2020


Annotating documents or datasets with concepts from biomedical ontologies has become increasingly important. Such ontology-based semantic annotations can improve interoperability and the quality of data integration in health care practice and biomedical research. However, annotating non-English documents is more challenging, because non-English ontologies have limited coverage and no annotators of comparable quality to those for English are available. In this paper, we aim to annotate medical forms written in German. We present a parallel corpus in which every medical form is available in both German and English. We use three annotators to generate annotations automatically and verify these annotations manually to construct an English Silver Standard Corpus (SSC). Based on the parallel corpus and the SSC, we evaluate the quality of different annotation approaches, mainly 1) direct annotation of the German corpus with German ontologies and 2) using machine translators to translate the German corpus into English and annotating the translated corpus with English ontologies. The results show that using German ontologies alone produces very limited results, whereas translation achieves better annotation quality and retains almost 70% of the annotations.
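
As an illustration only, comparing an annotation approach against a silver standard is commonly done with precision, recall, and F-measure over sets of annotations. The abstract does not state which measures or data structures the authors used, so the following is a minimal sketch under assumptions: the function name `evaluate`, the representation of an annotation as a (question ID, concept ID) pair, and all IDs shown are hypothetical. Under this reading, the share of silver-standard annotations that an approach retains corresponds to recall.

```python
from typing import Dict, Set, Tuple

# Assumption: an annotation links one form question to one ontology concept.
# The identifiers used below are made up for illustration.
Annotation = Tuple[str, str]  # (question_id, concept_id)


def evaluate(predicted: Set[Annotation], silver: Set[Annotation]) -> Dict[str, float]:
    """Compare predicted annotations against a silver standard."""
    true_positives = len(predicted & silver)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(silver) if silver else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Hypothetical silver standard and output of one annotation approach.
silver_standard = {("q1", "C0001"), ("q1", "C0002"), ("q2", "C0003")}
translated_run = {("q1", "C0001"), ("q2", "C0003"), ("q2", "C0009")}

scores = evaluate(translated_run, silver_standard)
print(scores)
```

In this toy example, two of the three silver-standard annotations are found and one predicted annotation is spurious, so precision and recall are both two thirds.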


