Multilingual bi-encoder models for biomedical entity linking

Zekeriya Anil Guven, André Lamúrias

Research output: Contribution to journal › Article › peer-review

Abstract

Natural language processing (NLP) is a field of study concerned with the computational analysis of text. It includes tasks such as sentiment analysis, spam detection, entity linking, and question answering. Entity linking maps mentions in a text to the entities of a knowledge base. In this study, we analysed the efficacy of bi-encoder entity linking models for multilingual biomedical texts. Using surface-based, approximate nearest neighbour search, and embedding approaches during the candidate generation phase, we measured accuracy and recall for language representation models such as BERT, SapBERT, BioBERT, and RoBERTa according to language and domain. The proposed entity linking framework was evaluated on the BC5CDR dataset for English and the Cantemist dataset for Spanish. The framework achieved 76.75% accuracy on BC5CDR and 60.19% on Cantemist. The framework was also compared with previous studies. The results highlight the challenges posed by domain-specific multilingual datasets.
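The candidate generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `encode` function is a toy character n-gram encoder standing in for a neural model such as SapBERT, the search is exhaustive rather than approximate, and the knowledge base entries, identifiers, and mentions are hypothetical. The point is only the bi-encoder pattern, in which mentions and entity names are encoded independently and compared by vector similarity, and how recall at k is computed over the retrieved candidates.

```python
import math
from collections import Counter


def encode(text, n=3):
    """Toy stand-in for a neural encoder (assumption, not the paper's model):
    L2-normalised character n-gram counts stored as a sparse dict."""
    padded = f" {text.lower()} "
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {gram: v / norm for gram, v in counts.items()}


def score(u, v):
    """Dot product of two sparse unit vectors, i.e. their cosine similarity."""
    return sum(w * v.get(gram, 0.0) for gram, w in u.items())


def generate_candidates(mention, entity_index, k=2):
    """Rank every entity by similarity to the mention and keep the top k.
    A real system would replace this exhaustive scan with an ANN index."""
    m = encode(mention)
    ranked = sorted(entity_index,
                    key=lambda eid: score(m, entity_index[eid]),
                    reverse=True)
    return ranked[:k]


def recall_at_k(examples, entity_index, k=2):
    """Fraction of mentions whose gold entity appears among the top-k candidates."""
    hits = sum(gold in generate_candidates(mention, entity_index, k)
               for mention, gold in examples)
    return hits / len(examples)


# Hypothetical knowledge base: entity id -> canonical name.
KB = {
    "E1": "hypertension",
    "E2": "hypotension",
    "E3": "diabetes mellitus",
    "E4": "myocardial infarction",
}

# Entity vectors are computed once, independently of any mention.
INDEX = {eid: encode(name) for eid, name in KB.items()}

# Hypothetical (mention, gold entity id) pairs.
EXAMPLES = [("hypertensive", "E1"), ("diabetes", "E3"), ("heart attack", "E4")]

if __name__ == "__main__":
    for mention, gold in EXAMPLES:
        print(mention, "->", generate_candidates(mention, INDEX), "gold:", gold)
    print("recall@2:", recall_at_k(EXAMPLES, INDEX, k=2))
```

In this toy setting the mention "heart attack" shares no character trigrams with "myocardial infarction", so a purely surface-based encoder cannot retrieve the correct entity. Closing that gap between surface form and meaning is the kind of problem learned embedding models are meant to address.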
Original language: English
Article number: e13388
Number of pages: 14
Journal: Expert Systems
Volume: 40
Issue number: 9
Publication status: Published - Nov 2023

Keywords

  • biomedical entity linking
  • data analysis
  • entity linking
  • language model
  • multilingual analysis
  • natural language processing
