Cross-lingual Word Sense Clustering for Sense Disambiguation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Translation is one of the areas where word sense ambiguity must be
resolved in order to find adequate translations for ambiguous words in
the contexts where they occur. In this paper, a Word Sense
Disambiguation (WSD) approach using Word Sense Clustering within a
cross-lingual strategy is proposed. Available sentence-aligned parallel
corpora are used as a reliable knowledge source. English is taken as the
source language, and Portuguese, French or Spanish as the targets.
Clusters are built based on the correlation between senses, which is
measured by a language-independent algorithm that uses as features the
words near the ambiguous word and its translation in the parallel
sentences, together with their relative positions. Clustering quality
reached 81% (V-measure) and 92% (F-measure) on average across the three
language pairs. The learned clusters are then used to train a support
vector machine, whose classification results are used for sense
disambiguation. Classification tests showed an average F-measure of 81%
across the three languages.
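The following is a minimal sketch, not the authors' implementation, of two ideas mentioned in the abstract: context words tagged with their relative positions around an ambiguous word and its translation, and the V-measure used to score clustering quality. The function names, the window size, the `src:`/`tgt:` feature encoding and the example sentences are all assumptions made for illustration. The abstract does not specify how the correlation between senses is computed, so that step is not shown. The V-measure follows its standard definition as the harmonic mean of homogeneity and completeness.

```python
import math
from collections import Counter


def positional_features(tokens, target_index, window=2, prefix="src"):
    """Collect words near tokens[target_index], tagged with relative position.

    The feature encoding "prefix:offset:word" is hypothetical.
    """
    features = set()
    for offset in range(-window, window + 1):
        pos = target_index + offset
        if offset == 0 or pos < 0 or pos >= len(tokens):
            continue  # skip the target word itself and out-of-range positions
        features.add(f"{prefix}:{offset:+d}:{tokens[pos].lower()}")
    return features


def pair_features(src_tokens, src_index, tgt_tokens, tgt_index, window=2):
    """Union of positional features from both sides of an aligned sentence pair."""
    return (positional_features(src_tokens, src_index, window, "src")
            | positional_features(tgt_tokens, tgt_index, window, "tgt"))


def v_measure(gold, predicted):
    """Harmonic mean of homogeneity and completeness for two labelings."""
    n = len(gold)

    def entropy(labels):
        return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

    def conditional_entropy(labels, given):
        # H(labels | given), computed from joint and marginal counts
        joint = Counter(zip(given, labels))
        given_counts = Counter(given)
        return -sum((c / n) * math.log(c / given_counts[g])
                    for (g, _), c in joint.items())

    h_gold, h_pred = entropy(gold), entropy(predicted)
    homogeneity = 1.0 if h_gold == 0 else 1.0 - conditional_entropy(gold, predicted) / h_gold
    completeness = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(predicted, gold) / h_pred
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Illustrative English-Portuguese pair; "bank" and "margem" are the aligned words.
english = "He sat on the bank of the river".split()
portuguese = "Ele sentou-se na margem do rio".split()
features = pair_features(english, 4, portuguese, 3)
```

Feature sets such as `features` could serve as input to a clustering step and later to a classifier. `v_measure` compares cluster assignments against reference sense labels and is unaffected by how the cluster identifiers are named.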
Original language: English
Title of host publication: EPIA 2015
Subtitle of host publication: Progress in Artificial Intelligence
Editors: Francisco Pereira, Penousal Machado, Ernesto Costa, Amílcar Cardoso
Place of Publication: Cham
Publisher: Springer International Publishing
Number of pages: 12
ISBN (Electronic): 978-3-319-23485-4
ISBN (Print): 978-3-319-23484-7
Publication status: Published - 25 Aug 2015
Event: 17th Portuguese Conference on Artificial Intelligence, EPIA 2015 - Coimbra, Portugal
Duration: 8 Sep 2015 to 11 Sep 2015

Publication series

Name: Lecture Notes in Computer Science (LNCS)
Publisher: Springer International Publishing
ISSN (Print): 0302-9743


Conference: 17th Portuguese Conference on Artificial Intelligence, EPIA 2015


  • Word Sense Disambiguation
  • Clustering
  • Parallel corpora
  • V-measure
  • Support vector machine

