Abstract
Words in every natural language are ambiguous, especially when translation is at stake. Translation tasks require finding adequate translations for such words in the contexts where they occur. This article describes a bilingual strategy for clustering words according to their meanings, using a publicly available, sentence-aligned parallel corpus. Word senses are discriminated by their translations and by the words occurring in a window around them, in both the source-language and the target-language parallel sentences. The strategy is language independent and uses a correlation algorithm to filter out irrelevant features. The resulting clusters were evaluated with F-measure (average score of 94%), and their homogeneity and completeness were measured with V-Measure (average score of 83%). The learned clusters are then used to train a support vector machine that tags ambiguous words with their translations in the contexts where they occur. This tagging task was also evaluated with F-measure and compared against a baseline.
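V-Measure, as defined by Rosenberg and Hirschberg (2007), is the harmonic mean of homogeneity (each cluster contains only occurrences of one sense) and completeness (all occurrences of a sense fall into one cluster), both derived from conditional entropies between gold classes and clusters. The sketch below computes all three scores from scratch. It assumes each occurrence of an ambiguous word carries a gold label (here, hypothetically, its reference translation) and a cluster id; the function names and example labels are illustrative and are not taken from the article.

```python
from collections import Counter
from math import log


def entropy(counts, total):
    """Shannon entropy H = -sum p*log(p) over the given label counts."""
    return -sum((c / total) * log(c / total) for c in counts if c)


def conditional_entropy(a, b):
    """H(A | B) for two parallel label sequences a and b."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    # -sum over (x, y) of p(x, y) * log p(x | y)
    return -sum((c / n) * log(c / b_counts[y]) for (_, y), c in joint.items())


def v_measure(classes, clusters, beta=1.0):
    """Return (homogeneity, completeness, V-measure) for gold classes vs. clusters."""
    n = len(classes)
    h_c = entropy(Counter(classes).values(), n)
    h_k = entropy(Counter(clusters).values(), n)
    # By definition, a score is 1 when the corresponding entropy is 0.
    homogeneity = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)
    return homogeneity, completeness, v


if __name__ == "__main__":
    # Hypothetical occurrences of English "bank" labelled with Portuguese translations.
    gold = ["banco", "banco", "margem", "margem"]
    print(v_measure(gold, [0, 0, 1, 1]))  # perfect clustering
    print(v_measure(gold, [0, 0, 1, 2]))  # "margem" split into two clusters
```

Splitting one sense across two clusters keeps homogeneity at 1 but lowers completeness to 2/3, giving a V-measure of 0.8. The logarithm base does not matter, because it cancels in the entropy ratios.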
Original language | English |
---|---|
Title of host publication | Lecture Notes in Computer Science |
Pages | 283 to 295 |
Publication status | Published - 1 Jan 2014 |
Event | CICLing - Duration: 1 Jan 2014 → … |
Conference
Conference | CICLing |
---|---|
Period | 1/01/14 → … |