TY - GEN
T1 - Identification of Bilingual Suffix Classes for Classification and Translation Generation
AU - Kavitha, Karimbi Mahesh
AU - Gomes, Luis
AU - Pereira Lopes, Jose Gabriel
PY - 2014
Y1 - 2014
AB - We examine the possibility of learning bilingual morphology from the translation forms in an existing, manually validated bilingual translation lexicon. The objective is to evaluate how bilingual stem- and suffix-based features affect the performance of the existing Support Vector Machine (SVM) based classifier trained to classify automatically extracted word-to-word translations. We first induce bilingual stem and suffix correspondences from the longest sequence common to orthographically similar translations. Clusters of stem pairs characterised by identical suffix pairs are then formed and used to generate out-of-vocabulary translations that are similar to, but different from, the previously existing translations, thereby completing the existing lexicon. From the bilingual stem and suffix correspondences induced from the augmented lexicon, we derive 5 new features that reflect the presence or absence of morphological coverage (agreement) between a term and its translation. Specifically, we evaluate suffix classes and bilingual stem and suffix correspondences as features for selecting correct word-to-word translations from among the automatically extracted ones. From training data of approximately 35.8K word translations for the English-Portuguese language pair, we identified around 6.4K unique stem pairs and 0.25K unique suffix pairs. Experimental results show that the new features improved word-to-word classification accuracy by 9.11%, yielding an overall improvement in classifier accuracy of 2.15% when all translations (single- and multi-word) were considered.
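The induction, clustering and generation steps summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that the stem on each side is the longest common prefix of two lexicon entries (computed with os.path.commonprefix), and the helper names split_pairs, cluster_by_suffix_pair and generate_oov, the minimum stem length MIN_STEM and the sharing threshold min_shared are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from os.path import commonprefix

# Hypothetical threshold: shorter common prefixes are not treated as stems.
MIN_STEM = 3


def split_pairs(lexicon):
    """Map each bilingual stem pair to the suffix pairs observed with it.

    Assumption: a stem is the longest common prefix of two entries,
    computed independently on the English and Portuguese sides.
    """
    suffixes = defaultdict(set)
    for (en1, pt1), (en2, pt2) in combinations(lexicon, 2):
        en_stem = commonprefix([en1, en2])
        pt_stem = commonprefix([pt1, pt2])
        if len(en_stem) < MIN_STEM or len(pt_stem) < MIN_STEM:
            continue  # not orthographically similar enough on both sides
        stem = (en_stem, pt_stem)
        suffixes[stem].add((en1[len(en_stem):], pt1[len(pt_stem):]))
        suffixes[stem].add((en2[len(en_stem):], pt2[len(pt_stem):]))
    return suffixes


def cluster_by_suffix_pair(suffixes):
    """Group stem pairs that share an identical suffix pair."""
    clusters = defaultdict(set)
    for stem, suffix_pairs in suffixes.items():
        for suffix_pair in suffix_pairs:
            clusters[suffix_pair].add(stem)
    return clusters


def generate_oov(suffixes, known, min_shared=2):
    """Propose new translations by copying suffix pairs between stem pairs
    that share at least `min_shared` suffix pairs (hypothetical criterion)."""
    candidates = set()
    for a, b in combinations(suffixes, 2):
        if len(suffixes[a] & suffixes[b]) < min_shared:
            continue
        for src, dst in ((a, b), (b, a)):
            for en_suf, pt_suf in suffixes[src] - suffixes[dst]:
                pair = (dst[0] + en_suf, dst[1] + pt_suf)
                if pair not in known:
                    candidates.add(pair)
    return candidates


if __name__ == "__main__":
    lexicon = [
        ("walk", "caminhar"), ("walked", "caminhou"), ("walking", "caminhando"),
        ("talk", "conversar"), ("talked", "conversou"),
    ]
    sufs = split_pairs(lexicon)
    print(cluster_by_suffix_pair(sufs)[("ed", "ou")])
    print(generate_oov(sufs, set(lexicon)))  # {('talking', 'conversando')}
```

With this toy lexicon, the stem pairs (walk, caminh) and (talk, convers) fall into the same clusters for the suffix pairs ("", "ar") and ("ed", "ou"), so the sketch proposes ("talking", "conversando") as a new candidate translation. A real system would also need to filter spurious stems and score candidates, which this sketch does not attempt.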
KW - Bilingual suffix classes
KW - Translation classification
KW - Support vector machine
KW - Lexicon augmentation
KW - OOV terms
KW - Morphology
U2 - 10.1007/978-3-319-12027-0_13
DO - 10.1007/978-3-319-12027-0_13
M3 - Conference contribution
SN - 978-3-319-12026-3
T3 - Lecture Notes in Artificial Intelligence
SP - 154
EP - 166
BT - Advances in Artificial Intelligence (IBERAMIA 2014)
A2 - Bazzan, A. L. C.
A2 - Pichara, K.
PB - Springer-Verlag Berlin
T2 - 14th Ibero-American Conference on Artificial Intelligence (IBERAMIA)
Y2 - 24 November 2014 through 27 November 2014
ER -