Abstract
In this paper we describe a corpus of automatic translations annotated with both error type and quality. The 300 sentences we selected were generated by Google Translate, Systran and two in-house Machine Translation systems built with Moses. The errors in the translations were annotated with an error taxonomy that divides errors into five main linguistic categories (Orthography, Lexis, Grammar, Semantics and Discourse), reflecting the language level at which each error occurs. After the error annotation process, we assessed the translation quality of each sentence using a four point comprehension scale from 1 to 5. Both the error and the quality annotation tasks were performed by two different annotators, achieving good levels of inter-annotator agreement. We then used the corpus as training data for a translation quality classifier, and drew conclusions about error severity by observing the outputs of two machine learning classifiers: a decision tree and a regression model.
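The abstract reports good inter-annotator agreement but does not name the statistic. As an illustrative assumption only, a common choice for two annotators labelling the same items is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is not the authors' method; the function name `cohens_kappa` and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items (illustrative sketch)."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both annotators.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, estimated from each annotator's own label distribution.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so by convention we report perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical quality scores from two annotators for six sentences.
annotator_1 = [1, 2, 2, 3, 4, 4]
annotator_2 = [1, 2, 3, 3, 4, 4]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.778
```

The same function accepts categorical labels such as error categories ("Lexis", "Grammar", and so on). For an ordinal quality scale, a weighted kappa, which penalises a 1-vs-4 disagreement more than a 3-vs-4 one, may be a better fit; plain kappa is shown here only for simplicity.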
Original language | English |
---|---|
Title of host publication | Proceedings of the Tenth International Conference on Language Resources and Evaluation |
Editors | Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis |
Place of Publication | Portoroz |
Publisher | European Language Resources Association (ELRA) |
Pages | 288-292 |
Number of pages | 5 |
ISBN (Print) | 978-2-9517408-9-1 |
Publication status | Published - 2016 |
Event | LREC 2016 - 10th International Conference on Language Resources and Evaluation - Grand Hotel Bernardin Conference Center, Portoroz, Slovenia. Duration: 23 May 2016 → 28 May 2016. http://www.lrec-conf.org/proceedings/lrec2016/index.html |
Conference
Conference | LREC 2016 - 10th International Conference on Language Resources and Evaluation |
---|---|
Country/Territory | Slovenia |
City | Portoroz |
Period | 23/05/16 → 28/05/16 |
Internet address | http://www.lrec-conf.org/proceedings/lrec2016/index.html |
Keywords
- Machine Translation
- Translation Errors
- Translation Quality