Building a Corpus of Errors and Quality in Machine Translation: Experiments on Error Impact

Ângela Costa, Rui Correia, Luísa Coheur

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper we describe a corpus of automatic translations annotated with both error type and quality. The 300 sentences that we selected were generated by Google Translate, Systran and two in-house Machine Translation systems that use Moses technology. The errors present in the translations were annotated with an error taxonomy that divides errors into five main linguistic categories (Orthography, Lexis, Grammar, Semantics and Discourse), reflecting the language level where the error is located. After the error annotation process, we assessed the translation quality of each sentence using a four-point comprehension scale from 1 to 5. Both tasks, error annotation and quality annotation, were performed by two different annotators, achieving good levels of inter-annotator agreement. The resulting corpus served as training data for a translation quality classifier. We drew conclusions on error severity by observing the outputs of two machine learning classifiers: a decision tree and a regression model.
Original language: English
Title of host publication: Proceedings of the Tenth International Conference on Language Resources and Evaluation
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis
Place of Publication: Portoroz
Publisher: ELRA
Pages: 288-292
Number of pages: 5
ISBN (Print): 978-2-9517408-9-1
Publication status: Published - 2016
Event: LREC 2016 - 10th International Conference on Language Resources and Evaluation - Grand Hotel Bernardin Conference Center, Portoroz, Slovenia
Duration: 23 May 2016 to 28 May 2016
http://www.lrec-conf.org/proceedings/lrec2016/index.html

Conference

Conference: LREC 2016 - 10th International Conference on Language Resources and Evaluation
Country: Slovenia
City: Portoroz
Period: 23/05/16 to 28/05/16

Keywords

  • Machine Translation
  • Translation Errors
  • Translation Quality
