Compact and Fast Indexes for Translation Related Tasks

Jorge Costa, Luis Gomes, José Gabriel Pereira Lopes, Luis M. S. Russo, Nieves R. Brisaboa

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Translation tasks, including bilingual concordancing, demand an efficient space/time trade-off, which is not always easy to achieve because the text collections are huge and time-efficient text indexes tend to consume a lot of space. We propose a compact representation for monotonically aligned parallel texts, based on known compressed text indexes for representing the texts and additional uncompressed structures for the alignment. The proposed framework is able to index a collection of texts in main memory, occupying less space than the text itself while keeping query response times efficient. The proposal supports any type of alignment granularity, a novelty in concordancing applications, providing a flexible environment for linguists working in all phases of a translation process. We present two alternatives for self-indexing the texts and two alternatives for supporting the alignment, and compare them in terms of space/time performance.
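
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps the compressed self-indexes for a plain, uncompressed suffix array, and it assumes the alignment is stored as two sorted arrays of segment start offsets, where segment k of the source is aligned with segment k of the target. All function names (`concat`, `build_suffix_array`, `find_occurrences`, `concordance`) are hypothetical.

```python
from bisect import bisect_right


def concat(segments):
    """Join segments into one text and record each segment's start offset."""
    starts, pos = [], 0
    for seg in segments:
        starts.append(pos)
        pos += len(seg)
    return "".join(segments), starts


def build_suffix_array(text):
    """Naive construction (illustration only; real indexes are compressed)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return sorted text positions where pattern occurs, via binary search."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-char prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-char prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


def segment(text, starts, k):
    """Extract segment k given the sorted array of segment start offsets."""
    end = starts[k + 1] if k + 1 < len(starts) else len(text)
    return text[starts[k]:end]


def concordance(src, tgt, src_starts, tgt_starts, sa, pattern):
    """Pair each source segment containing a match with its aligned target."""
    pairs = []
    for pos in find_occurrences(src, sa, pattern):
        # Monotone alignment assumption: the segment holding the match start
        # is found by binary search, and its index maps directly to the target.
        k = bisect_right(src_starts, pos) - 1
        pairs.append((segment(src, src_starts, k), segment(tgt, tgt_starts, k)))
    return pairs
```

For example, after `src, src_starts = concat(["the house ", "the red car ", "a big house "])`, the same for the Portuguese side `["a casa ", "o carro vermelho ", "uma casa grande "]`, and `sa = build_suffix_array(src)`, calling `concordance(src, tgt, src_starts, tgt_starts, sa, "house")` pairs the first and third segments with their translations. Because segments are defined only by offset arrays, the same code works whether a segment is a word, a phrase, or a sentence, which is one way to read the abstract's claim about alignment granularity.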

Original language: English
Title of host publication: PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2013
Editors: L Correia, LP Reis, J Cascalho
Publisher: SPRINGER-VERLAG BERLIN
Pages: 504-515
Number of pages: 12
ISBN (Print): 978-3-642-40668-3
Publication status: Published - 2013
Event: 16th Portuguese Conference on Artificial Intelligence, EPIA 2013 - Angra do Heroismo, Azores, Portugal
Duration: 9 Sep 2013 to 12 Sep 2013

Publication series

Name: Lecture Notes in Artificial Intelligence
Publisher: SPRINGER-VERLAG BERLIN
Volume: 8154
ISSN (Print): 0302-9743

Conference

Conference: 16th Portuguese Conference on Artificial Intelligence, EPIA 2013
Country: Portugal
City: Angra do Heroismo, Azores
Period: 9/09/13 to 12/09/13

Keywords

  • Text compression
  • Machine Translation
  • bilingual concordancer
  • parallel text alignment
  • alignment granularity
  • suffix arrays
