Extraction and transformation of data from semi-structured text files using a declarative approach

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

The World Wide Web is a major source of textual information in a human-readable, semi-structured format, covering multiple domains, some of them highly complex. Traditional ETL approaches, which rely on developing specific source code for each data source and on repeated interactions between domain experts and computer-science experts, are an inadequate solution: time-consuming and error-prone. This paper presents a novel approach to ETL, based on its decomposition into two phases: ETD (Extraction, Transformation and Data Delivery) and IL (Integration and Loading). The ETD proposal is supported by a declarative language for expressing ETD statements and a graphical application for interacting with the domain expert. Applying ETD requires mainly domain expertise, while computer-science expertise is concentrated in the IL phase, which links the processed data to target system models, enabling a clearer separation of concerns. The paper describes how ETD has been integrated, tested and validated in a space domain project, currently operational at the European Space Agency for the Galileo Mission.
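
The separation between ETD and IL described in the abstract can be illustrated with a minimal Python sketch. It is not the declarative language proposed in the paper, whose syntax is not shown here. The rule format, the names `RULES`, `run_etd` and `load`, and the sample text are all hypothetical, chosen only to show extraction rules expressed as data (what a domain expert would edit) kept apart from the loading code.

```python
import re

# Hypothetical declarative rules: each one names an output field, a pattern
# that locates its value in the text, and a transformation for the captured
# string. This format is an assumption, not the paper's ETD language.
RULES = [
    {"field": "date", "pattern": r"Date:\s*(\d{4}-\d{2}-\d{2})", "transform": str},
    {"field": "flux", "pattern": r"Flux:\s*([\d.]+)", "transform": float},
]

# Hypothetical semi-structured input, invented for illustration.
SAMPLE = """Daily report
Date: 2007-06-12
Flux: 71.5 sfu
"""


def run_etd(text, rules):
    """Extraction and transformation: apply every rule to the text."""
    record = {}
    for rule in rules:
        match = re.search(rule["pattern"], text)
        if match:
            # Fields whose pattern does not match are simply left out.
            record[rule["field"]] = rule["transform"](match.group(1))
    return record


def load(record, target):
    """Stand-in for the IL phase: append the delivered record to a target store."""
    target.append(record)


store = []
load(run_etd(SAMPLE, RULES), store)
```

In this sketch, supporting a new text source means editing `RULES` only, while `load` stays with whoever maintains the target system.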

Original language: English
Title of host publication: ICEIS 2007: PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS: DATABASES AND INFORMATION SYSTEMS INTEGRATION
Editors: J. Cardoso, J. Cardoso, J. Filipe
Publisher: INSTICC-INST SYST TECHNOLOGIES INFORMATION CONTROL & COMMUNICATION
Pages: 199-205
Number of pages: 7
ISBN (Print): 978-972-8865-88-7
Publication status: Published - 1 Dec 2007
Event: 9th International Conference on Enterprise Information Systems, ICEIS 2007 - Funchal, Madeira, Portugal
Duration: 12 Jun 2007 to 16 Jun 2007

Conference

Conference: 9th International Conference on Enterprise Information Systems, ICEIS 2007
Country/Territory: Portugal
City: Funchal, Madeira
Period: 12/06/07 to 16/06/07

Keywords

  • Declarative language
  • ETD
  • ETL
  • IL
  • Semi-structured
  • Text files
