CQL Grammars for Lexical and Semantic Information Extraction for Portuguese and Italian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Anchoring in real data is one of the main factors guaranteeing the quality and coverage of lexical resources, especially in specialized domains. Lexicon extraction from corpora is therefore a widely accepted method for building lexical resources. However, because data validation by experts is a necessary step in specialized contexts, automatic screening of the data becomes fundamental to maximizing the informational value of the interaction with experts. In this paper we present and discuss a hybrid methodology, combining linguistic and statistical approaches, that focuses on extracting specialized lexical units and salient semantic information using CQL grammars. The proposed method involves several steps, from frequency analysis and the extraction of concordances and collocations to manual revision and expert validation, and encompasses the construction and application of knowledge-based patterns (CQL grammars). We present two CQL grammars for lexical and semantic information extraction, developed for Portuguese and Italian, and evaluate the results of their application to specialized corpora in the Public Art domain, demonstrating the value of this method for extracting lexical and semantic information from large amounts of data.
Original language: English
Title of host publication: Computational Processing of the Portuguese Language
Subtitle of host publication: 15th International Conference, PROPOR 2022, Fortaleza, Brazil, March 21–23, 2022, Proceedings
Editors: Vládia Pinheiro, Pablo Gamallo, Raquel Amaro, Carolina Scarton, Fernando Batista, Diego Silva, Catarina Magro, Hugo Pinto
Place of Publication: Cham
Publisher: Springer
Pages: 376-386
Number of pages: 10
Volume: 13208
Edition: 1st
ISBN (Electronic): 978-3-030-98305-5
ISBN (Print): 978-3-030-98304-8
Publication status: Published - 2022
Event: 15th International Conference on Computational Processing of Portuguese: PROPOR2022 - University of Fortaleza, Ceará, Fortaleza, Brazil
Duration: 21 Mar 2022 to 23 Mar 2022
Conference number: 15
https://sites.universidadedefortaleza.com/propor2022/

Publication series

Name: Lecture Notes in Computer Science
Volume: 13208
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th International Conference on Computational Processing of Portuguese
Abbreviated title: PROPOR2022
Country/Territory: Brazil
City: Fortaleza
Period: 21/03/22 to 23/03/22
