Abstract
Anchoring in real data is one of the main factors that guarantee the quality and coverage of lexical resources, especially in specialized domains. Lexicon extraction from corpora is therefore a widely accepted method for building lexical resources. However, because validation by experts is a necessary step in specialized contexts, automatic screening of the data becomes fundamental to maximizing the informational value of the interaction with experts. In this paper we present and discuss a hybrid methodology, combining linguistic and statistical approaches, that focuses on the extraction of specialized lexical units and salient semantic information using CQL grammars. The proposed method involves several steps, from frequency analysis and the extraction of concordances and collocations to manual revision and expert validation, and encompasses the construction and application of CQL grammars built on knowledge-based patterns. We present two CQL grammars for lexical and semantic information extraction, developed for Portuguese and Italian, and evaluate the results of their application to specialized corpora in the Public Art domain, demonstrating the value of this method for extracting lexical and semantic information from large volumes of data.
Original language | English |
---|---|
Title of host publication | Computational Processing of the Portuguese Language |
Subtitle of host publication | 15th International Conference, PROPOR 2022, Fortaleza, Brazil, March 21–23, 2022, Proceedings |
Editors | Vládia Pinheiro, Pablo Gamallo, Raquel Amaro, Carolina Scarton, Fernando Batista, Diego Silva, Catarina Magro, Hugo Pinto |
Place of Publication | Cham |
Publisher | Springer |
Pages | 376-386 |
Number of pages | 10 |
Volume | 13208 |
Edition | 1st |
ISBN (Electronic) | 978-3-030-98305-5 |
ISBN (Print) | 978-3-030-98304-8 |
DOIs | |
Publication status | Published - 2022 |
Event | 15th International Conference on Computational Processing of Portuguese: PROPOR2022 - University of Fortaleza, Ceará, Fortaleza, Brazil; Duration: 21 Mar 2022 → 23 Mar 2022; Conference number: 15; https://sites.universidadedefortaleza.com/propor2022/ |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Volume | 13208 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 15th International Conference on Computational Processing of Portuguese |
---|---|
Abbreviated title | PROPOR2022 |
Country/Territory | Brazil |
City | Fortaleza |
Period | 21/03/22 → 23/03/22 |
Internet address |
Activities
- 1 Oral presentation
- CQL Grammars for Lexical and Semantic Information Extraction for Portuguese and Italian
  Chiara Barbero (Speaker)
  23 Mar 2022
  Activity: Talk or presentation › Oral presentation