Improving LocalMaxs Multiword Expression Statistical Extractor

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The LocalMaxs algorithm extracts relevant Multiword Expressions from text corpora using a statistical approach. However, statistical extractors face a greater challenge in obtaining good practical results than linguistic approaches, which benefit from language-specific syntactic and/or semantic knowledge. First, this paper contributes an improvement to the LocalMaxs algorithm, based on a more selective evaluation of the cohesion of each Multiword Expression candidate with respect to its neighbourhood, and on a filtering criterion guided by the location of stopwords within each candidate. Second, a new language-independent method is presented for the automatic self-identification of stopwords in corpora, requiring no external stopword lists or linguistic tools. The results obtained for LocalMaxs reach Precision values of about 80% for English, French, German and Portuguese, an increase of around 12-13% over the previous LocalMaxs version. The self-identification of stopwords reaches high Precision for top-ranked stopword candidates.
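
The abstract does not spell out the improved selection criterion, so the following is only a minimal Python sketch of the general idea it describes: score each n-gram's cohesion, keep those that are local maxima relative to their neighbourhood (the n-grams one word shorter and one word longer), then filter by where stopwords fall. The SCP-style `glue` function, the "reject candidates that start or end with a stopword" rule, and all names (`extract`, `passes_stopword_filter`, `min_freq`, `max_n`) are assumptions chosen for illustration, not the paper's actual method.

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def glue(gram, freq, total):
    """Assumed SCP-style cohesion: p(gram)^2 over the mean product of its two-part splits."""
    def p(g):
        return freq[g] / total
    splits = [p(gram[:i]) * p(gram[i:]) for i in range(1, len(gram))]
    return p(gram) ** 2 / (sum(splits) / len(splits))

def passes_stopword_filter(gram, stopwords):
    """Hypothetical filter: reject candidates that start or end with a stopword."""
    return gram[0] not in stopwords and gram[-1] not in stopwords

def extract(tokens, stopwords, max_n=3, min_freq=2):
    """Return n-grams (2 <= n <= max_n) whose glue is a local maximum."""
    total = len(tokens)
    freq = Counter()
    for n in range(1, max_n + 2):  # (max_n + 1)-grams are needed as neighbours
        freq.update(ngrams(tokens, n))
    score = {g: glue(g, freq, total) for g in freq if len(g) >= 2}

    # For each n-gram, collect the glue of the (n+1)-grams that contain it.
    supers = defaultdict(list)
    for g, s in score.items():
        if len(g) >= 3:
            supers[g[:-1]].append(s)
            supers[g[1:]].append(s)

    selected = set()
    for g, s in score.items():
        if len(g) > max_n or freq[g] < min_freq:
            continue
        # The two (n-1)-grams contained in g; bigrams have no scored sub-grams.
        subs = [score[g[:-1]], score[g[1:]]] if len(g) > 2 else []
        is_local_max = all(s >= x for x in subs) and all(s > y for y in supers[g])
        if is_local_max and passes_stopword_filter(g, stopwords):
            selected.add(g)
    return selected
```

On the toy token sequence `"a b x a b y a b z".split()` with an empty stopword set, `extract` returns `{("a", "b")}`; adding `"a"` to the stopword set removes that candidate. A real evaluation would need a large corpus, since cohesion scores on a few tokens are not meaningful.
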
Original language: English
Title of host publication: Computational Science – ICCS 2023
Subtitle of host publication: 23rd International Conference, Prague, Czech Republic, July 3–5, 2023, Proceedings, Part II
Editors: Jiří Mikyška, Clélia de Mulatier, Maciej Paszynski, Valeria V. Krzhizhanovskaya, Jack J. Dongarra, Peter M. A. Sloot
Place of Publication: Cham
Publisher: Springer
Pages: 154-162
Number of pages: 9
ISBN (Electronic): 978-3-031-36021-3
ISBN (Print): 978-3-031-36020-6
DOIs
Publication status: Published - 2023
Event: 23rd International Conference on Computational Science, ICCS 2023 - Prague, Czech Republic
Duration: 3 Jul 2023 → 5 Jul 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer
Volume: 14074 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 23rd International Conference on Computational Science, ICCS 2023
Country/Territory: Czech Republic
City: Prague
Period: 3/07/23 → 5/07/23

Keywords

  • LocalMaxs algorithm
  • Multiword Expressions
  • Statistical Extractor
  • Stopwords
