Towards a Statistical-Enriched Corpus Containing Portuguese Collocations in Use: Reviewing Possible Extraction Tools

Ângela Costa, Luisa Coheur

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

Collocations are a main problem for any natural language processing task, from machine translation to summarization. With the goal of building a corpus with collocations, enriched with statistical information about them, we survey, in this paper, four tools for extracting collocations. These tools allow us to collect sentences with collocations, and also to gather statistics on this particular type of co-occurrence, such as Mutual Information and Log-likelihood values.
Original language: English
Title of host publication: Computational Processing of the Portuguese Language
Subtitle of host publication: 12th International Conference, PROPOR 2016, Tomar, Portugal, July 13-15, 2016, Proceedings
Editors: João Silva, Ricardo Ribeiro, Paulo Quaresma, André Adami, António Branco
Place of Publication: Cham
Publisher: Springer
Pages: 319-329
Number of pages: 10
Volume: 9727
ISBN (Print): 978-3-319-41551-2
Publication status: Published - 2016
Event: 12th International Conference on Computational Processing of the Portuguese Language, PROPOR 2016 - Tomar, Portugal
Duration: 13 Jul 2016 to 15 Jul 2016

Publication series

Name: Lecture Notes in Artificial Intelligence

Conference

Conference: 12th International Conference on Computational Processing of the Portuguese Language, PROPOR 2016
Country: Portugal
City: Tomar
Period: 13/07/16 to 15/07/16

Keywords

  • Collocations
  • Wortschatz
  • DeepDict
  • CRPC
  • Sketch Engine
