Building a Portuguese oenological dictionary: from corpus to terminology via co-occurrence networks

William Martinez, Sílvia Barbosa

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper focuses on the elaboration of a dictionary of terms in the Portuguese language which describe the wine-tasting experience. We present a corpus-based analysis aimed at designing an electronic dictionary: on the basis of a compilation of approximately 21,000 wine descriptions downloaded from a dozen Portuguese websites, we estimated both by frequency analysis and lexicographical study which terms were recurrent, relevant and representative of the “hard to put into words” occupation that is oenology. From the results thus obtained, a list was made of words that describe the sensory analysis in its three main aspects: visual, olfactive and gustatory. An exhaustive co-occurrence analysis then identified those terms which contribute most to structuring the text by way of their tendency to attract other words against statistical odds. When displayed in a co-occurrence network, these anchors emerge from the mesh as the foundational lexicon for wine tasting, and can be evaluated as prime candidates for a distributional thesaurus.
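
The co-occurrence step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the association statistic or the tooling used, so everything here is an assumption: pointwise mutual information (PMI) stands in for "attracting other words against statistical odds", each wine description is treated as one co-occurrence window, and the six sample descriptions are invented for illustration rather than taken from the authors' corpus. The function names `cooccurrence_pmi` and `anchors` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Toy wine descriptions, invented for illustration (NOT from the paper's corpus).
DESCRIPTIONS = [
    "cor rubi aroma frutos vermelhos",
    "cor rubi aroma especiarias",
    "cor granada aroma frutos maduros",
    "final longo taninos suaves",
    "final longo aroma frutos vermelhos",
    "taninos suaves final persistente",
]


def cooccurrence_pmi(docs, min_pair_count=2):
    """Return {(w1, w2): pmi} for word pairs that share a description.

    Assumption: PMI over document frequencies is used as the association
    score; the paper may rely on a different statistic.
    """
    n_docs = len(docs)
    word_df = Counter()  # number of descriptions containing each word
    pair_df = Counter()  # number of descriptions containing both words
    for doc in docs:
        words = sorted(set(doc.split()))
        word_df.update(words)
        pair_df.update(combinations(words, 2))

    scores = {}
    for (w1, w2), n12 in pair_df.items():
        if n12 < min_pair_count:
            continue  # ignore pairs seen too rarely to be meaningful
        pmi = math.log2(n12 * n_docs / (word_df[w1] * word_df[w2]))
        if pmi > 0:  # keep only pairs occurring together more than chance predicts
            scores[(w1, w2)] = pmi
    return scores


def anchors(scores):
    """Rank words by degree: how many positively associated neighbours they have."""
    degree = Counter()
    for w1, w2 in scores:
        degree[w1] += 1
        degree[w2] += 1
    return degree.most_common()


if __name__ == "__main__":
    network_edges = cooccurrence_pmi(DESCRIPTIONS)
    for word, degree in anchors(network_edges):
        print(word, degree)
```

In this sketch the retained pairs are the edges of the co-occurrence network, and the words with the highest degree play the role of the "anchors" mentioned in the abstract. On a real corpus, a significance-based measure and a frequency threshold tuned to corpus size would likely be more robust than raw PMI.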
Original language: English
Title of host publication: Proceedings of the XVIII EURALEX International Congress
Subtitle of host publication: Lexicography in Global Contexts
Editors: Jaka Čibej, Vojko Gorjanc, Iztok Kosem, Simon Krek
Place of Publication: Ljubljana
Publisher: Ljubljana University Press, Faculty of Arts
Pages: 351-361
Number of pages: 10
Edition: 1ª
ISBN (Electronic): 978-961-06-0097-8
Publication status: Published - 17 Jul 2018
Event: XVIII EURALEX International Congress: Lexicography in Global Contexts - The Centre for Language Resources and Technologies at the University of Ljubljana and Trojina, Institute for Applied Slovene Studies, Ljubljana, Slovenia
Duration: 17 Jul 2018 to 21 Jul 2018
http://euralex2018.cjvt.si/

Conference

Conference: XVIII EURALEX International Congress: Lexicography in Global Contexts
Country: Slovenia
City: Ljubljana
Period: 17/07/18 to 21/07/18
Internet address: http://euralex2018.cjvt.si/

Keywords

  • Collocations
  • Co-occurrences
  • Word network
  • Corpus linguistics
  • Oenology
  • Terminology

Cite this

Martinez, W., & Barbosa, S. (2018). Building a Portuguese oenological dictionary: from corpus to terminology via co-occurrence networks. In J. Čibej, V. Gorjanc, I. Kosem, & S. Krek (Eds.), Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts (1ª ed., pp. 351-361). Ljubljana: Ljubljana University Press, Faculty of Arts.
@inproceedings{997e84666f7a48b3b9d8d675f0bd8802,
title = "Building a Portuguese oenological dictionary: from corpus to terminology via co-occurrence networks",
abstract = "This paper focuses on the elaboration of a dictionary of terms in the Portuguese language which describe the wine-tasting experience. We present a corpus-based analysis aimed at designing an electronic dictionary: on the basis of a compilation of approximately 21,000 wine descriptions downloaded from a dozen Portuguese websites, we estimated both by frequency analysis and lexicographical study which terms were recurrent, relevant and representative of the “hard to put into words” occupation that is oenology. From the results thus obtained, a list was made of words that describe the sensory analysis in its three main aspects: visual, olfactive and gustatory. An exhaustive co-occurrence analysis then identified those terms which contribute most to structuring the text by way of their tendency to attract other words against statistical odds. When displayed in a co-occurrence network, these anchors emerge from the mesh as the foundational lexicon for wine tasting, and can be evaluated as prime candidates for a distributional thesaurus.",
keywords = "Collocations, Co-occurrences, Word network, Corpus linguistics, Oenology, Terminology",
author = "William Martinez and S{\'i}lvia Barbosa",
note = "info:eu-repo/grantAgreement/FCT/5876/147316/PT# UID/LIN/03213/2013",
year = "2018",
month = "7",
day = "17",
language = "English",
pages = "351--361",
editor = "Jaka Čibej and Vojko Gorjanc and Iztok Kosem and Simon Krek",
booktitle = "Proceedings of the XVIII EURALEX International Congress",
publisher = "Ljubljana University Press, Faculty of Arts",
edition = "1ª",

}
