TY - JOUR
T1 - Designing the ELEXIS Parallel Sense-Annotated Dataset in 10 European Languages
AU - Martelli, Federico
AU - Navigli, Roberto
AU - Krek, Simon
AU - Kallas, Jelena
AU - Gantar, Polona
AU - Koeva, Svetla
AU - Nimb, Sanni
AU - Sandford Pedersen, Bolette
AU - Olsen, Sussi
AU - Langemets, Margit
AU - Koppel, Kristina
AU - Üksik, Tiiu
AU - Dobrovoljc, Kaja
AU - Ureña-Ruiz, Rafael-J.
AU - Sancho-Sánchez, José-Luis
AU - Lipp, Veronika
AU - Váradi, Tamás
AU - Győrffy, András
AU - László, Simon
AU - Quochi, Valeria
AU - Monachini, Monica
AU - Frontini, Francesca
AU - Tiberius, Carole
AU - Tempelaars, Rob
AU - Costa, Rute
AU - Salgado, Ana
AU - Čibej, Jaka
AU - Munda, Tina
N1 - UIDB/03213/2020
UIDP/03213/2020
PY - 2021
Y1 - 2021
N2 - Over the course of the last few years, lexicography has witnessed the burgeoning of increasingly reliable automatic approaches supporting the creation of lexicographic resources such as dictionaries, lexical knowledge bases and annotated datasets. In fact, recent achievements in the field of Natural Language Processing, and particularly in Word Sense Disambiguation, have widely demonstrated their effectiveness not only for the creation of lexicographic resources, but also for enabling a deeper analysis of lexical-semantic data both within and across languages. Nevertheless, we argue that the potential derived from the connections between the two fields is far from exhausted. In this work, we address a serious limitation affecting both lexicography and Word Sense Disambiguation, i.e., the lack of high-quality sense-annotated data, and describe our efforts aimed at constructing a novel, entirely manually annotated parallel dataset in 10 European languages. For the purposes of the present paper, we concentrate on the annotation of morpho-syntactic features. Finally, unlike many currently available sense-annotated datasets, ours will be semantically annotated with senses derived from high-quality lexicographic repositories.
KW - Digital lexicography
KW - Natural Language Processing
KW - Computational Linguistics
KW - Corpus Linguistics
KW - Word Sense Disambiguation
UR - http://www.scopus.com/inward/record.url?scp=85137076090&partnerID=8YFLogxK
M3 - Conference article
SN - 2533-5626
SP - 377
EP - 395
JO - Proceedings of Electronic Lexicography in the 21st Century Conference
JF - Proceedings of Electronic Lexicography in the 21st Century Conference
IS - 2021
T2 - eLex 2021, 7th biennial conference on electronic lexicography
Y2 - 5 July 2021 through 7 July 2021
ER -