Domain labeling in the Morais dictionary: bringing structure to unstructured lexicographic data

This article provides a detailed analysis of the use of domain labels, i.e., special markers identifying a specialised field of knowledge, in successive editions of the Morais dictionary. Morais is a historical dictionary of the Portuguese language, commonly known by and disseminated under the name of António de Morais Silva. This monolingual dictionary is significant for the Portuguese lexicographic tradition because it inaugurates modern Portuguese lexicography and served as a model for all subsequent lexicographic production throughout the 19th and 20th centuries. The domain labels were retrieved from the abbreviation lists of its various editions.
This work is part of an ongoing Portuguese national linguistic project. The project has two goals: 1) to encode the first three editions of the Morais dictionary and make them available online, also publishing them as lexical resources in two different standards for structured lexicographic datasets, and 2) to describe the lexicographic components of these editions following a rigorous linguistic treatment. The project is not purely lexicographic: it also explores the convergence between lexicography and other research domains, such as terminology, ontologies, linked data, and digital humanities.
This article analyzes the domain labeling system in Morais from an evolutionary and diachronic perspective, in line with previous works that highlight the theoretical assumptions and methodological aspects of the lexicographical tradition around domain labeling. Establishing a hierarchical structure in general language dictionaries helps to organize lexicographic content and to systematize the terminological information they include.
Each table of abbreviations has two distinct columns: one with the abbreviation and the other with the full domain designation. Given the importance of domain labels, we conducted a survey of all domain labels found. We identify and show which domains were already present and which were newly added. After reviewing the flat domain list, we evaluated whether a discernible approach to knowledge organization could be identified, one that would point to possible generic domains and subdomains. In the organization of domains, we propose three possible levels: superdomain, domain, and subdomain. The superdomain is the broadest taxonomic grouping, the domain follows it, and the subdomain is part of a broader domain. To facilitate the analysis and to focus on interoperability issues, we generated a metalabel, a tag that identifies the English equivalent of the corresponding domain.
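The three-level organization and the metalabel described above can be pictured as a small data structure. The following Python sketch is only an illustration under stated assumptions: the class name `DomainLabel`, the helper functions, and the three sample entries (including their grouping) are hypothetical and are not taken from the Morais abbreviation lists or from the project's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# The three levels proposed in the abstract, from broadest to narrowest.
LEVELS = ("superdomain", "domain", "subdomain")


@dataclass(frozen=True)
class DomainLabel:
    """One row of an abbreviation table, enriched with level and metalabel.

    All field names are assumptions made for this sketch.
    """
    abbreviation: Optional[str]   # e.g. "Med."; None for a proposed grouping
    designation: str              # full Portuguese designation
    metalabel: str                # English equivalent, used as identifier
    level: str                    # one of LEVELS
    parent: Optional[str] = None  # metalabel of the broader label, if any


def build_index(labels: List[DomainLabel]) -> Dict[str, DomainLabel]:
    """Index labels by metalabel and check levels and parent references."""
    index = {label.metalabel: label for label in labels}
    for label in labels:
        if label.level not in LEVELS:
            raise ValueError(f"unknown level: {label.level}")
        if label.parent is not None and label.parent not in index:
            raise ValueError(f"unknown parent: {label.parent}")
    return index


def path_to_root(index: Dict[str, DomainLabel], metalabel: str) -> List[str]:
    """Return the metalabels from the broadest grouping down to the label."""
    path = []
    current: Optional[DomainLabel] = index[metalabel]
    while current is not None:
        path.append(current.metalabel)
        current = index[current.parent] if current.parent else None
    return list(reversed(path))


# Hypothetical entries, invented for illustration only.
EXAMPLE_LABELS = [
    DomainLabel(None, "Ciências Naturais", "Natural Sciences", "superdomain"),
    DomainLabel("Med.", "Medicina", "Medicine", "domain", "Natural Sciences"),
    DomainLabel("Anat.", "Anatomia", "Anatomy", "subdomain", "Medicine"),
]

INDEX = build_index(EXAMPLE_LABELS)
print(path_to_root(INDEX, "Anatomy"))
```

Using the English metalabel as the key is one possible design choice: it gives labels from different editions, whose abbreviations may differ, a shared identifier for comparison.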
The lists of domains included in the outside matter of general dictionaries follow alphabetical ordering, without any concern for the relationships that can be established between those types of labels. This article describes both onomasiological and semasiological approaches to treating specialized lexicographic content. Following terminological principles and an onomasiological approach, we organize and conceptualize specialized knowledge using structured data formats, such as the Text Encoding Initiative (TEI), with a view to future alignments between different lexicographic resources.
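As a rough illustration of what a structured encoding of a domain label might look like, the Python sketch below builds a TEI-style `<usg>` (usage) element with the standard library. It is an assumption, not the project's actual encoding: the choice of `type="domain"`, the use of the `norm` attribute to carry the English metalabel, the function name `make_usg`, and the sample values are all hypothetical.

```python
import xml.etree.ElementTree as ET


def make_usg(abbreviation: str, metalabel: str) -> ET.Element:
    """Build a TEI-style <usg> element for one domain label.

    Assumption: the printed abbreviation is kept as element text and the
    English metalabel is stored in a normalizing attribute.
    """
    usg = ET.Element("usg", {"type": "domain", "norm": metalabel})
    usg.text = abbreviation
    return usg


# Hypothetical label, invented for illustration only.
element = make_usg("Med.", "Medicine")
print(ET.tostring(element, encoding="unicode"))
```

Keeping the abbreviation as printed while adding a normalized value separately is one way to stay faithful to the source text and still support alignment across resources.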
The project will contribute to a stronger presence of lexicographic digital content in Portuguese through open tools and standards.
Period: 1 Jun 2023
Event title: 24th Biennial Dictionary Society of North America Conference
