TY - JOUR
T1 - Combining computational linguistics with sentence embedding to create a zero-shot NLIDB
AU - Perezhohin, Yuriy
AU - Peres, Fernando
AU - Castelli, Mauro
N1 - info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F04152%2F2020/PT
https://doi.org/10.54499/UIDB/04152/2020
Perezhohin, Y., Peres, F., & Castelli, M. (2024). Combining computational linguistics with sentence embedding to create a zero-shot NLIDB. Array, 24, 1-11. Article 100368. https://doi.org/10.1016/j.array.2024.100368 --- This work was supported by MyNorth AI Research and partially supported by national funds through the FCT (Fundação para a Ciência e a Tecnologia) under project UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS.
PY - 2024/12
Y1 - 2024/12
N2 - Accessing relational databases using natural language is a challenging task, with existing methods often suffering from poor domain generalization and high computational costs. In this study, we propose a novel approach that eliminates the training phase while offering high adaptability across domains. Our method combines structured linguistic rules, a curated vocabulary, and pre-trained embedding models to accurately translate natural language queries into SQL. Experimental results on the SPIDER benchmark demonstrate the effectiveness of our approach, with execution accuracy rates of 72.03% on the training set and 70.83% on the development set, while maintaining domain flexibility. Furthermore, the proposed system outperformed two extensively trained models by up to 28.33% on the development set, demonstrating its efficiency. This research presents a significant advancement in zero-shot Natural Language Interfaces for Databases (NLIDBs), providing a resource-efficient alternative for generating accurate SQL queries from plain language inputs.
KW - Text to SQL
KW - Natural language processing
KW - Computational linguistics
KW - Sentence embeddings
UR - http://www.scopus.com/inward/record.url?scp=85207791714&partnerID=8YFLogxK
UR - https://www.webofscience.com/wos/woscc/full-record/WOS:001348337200001
U2 - 10.1016/j.array.2024.100368
DO - 10.1016/j.array.2024.100368
M3 - Article
SN - 2590-0056
VL - 24
SP - 1
EP - 11
JO - Array
JF - Array
M1 - 100368
ER -