Combining computational linguistics with sentence embedding to create a zero-shot NLIDB

Research output: Contribution to journal › Article › peer-review


Abstract

Accessing relational databases using natural language is a challenging task, with existing methods often suffering from poor domain generalization and high computational costs. In this study, we propose a novel approach that eliminates the training phase while offering high adaptability across domains. Our method combines structured linguistic rules, a curated vocabulary, and pre-trained embedding models to accurately translate natural language queries into SQL. Experimental results on the SPIDER benchmark demonstrate the effectiveness of our approach, with execution accuracy rates of 72.03% on the training set and 70.83% on the development set, while maintaining domain flexibility. Furthermore, the proposed system outperformed two extensively trained models by up to 28.33% on the development set, demonstrating its efficiency. This research presents a significant advancement in zero-shot Natural Language Interfaces for Databases (NLIDBs), providing a resource-efficient alternative for generating accurate SQL queries from plain language inputs.
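The abstract reports results as execution accuracy. Under this metric, a predicted SQL query counts as correct when running it on the database returns the same result as the gold query. The sketch below illustrates that idea with Python's built-in sqlite3 module. It is a simplified assumption, not the official SPIDER evaluation script or the authors' code. It ignores row order (even for ORDER BY queries) and treats any execution error in the prediction as a miss. The helper names `results_match` and `execution_accuracy`, the `singer` table, and the example queries are all hypothetical.

```python
import sqlite3
from collections import Counter


def results_match(conn: sqlite3.Connection, gold_sql: str, pred_sql: str) -> bool:
    """Return True if both queries yield the same multiset of rows.

    Simplifications (assumptions, not the official SPIDER evaluator):
    row order is ignored, the gold query is assumed to be valid, and any
    execution error in the predicted query counts as a mismatch.
    """
    gold_rows = conn.execute(gold_sql).fetchall()
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False
    # Compare rows as multisets so duplicates matter but order does not.
    return Counter(gold_rows) == Counter(pred_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs) -> float:
    """Percentage of (gold_sql, pred_sql) pairs whose execution results match."""
    if not pairs:
        return 0.0
    hits = sum(results_match(conn, gold, pred) for gold, pred in pairs)
    return 100.0 * hits / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy database standing in for a SPIDER schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?)", [("Ana", 31), ("Ben", 25), ("Cy", 40)]
    )
    pairs = [
        # Different SQL text, same rows: counts as correct.
        ("SELECT name FROM singer WHERE age > 30", "SELECT name FROM singer WHERE age >= 31"),
        ("SELECT count(*) FROM singer", "SELECT count(name) FROM singer"),
        # Different rows: counts as wrong.
        ("SELECT name FROM singer ORDER BY age", "SELECT name FROM singer WHERE age < 30"),
        # Unknown column raises an error: counts as wrong.
        ("SELECT avg(age) FROM singer", "SELECT avg(agee) FROM singer"),
    ]
    print(execution_accuracy(conn, pairs))  # prints 50.0
```

Comparing execution results rather than SQL strings means syntactically different but equivalent queries, such as the first two pairs above, are credited as correct.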
Original language: English
Article number: 100368
Pages (from-to): 1-11
Number of pages: 11
Journal: Array
Volume: 24
Early online date: 24 Oct 2024
Publication status: Published - Dec 2024

Keywords

  • Text to SQL
  • Natural language processing
  • Computational linguistics
  • Sentence embeddings
