TY - GEN
T1 - Towards an evolutionary-based approach for natural language processing
AU - Manzoni, Luca
AU - Jakobovic, Domagoj
AU - Mariot, Luca
AU - Picek, Stjepan
AU - Castelli, Mauro
N1 - info:eu-repo/grantAgreement/FCT/3599-PPCDT/DSAIPA%2FDS%2F0113%2F2019/PT#
N1 - info:eu-repo/grantAgreement/FCT/3599-PPCDT/DSAIPA%2FDS%2F0022%2F2018/PT#
N1 - Manzoni, L., Jakobovic, D., Mariot, L., Picek, S., & Castelli, M. (2020). Towards an evolutionary-based approach for natural language processing. In GECCO 2020: Proceedings of the 2020 Genetic and Evolutionary Computation Conference (pp. 985-993). (GECCO 2020 - Proceedings of the 2020 Genetic and Evolutionary Computation Conference). Association for Computing Machinery. https://doi.org/10.1145/3377930.3390248
PY - 2020/6/25
Y1 - 2020/6/25
AB - Tasks related to Natural Language Processing (NLP) have recently been the focus of a large research endeavor by the machine learning community. The increased interest in this area is mainly due to the success of deep learning methods. Genetic Programming (GP), however, has not been in the spotlight for NLP tasks. Here, we propose a first proof of concept that combines GP with the well-established NLP tool word2vec for the next word prediction task. The main idea is that, once words have been moved into a vector space, traditional GP operators can successfully work on vectors, thus producing meaningful words as the output. To assess the suitability of this approach, we perform an experimental evaluation on a set of existing newspaper headlines. Individuals resulting from this (pre-)training phase can be employed as the initial population in other NLP tasks, like sentence generation, which will be the focus of future investigations, possibly employing adversarial co-evolutionary approaches.
KW - Genetic programming
KW - Natural language processing
KW - Next word prediction
UR - http://www.scopus.com/inward/record.url?scp=85091765216&partnerID=8YFLogxK
UR - http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=Alerting&SrcApp=Alerting&DestApp=WOS_CPL&DestLinkType=FullRecord&UT=WOS:000605292300114
U2 - 10.1145/3377930.3390248
DO - 10.1145/3377930.3390248
M3 - Conference contribution
AN - SCOPUS:85091765216
T3 - GECCO 2020 - Proceedings of the 2020 Genetic and Evolutionary Computation Conference
SP - 985
EP - 993
BT - GECCO 2020
PB - Association for Computing Machinery
T2 - 2020 Genetic and Evolutionary Computation Conference, GECCO 2020
Y2 - 8 July 2020 through 12 July 2020
ER -