Abstract
In the field of Machine Learning, one of the most commonly discussed questions is how to choose an adequate number of data observations in order to train a model satisfactorily; in other words, how to find the right amount of data needed to create a model that is neither underfitted nor overfitted, but instead achieves a reasonable generalization ability. The problem grows in importance in Genetic Programming, where fitness evaluation is often rather slow. Therefore, finding the minimum amount of data that enables us to discover the solution to a given problem could bring significant benefits. Using the notion of the entropy of a dataset, we seek to quantify the information gain obtainable from each additional data point. We then look for the smallest percentage of data that carries enough information to yield satisfactory results. As a first step, we present an example derived from the state of the art. We then question a relevant part of our procedure and introduce two case studies to experimentally validate our theoretical hypothesis.
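The entropy notion mentioned in the abstract can be illustrated with a minimal sketch. Assuming Shannon entropy over the empirical label distribution of a Boolean dataset (an illustrative assumption; the paper's exact measure may differ), the entropy of growing data prefixes gives a rough sense of how much information each additional observation contributes:

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Toy Boolean target values; each new observation updates the empirical
# distribution, and the change in entropy hints at the information
# gained from that extra data point.
data = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
entropies = [shannon_entropy(data[:k]) for k in range(1, len(data) + 1)]
```

Comparing consecutive entries of `entropies` shows where additional observations stop changing the distribution much, which is the intuition behind looking for the smallest sufficient fraction of the data.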
Original language | English |
---|---|
Title of host publication | 2022 IEEE Congress on Evolutionary Computation |
Subtitle of host publication | CEC 2022 - Conference Proceedings |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 1-8 |
Number of pages | 8 |
ISBN (Print) | 978-1-6654-6708-7 |
DOIs | |
Publication status | Published - 18 Jul 2022 |
Event | 2022 IEEE Congress on Evolutionary Computation (CEC): part of IEEE World Congress on Computational Intelligence (IEEE WCCI 2022), https://wcci2022.org/, Padua, Italy. Duration: 18 Jul 2022 → 23 Jul 2022. Conference number: 2022 |
Conference
Conference | 2022 IEEE Congress on Evolutionary Computation (CEC) |
---|---|
Abbreviated title | CEC 2022 |
Country/Territory | Italy |
City | Padua |
Period | 18/07/22 → 23/07/22 |
Keywords
- Training
- Boolean functions
- Genetic programming
- Machine learning
- Evolutionary computation
- Data models
- Benchmark testing