Reducing the Number of Training Cases in Genetic Programming

Giacomo Zoppi, Leonardo Vanneschi, Mario Giacobini

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review



In the field of Machine Learning, one of the most common and debated questions is how to choose an adequate number of data observations to train our models satisfactorily. In other words, how much data is needed to create a model that is neither underfitted nor overfitted, but instead achieves reasonable generalization ability? The problem becomes more important in Genetic Programming, where fitness evaluation is often rather slow. Therefore, finding the minimum amount of data that enables us to discover the solution to a given problem could bring significant benefits. Using the notion of entropy in a dataset, we seek to understand the information gain obtainable from each additional data point. We then look for the smallest percentage of data that carries enough information to yield satisfactory results. As a first step, we present an example derived from the state of the art. Then, we question a relevant part of our procedure and introduce two case studies to experimentally validate our theoretical hypothesis.
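The abstract does not say which entropy measure is used or how the stopping point is chosen. As a minimal sketch, assuming "entropy in a dataset" means the Shannon entropy of the target values in a sample, and assuming a hypothetical stopping rule (the sample's entropy reaching a fixed fraction of the full dataset's entropy), the idea of tracking information as data points are added might look like the Python below. The function names, the `threshold` and `steps` parameters, and the random shuffling are illustrative assumptions, not the authors' procedure.

```python
import math
import random
from collections import Counter
from typing import Hashable, List, Sequence


def shannon_entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_gains(labels: Sequence[Hashable], seed: int = 0) -> List[float]:
    """Entropy change contributed by each additional data point, in a
    randomly shuffled order (assumption: points are added one at a time)."""
    order = list(labels)
    random.Random(seed).shuffle(order)
    prefix = [shannon_entropy(order[:k]) for k in range(1, len(order) + 1)]
    # Differences between consecutive prefix entropies; may be negative.
    return [b - a for a, b in zip([0.0] + prefix, prefix)]


def smallest_sufficient_fraction(
    labels: Sequence[Hashable],
    threshold: float = 0.95,
    steps: int = 20,
    seed: int = 0,
) -> float:
    """Hypothetical criterion: smallest tested fraction of a shuffled sample
    whose label entropy reaches `threshold` times the full-data entropy."""
    full = shannon_entropy(labels)
    order = list(labels)
    random.Random(seed).shuffle(order)
    for i in range(1, steps + 1):
        fraction = i / steps  # integer stepping avoids float drift
        k = max(1, round(fraction * len(order)))
        if full == 0.0 or shannon_entropy(order[:k]) >= threshold * full:
            return fraction
    return 1.0
```

For example, the truth table of a 3-input Boolean parity function, `labels = [bin(i).count("1") % 2 for i in range(8)]`, has a label entropy of 1 bit, and `smallest_sufficient_fraction(labels)` reports the smallest tested fraction of shuffled rows that already reaches 95% of that value. Label entropy alone ignores the input variables, so it is only a crude proxy for the information a Genetic Programming run actually needs.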
Original language: English
Title of host publication: 2022 IEEE Congress on Evolutionary Computation
Subtitle of host publication: CEC 2022 - Conference Proceedings
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 8
ISBN (Print): 978-1-6654-6708-7
Publication status: Published - 18 Jul 2022
Event: 2022 IEEE Congress on Evolutionary Computation (CEC), part of the IEEE World Congress on Computational Intelligence (IEEE WCCI 2022), Padua, Italy
Duration: 18 Jul 2022 to 23 Jul 2022
Conference number: 2022


Conference: 2022 IEEE Congress on Evolutionary Computation (CEC)
Abbreviated title: CEC 2022


  • Training
  • Boolean functions
  • Genetic programming
  • Machine learning
  • Evolutionary computation
  • Data models
  • Benchmark testing


