Generalization is a very important issue in Machine Learning. In this paper, we present a new idea for improving Genetic Programming generalization ability. The idea is based on a dynamic twolayered selection algorithm and it is tested on a real-life drug discovery regression application. The algorithm begins using root mean squared error as fitness and the usual tournament selection. A list of individuals called "repulsors" is also kept in memory and initialized as empty. As an individual is found to overfit the training set, it is inserted into the list of repulsors. When the list of repulsors is not empty, selection becomes a two-layer algorithm: individuals participating to the tournament are not randomly chosen from the population but are themselves selected, using the average dissimilarity to the repulsors as a criterion to be maximized. Two kinds of similarity/dissimilarity measures are tested for this aim: the well known structural (or edit) distance and the recently defined subtree crossover based similarity measure. Although simple, this idea seems to improve Genetic Programming generalization ability and the presented experimental results show that Genetic Programming generalizes better when subtree crossover based similarity measure is used, at least for the test problems studied in this paper.
|Title of host publication||Proceedings of the 11th Annual conference on Genetic and evolutionary computation|
|Publication status||Published - 1 Jan 2009|
|Event||GECCO ’09 - |
Duration: 1 Jan 2009 → …
|Period||1/01/09 → …|