Using crossover based similarity measure to improve genetic programming generalization ability

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

15 Citations (Scopus)


Generalization is a very important issue in Machine Learning. In this paper, we present a new idea for improving Genetic Programming generalization ability. The idea is based on a dynamic two-layered selection algorithm, and it is tested on a real-life drug discovery regression application. The algorithm begins by using root mean squared error as fitness, together with standard tournament selection. A list of individuals called "repulsors" is also kept in memory and initialized as empty. When an individual is found to overfit the training set, it is inserted into the list of repulsors. Once the list of repulsors is not empty, selection becomes a two-layer algorithm: the individuals participating in the tournament are not chosen at random from the population but are themselves selected, using their average dissimilarity to the repulsors as a criterion to be maximized. Two similarity/dissimilarity measures are tested for this purpose: the well-known structural (or edit) distance and the recently defined subtree crossover based similarity measure. Although simple, this idea seems to improve Genetic Programming generalization ability, and the experimental results presented show that Genetic Programming generalizes better when the subtree crossover based similarity measure is used, at least for the test problems studied in this paper.
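A minimal sketch of the two-layer selection described above, assuming trees are nested Python tuples and RMSE fitness is supplied as a callable (lower is better). The dissimilarity used here, `overlay_distance`, is a hypothetical, simplified stand-in (a depth-weighted count of mismatching nodes when the two trees are overlaid from the root); it is not the paper's structural distance or its subtree crossover based similarity measure. The parameter names `tournament_size` and `pre_size` are also hypothetical, and the criterion for deciding that an individual overfits (and so becomes a repulsor) is not given in the abstract, so the repulsor list is simply passed in.

```python
import random
from typing import Callable, Sequence, Tuple, Union

# Hypothetical tree encoding: a terminal is a str, a function node is (label, child, ...).
Tree = Union[str, Tuple]


def label_and_children(t: Tree):
    """Split a node into its label and its (possibly empty) tuple of children."""
    if isinstance(t, str):
        return t, ()
    return t[0], t[1:]


def unmatched_cost(t: Tree, depth: int, k: float) -> float:
    """Cost of a subtree present in only one of the overlaid trees: every node counts."""
    _, children = label_and_children(t)
    return 1.0 / k ** depth + sum(unmatched_cost(c, depth + 1, k) for c in children)


def overlay_distance(a: Tree, b: Tree, depth: int = 0, k: float = 2.0) -> float:
    """Simplified stand-in dissimilarity: overlay both trees from the root and
    charge 1 / k**depth for each mismatching or unmatched node."""
    la, ca = label_and_children(a)
    lb, cb = label_and_children(b)
    d = 0.0 if la == lb else 1.0 / k ** depth
    for i in range(max(len(ca), len(cb))):
        if i < len(ca) and i < len(cb):
            d += overlay_distance(ca[i], cb[i], depth + 1, k)
        else:
            extra = ca[i] if i < len(ca) else cb[i]
            d += unmatched_cost(extra, depth + 1, k)
    return d


def avg_dissimilarity(t: Tree, repulsors: Sequence[Tree],
                      dist: Callable[[Tree, Tree], float]) -> float:
    """Average dissimilarity of t to the repulsors (the quantity to maximize)."""
    return sum(dist(t, r) for r in repulsors) / len(repulsors)


def select(population: Sequence[Tree], fitness: Callable[[Tree], float],
           repulsors: Sequence[Tree], dist: Callable[[Tree, Tree], float],
           tournament_size: int = 4, pre_size: int = 4, rng=random) -> Tree:
    """Return one parent. With no repulsors this is plain tournament selection
    on fitness; otherwise each tournament slot is filled by a first-layer
    tournament that keeps the candidate farthest, on average, from the repulsors."""
    if not repulsors:
        participants = [rng.choice(population) for _ in range(tournament_size)]
    else:
        participants = []
        for _ in range(tournament_size):
            candidates = [rng.choice(population) for _ in range(pre_size)]
            participants.append(
                max(candidates, key=lambda t: avg_dissimilarity(t, repulsors, dist)))
    # Second layer: the usual tournament, minimizing RMSE.
    return min(participants, key=fitness)


if __name__ == "__main__":
    overfit = ("+", "x", "y")        # pretend this individual was flagged as overfitting
    population = ["x", overfit]
    rmse = lambda t: 0.0 if t == overfit else 1.0  # toy training error
    print(select(population, rmse, [overfit], overlay_distance, rng=random.Random(0)))
```

In this toy run the overfitting individual has the lower training error, so plain tournament selection would tend to pick it, whereas the repulsor layer steers the tournament toward the dissimilar individual. Swapping in a different measure only requires passing another `dist` callable.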
Original language: Unknown
Title of host publication: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation
Editors: Raidl, G. et al.
Publisher: ACM (Association for Computing Machinery)
Publication status: Published, 1 Jan 2009
Event: GECCO ’09
Duration: 1 Jan 2009 → …


Conference: GECCO ’09
Period: 1/01/09 → …
