TY - JOUR
T1 - Generation of classification trees from variable weighted features
AU - Barahona, Pedro
AU - Bel-Enguix, Gemma
AU - Dahl, Veronica
AU - Jiménez-López, M. Dolores
AU - Krippahl, Ludwig
N1 - Acknowledgments: This paper has been supported by Project HP2008-0029 and by Portuguese national funds through FCT-Fundação para a Ciência e Tecnologia, under Project PTDC/EIA-CCO/115999/2009.
PY - 2014/1/1
Y1 - 2014/1/1
N2 - Trees are a useful framework for classifying entities whose attributes are, at least partially, related through a common ancestry, such as species of organisms, family members or languages. In some common applications, such as phylogenetic trees based on DNA sequences, relatedness can be inferred from the statistical analysis of unweighted attributes. The vast majority of mutations that survive across generations are evolutionarily neutral, which means that most genetic differences between species will have accumulated independently and randomly. In these cases, it is possible to calculate the tree from a precomputed matrix of distances. In other cases, such as with anatomical traits or languages, the assumption of random and independent differences does not hold, making it necessary to consider some traits to be more relevant than others for determining how related two entities are. In this paper, we present a constraint programming approach that can enforce consistency between bounds on the relative weight of each trait and tree topologies, so that the user can best determine which sets of traits to use and how the entities are likely to be related.
AB - Trees are a useful framework for classifying entities whose attributes are, at least partially, related through a common ancestry, such as species of organisms, family members or languages. In some common applications, such as phylogenetic trees based on DNA sequences, relatedness can be inferred from the statistical analysis of unweighted attributes. The vast majority of mutations that survive across generations are evolutionarily neutral, which means that most genetic differences between species will have accumulated independently and randomly. In these cases, it is possible to calculate the tree from a precomputed matrix of distances. In other cases, such as with anatomical traits or languages, the assumption of random and independent differences does not hold, making it necessary to consider some traits to be more relevant than others for determining how related two entities are. In this paper, we present a constraint programming approach that can enforce consistency between bounds on the relative weight of each trait and tree topologies, so that the user can best determine which sets of traits to use and how the entities are likely to be related.
KW - Classification trees
KW - Constraint programming
UR - http://www.scopus.com/inward/record.url?scp=84903596081&partnerID=8YFLogxK
U2 - 10.1007/s11047-013-9368-7
DO - 10.1007/s11047-013-9368-7
M3 - Article
AN - SCOPUS:84903596081
SN - 1567-7818
VL - 13
SP - 169
EP - 177
JO - Natural Computing
JF - Natural Computing
IS - 2
ER -