TY - JOUR
T1 - Using biological knowledge for multiple sequence aligner decision making
AU - Rubio-Largo, Álvaro
AU - Vanneschi, Leonardo
AU - Castelli, Mauro
AU - Vega-Rodríguez, Miguel A.
PY - 2017/12/1
Y1 - 2017/12/1
N2 - Multiple Sequence Alignment (MSA) is the simultaneous alignment of three or more biological sequences (nucleotides or amino acids). In recent years, considerable effort has been devoted to the development of MSA approaches. In this work, we propose a framework that extracts the biological characteristics of an input set of unaligned sequences and uses this knowledge to decide which aligner and parameter configuration are most suitable. We refer to it as the Multiple Aligner Framework (MAF). The tuple {Aligner, Configuration} is selected by searching a pre-computed file for the best tuple for a dataset with similar biological characteristics. To create this file, we use multiobjective optimization, specifically three well-known multiobjective evolutionary algorithms (NSGA-II, IBEA and MOEA/D). To validate the framework, we have used five popular benchmark suites: BAliBASE 3.0, PREFAB 4.0, SABmark 1.65, OX-Bench and CDD 3.14. After comparing it with well-known aligners published in the literature, such as Kalign2, MUSCLE, MAFFT, T-Coffee, MSAProbs, ProbCons, Clustal Ω and MUMMALS, we conclude that the Multiple Aligner Framework is, on average, the method with the best balance between alignment accuracy/conservation and required runtime.
AB - Multiple Sequence Alignment (MSA) is the simultaneous alignment of three or more biological sequences (nucleotides or amino acids). In recent years, considerable effort has been devoted to the development of MSA approaches. In this work, we propose a framework that extracts the biological characteristics of an input set of unaligned sequences and uses this knowledge to decide which aligner and parameter configuration are most suitable. We refer to it as the Multiple Aligner Framework (MAF). The tuple {Aligner, Configuration} is selected by searching a pre-computed file for the best tuple for a dataset with similar biological characteristics. To create this file, we use multiobjective optimization, specifically three well-known multiobjective evolutionary algorithms (NSGA-II, IBEA and MOEA/D). To validate the framework, we have used five popular benchmark suites: BAliBASE 3.0, PREFAB 4.0, SABmark 1.65, OX-Bench and CDD 3.14. After comparing it with well-known aligners published in the literature, such as Kalign2, MUSCLE, MAFFT, T-Coffee, MSAProbs, ProbCons, Clustal Ω and MUMMALS, we conclude that the Multiple Aligner Framework is, on average, the method with the best balance between alignment accuracy/conservation and required runtime.
KW - Aligner decision making
KW - Biological knowledge
KW - Multiobjective optimization
KW - Multiple sequence alignment
UR - http://www.scopus.com/inward/record.url?scp=85028043379&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2017.08.069
DO - 10.1016/j.ins.2017.08.069
M3 - Article
AN - SCOPUS:85028043379
VL - 420
SP - 278
EP - 298
JO - Information Sciences
JF - Information Sciences
SN - 0020-0255
ER -