Using biological knowledge for multiple sequence aligner decision making

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

Multiple Sequence Alignment (MSA) is the simultaneous alignment of three or more biological sequences (nucleotide or amino acid). In recent years, considerable effort has been devoted to the development of MSA approaches. In this work, we propose a framework that extracts the biological characteristics of an input set of unaligned sequences and uses this knowledge to decide which aligner and parameter configuration is most suitable. We refer to it as the Multiple Aligner Framework (MAF). The tuple {Aligner, Configuration} is selected by searching a pre-computed file for the best tuple for a dataset with similar biological characteristics. To create this file, we use multiobjective optimization with three well-known multiobjective evolutionary algorithms (NSGA-II, IBEA and MOEA/D). To validate the framework, we have used five popular benchmark suites: BAliBASE 3.0, PREFAB 4.0, SABmark 1.65, OX-Bench and CDD 3.14. After comparing it with well-known aligners published in the literature, such as Kalign2, MUSCLE, MAFFT, T-Coffee, MSAProbs, ProbCons, Clustal Ω and MUMMALS, we conclude that the Multiple Aligner Framework is, on average, the method with the best balance between alignment accuracy/conservation and required runtime.
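
The selection step described in the abstract can be pictured as a nearest-neighbour lookup in a pre-computed table. The following is a minimal sketch under stated assumptions, not the authors' implementation: the features (sequence count, mean length, length spread), the Euclidean distance, the `Entry` type, the function names and the table rows are all hypothetical, and the configuration labels are placeholders rather than real aligner options.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Entry:
    """One row of the (hypothetical) pre-computed file."""
    features: Tuple[float, ...]   # characteristics of a reference dataset
    aligner: str                  # aligner that performed best on it
    configuration: str            # placeholder label for its parameters


def extract_features(sequences: Sequence[str]) -> Tuple[float, ...]:
    """Assumed feature set: number of sequences, mean length, length std."""
    lengths = [len(s) for s in sequences]
    n = len(lengths)
    mean = sum(lengths) / n
    std = sqrt(sum((length - mean) ** 2 for length in lengths) / n)
    return (float(n), mean, std)


def select_tuple(sequences: Sequence[str], table: List[Entry]) -> Tuple[str, str]:
    """Return the {Aligner, Configuration} of the most similar table entry."""
    query = extract_features(sequences)

    def distance(entry: Entry) -> float:
        # Euclidean distance is an assumption; any similarity measure could be used.
        return sqrt(sum((a - b) ** 2 for a, b in zip(query, entry.features)))

    best = min(table, key=distance)
    return best.aligner, best.configuration


# Illustrative table; the values and labels are made up for the example.
TABLE = [
    Entry((4.0, 10.0, 1.0), "MAFFT", "config-A"),
    Entry((50.0, 300.0, 40.0), "Kalign2", "config-B"),
]

if __name__ == "__main__":
    unaligned = ["ACGTACGTAC", "ACGTACGTA", "ACGTACGTACG", "ACGTACGTAC"]
    print(select_tuple(unaligned, TABLE))
```

In practice the features would likely need to be normalized before comparison, since raw counts and lengths sit on very different scales.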

Original language: English
Pages (from-to): 278-298
Number of pages: 21
Journal: Information Sciences
Volume: 420
Publication status: Published - 1 Dec 2017

Keywords

  • Aligner decision making
  • Biological knowledge
  • Multiobjective optimization
  • Multiple sequence alignment
