Comparing incomplete sequences via longest common subsequence

Mauro Castelli, Riccardo Dondi, Giancarlo Mauri, Italo Zoppis

Research output: Contribution to journalArticle

1 Citation (Scopus)

Abstract

Inspired by scaffold filling, a recent approach for genome reconstruction from incomplete data, we consider a variant of the well-known longest common subsequence problem for the comparison of two sequences. The new problem, called Longest Filled Common Subsequence, aims to compare a complete sequence with an incomplete one, i.e. with some missing elements. Longest Filled Common Subsequence (LFCS), given a complete sequence A, an incomplete sequence B, and a multiset M of symbols missing in B, asks for a sequence B obtained by inserting the symbols of M into B so that B induces a common subsequence with A of maximum length. We investigate the computational and approximation complexity of the problem and we show that it is NP-hard and APX-hard when A contains at most two occurrences of each symbol, and we give a polynomial time algorithm when the input sequences are over a constant-size alphabet. We give a [Formula presented] approximation algorithm for the Longest Filled Common Subsequence problem. Finally, we present a fixed-parameter algorithm for the problem, when it is parameterized by the number of symbols inserted in B that “match” symbols of A.

Original languageEnglish
Pages (from-to)272-285
Number of pages14
JournalTheoretical Computer Science
Volume796
DOIs
Publication statusPublished - Dec 2019

Keywords

  • Approximation algorithms
  • Computational complexity
  • Fixed-parameter algorithms
  • Longest common subsequence
  • String algorithms

Fingerprint Dive into the research topics of 'Comparing incomplete sequences via longest common subsequence'. Together they form a unique fingerprint.

Cite this