TY - JOUR
T1 - PolyTB: a genomic variation map for Mycobacterium tuberculosis
AU - Coll, Francesc
AU - Preston, Mark
AU - Guerra-Assunção, José Afonso
AU - Hill-Cawthorn, Grant
AU - Harris, David
AU - Perdigão, João
AU - Viveiros, Miguel
AU - Portugal, Isabel
AU - Drobniewski, Francis
AU - Gagneux, Sebastien
AU - Glynn, Judith R.
AU - Pain, Arnab
AU - Parkhill, Julian
AU - McNerney, Ruth
AU - Martin, Nigel
AU - Clark, Taane G.
N1 - PMID:24637013
N1 - WOS:000335913700022
PY - 2014/5
Y1 - 2014/5
AB - Tuberculosis (TB) caused by Mycobacterium tuberculosis (Mtb) is the second major cause of death from an infectious disease worldwide. Recent advances in DNA sequencing are leading to the ability to generate whole genome information in clinical isolates of M. tuberculosis complex (MTBC). The identification of informative genetic variants such as phylogenetic markers and those associated with drug resistance or virulence will help barcode Mtb in the context of epidemiological, diagnostic and clinical studies. Mtb genomic datasets are increasingly available as raw sequences, which are potentially difficult and computer intensive to process, and compare across studies. Here we have processed the raw sequence data (>1500 isolates, eight studies) to compile a catalogue of SNPs (n = 74,039, 63% non-synonymous, 51.1% in more than one isolate, i.e. non-private), small indels (n = 4810) and larger structural variants (n = 800). We have developed the PolyTB web-based tool (http://pathogenseq.lshtm.ac.uk/polytb) to visualise the resulting variation and important meta-data (e.g. in silico inferred strain-types, location) within geographical map and phylogenetic views. This resource will allow researchers to identify polymorphisms within candidate genes of interest, as well as examine the genomic diversity and distribution of strains. PolyTB source code is freely available to researchers wishing to develop similar tools for their pathogen of interest.
KW - Database
KW - Genomics
KW - Molecular epidemiology
KW - Mycobacterium tuberculosis
KW - Software
KW - Whole-genome sequencing
UR - http://www.scopus.com/inward/record.url?scp=84900826220&partnerID=8YFLogxK
UR - https://www.sciencedirect.com/science/article/pii/S1472979214203428?via%3Dihub
U2 - 10.1016/j.tube.2014.02.005
DO - 10.1016/j.tube.2014.02.005
M3 - Article
C2 - 24637013
AN - SCOPUS:84900826220
VL - 94
SP - 346
EP - 354
JO - Tuberculosis (Edinburgh, Scotland)
JF - Tuberculosis (Edinburgh, Scotland)
SN - 1873-281X
IS - 3
ER -