TY - JOUR
T1 - The Hypervariable Tpr Multigene Family of Theileria Parasites, Defined by a Conserved, Membrane-Associated, C-Terminal Domain, Includes Several Copies with Defined Orthology Between Species
AU - Palmateer, Nicholas C.
AU - Munro, James B.
AU - Nagaraj, Sushma
AU - Crabtree, Jonathan
AU - Pelle, Roger
AU - Tallon, Luke
AU - Nene, Vish
AU - Bishop, Richard
AU - Silva, Joana C.
N1 - Funding Information:
Funding for the work was provided by the Bill and Melinda Gates Foundation (US) (OPP1078791 to VN), the Agricultural Research Service (59-5348-4-001, with cooperative agreement to JCS), and the National Institute of Allergy and Infectious Diseases (R01AI141900 to JCS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
All sequencing data were generated by Maryland Genomics, Institute for Genome Sciences, University of Maryland School of Medicine.
Publisher Copyright:
© 2023, The Author(s).
PY - 2023/12
Y1 - 2023/12
N2 - Multigene families often play an important role in host-parasite interactions. One of the largest multigene families in Theileria parva, the causative agent of East Coast fever, is the T. parva repeat (Tpr) gene family. The function of the putative Tpr proteins remains unknown. The initial publication of the T. parva reference genome identified 39 Tpr family open reading frames (ORFs) sharing a conserved C-terminal domain. Twenty-eight of these are clustered in a central region of chromosome 3, termed the “Tpr locus”, while others are dispersed throughout all four nuclear chromosomes. The Tpr locus contains three of the four assembly gaps remaining in the genome, suggesting the presence of additional, as yet uncharacterized, Tpr gene copies. Here, we describe the use of long-read sequencing to attempt to close the gaps in the reference assembly of T. parva (located among multigene family clusters), characterize the full complement of Tpr family ORFs in the T. parva reference genome, and evaluate their evolutionary relationship with Tpr homologs in other Theileria species. We identify three new Tpr family genes in the T. parva reference genome and show that sequence similarity among paralogs in the Tpr locus is significantly higher than between genes outside the Tpr locus. We also identify sequences homologous to the conserved C-terminal domain in five additional Theileria species. Using these sequences, we show that the evolution of this gene family involves conservation of a few orthologs across species, combined with gene gains/losses, and species-specific expansions.
AB - Multigene families often play an important role in host-parasite interactions. One of the largest multigene families in Theileria parva, the causative agent of East Coast fever, is the T. parva repeat (Tpr) gene family. The function of the putative Tpr proteins remains unknown. The initial publication of the T. parva reference genome identified 39 Tpr family open reading frames (ORFs) sharing a conserved C-terminal domain. Twenty-eight of these are clustered in a central region of chromosome 3, termed the “Tpr locus”, while others are dispersed throughout all four nuclear chromosomes. The Tpr locus contains three of the four assembly gaps remaining in the genome, suggesting the presence of additional, as yet uncharacterized, Tpr gene copies. Here, we describe the use of long-read sequencing to attempt to close the gaps in the reference assembly of T. parva (located among multigene family clusters), characterize the full complement of Tpr family ORFs in the T. parva reference genome, and evaluate their evolutionary relationship with Tpr homologs in other Theileria species. We identify three new Tpr family genes in the T. parva reference genome and show that sequence similarity among paralogs in the Tpr locus is significantly higher than between genes outside the Tpr locus. We also identify sequences homologous to the conserved C-terminal domain in five additional Theileria species. Using these sequences, we show that the evolution of this gene family involves conservation of a few orthologs across species, combined with gene gains/losses, and species-specific expansions.
KW - Assembly gaps
KW - Lineage-specific expansion
KW - Multigene family
KW - Theileria parva
UR - http://www.scopus.com/inward/record.url?scp=85177860516&partnerID=8YFLogxK
U2 - 10.1007/s00239-023-10142-z
DO - 10.1007/s00239-023-10142-z
M3 - Article
C2 - 38017120
AN - SCOPUS:85177860516
SN - 0022-2844
VL - 91
SP - 897
EP - 911
JO - Journal of Molecular Evolution
JF - Journal of Molecular Evolution
IS - 6
ER -