Study highlights the potential of minority variation and alternative sequencing methods to enhance epidemiological understanding of TB transmission
In a recent study published in The Lancet Microbe, researchers investigated within-host variation in the Mycobacterium tuberculosis complex (MTBC) and whether it improves transmission inference.
Background
Reducing the global tuberculosis burden requires a decrease in incident MTBC infections. However, the long and variable latency period of the infection makes it difficult to identify sources of transmission and to target interventions.
Further, in high-incidence settings, existing transmission inference approaches often miss the majority of transmission links. Thus, novel, accessible strategies are needed to delineate transmission patterns at high resolution.
MTBC transmission inference approaches typically rely on a single consensus genome from each infected individual. However, MTBC evolves relatively slowly, so outbreaks contain limited genetic diversity. Moreover, identical MTBC genomes have been isolated from multiple individuals, which makes transmission chains difficult to reconstruct. Although MTBC diversity has been reported within infected persons, it is unclear whether this within-host variation improves transmission inference.
About the study
In the present study, researchers quantified within-host MTBC variation and investigated whether it enhances transmission inference. They reanalyzed sequence data from prior MTBC household transmission studies, using household membership as a proxy for transmission links. The PubMed database was searched for relevant studies with publicly available raw sequencing and epidemiological data, and the raw sequence data were processed using a variant identification pipeline.
The researchers quantified minority variants both within and outside the proline-glutamic acid (PE) and proline-proline-glutamic acid (PPE) gene families. Minority variants were defined as positions with at least two alleles and a minor allele frequency above 1%. Because variant calls in the repetitive PE and PPE genes may be more error-prone, these genes were excluded from further analyses. Per-sample minority variation was compared between studies using the Wilcoxon rank-sum test.
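The minority-variant rule above (at least two alleles, minor allele frequency above 1%, PE/PPE sites excluded) can be illustrated with a short sketch. It assumes per-site allele read counts are already available, for example from a pileup; the function name, the data layout, and treating the second most common allele as "the minor allele" are hypothetical choices for illustration, not the study's actual pipeline.

```python
# Sketch only: assumes per-site allele read counts are already available.
# The data layout and function name are hypothetical; only the
# ">= 2 alleles" and "MAF > 1%" rules come from the study summary.

MIN_MAF = 0.01  # minor allele frequency must exceed 1%


def minority_variants(site_counts, excluded_positions=frozenset()):
    """Return {(position, minor_allele): minor_allele_frequency}.

    site_counts maps a genome position to {allele: read_count}.
    excluded_positions holds positions to skip, e.g. those inside PE/PPE genes.
    """
    calls = {}
    for pos, counts in site_counts.items():
        if pos in excluded_positions:
            continue
        # Keep only alleles actually observed in the reads, most common first.
        observed = sorted(
            ((allele, n) for allele, n in counts.items() if n > 0),
            key=lambda item: item[1],
            reverse=True,
        )
        if len(observed) < 2:  # a minority variant needs at least two alleles
            continue
        depth = sum(n for _, n in observed)
        minor_allele, minor_count = observed[1]  # assumed: second most common allele
        maf = minor_count / depth
        if maf > MIN_MAF:
            calls[(pos, minor_allele)] = maf
    return calls


sample = {
    100: {"A": 95, "G": 5},   # 5% minor allele -> called
    200: {"C": 100},          # single allele -> not a minority variant
    300: {"T": 199, "C": 1},  # 0.5% minor allele -> below threshold
    400: {"A": 60, "T": 40},  # would be called, but treated as a PE/PPE site here
}
print(minority_variants(sample, excluded_positions={400}))  # {(100, 'G'): 0.05}
```

Keying each call on both position and allele makes it possible to later ask whether two samples carry the same minority variant, not merely variation at the same site.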
Pearson’s correlation coefficient was used to measure associations among the total number of minority variants, minor allele frequency, and median coverage depth. The researchers also counted the minority variants shared between household members and compared them with those shared between epidemiologically unrelated pairs. Logistic regression models were then fitted with shared minority variants only, genetic cluster membership only, or both as predictors of household membership.
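Counting shared minority variants for every pair of samples, and then separating household pairs from unrelated pairs, can be sketched as below. This assumes each sample's minority variants are stored as (position, allele) tuples and that a variant is "shared" when both samples carry the same allele at the same position; the sample IDs and helper names are hypothetical.

```python
from itertools import combinations


def shared_minority_variants(a, b):
    """Number of minority variants, as (position, allele), called in both samples."""
    return len(set(a) & set(b))


def pairwise_shared(samples):
    """samples: {sample_id: iterable of (position, allele)} -> {(id1, id2): count}."""
    return {
        (s1, s2): shared_minority_variants(samples[s1], samples[s2])
        for s1, s2 in combinations(sorted(samples), 2)
    }


def split_by_household(pair_counts, household_pairs):
    """Separate shared-variant counts for household pairs from unrelated pairs.

    household_pairs is a set of frozensets, each holding two sample IDs.
    """
    household, unrelated = [], []
    for (s1, s2), count in pair_counts.items():
        target = household if frozenset((s1, s2)) in household_pairs else unrelated
        target.append(count)
    return household, unrelated


# Hypothetical samples: H1a and H1b share a household, X is unrelated.
samples = {
    "H1a": {(100, "G"), (2500, "T")},
    "H1b": {(100, "G"), (2500, "T"), (9000, "A")},
    "X": {(9000, "A")},
}
households = {frozenset({"H1a", "H1b"})}
counts = pairwise_shared(samples)
print(split_by_household(counts, households))  # ([2], [0, 1])
```

In a setup like the one described, the per-pair counts would serve as the "shared minority variants" predictor in the logistic regression models; fitting those models is not shown here.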
Genetic cluster membership was defined using genetic distance thresholds of 5 and 12 single nucleotide polymorphisms (SNPs). Receiver operating characteristic (ROC) curves were used to assess how well each model classified household versus unrelated pairs. Finally, Pearson’s correlation coefficient was used to test the correlation between the number of shared minority variants and the genetic distance between MTBC consensus sequences.
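The distance-based steps in this paragraph (SNP distance between consensus sequences, cluster membership at a threshold, ROC performance, and Pearson correlation) can be sketched in plain Python. Several details are assumptions rather than the study's stated method: a pair is taken to cluster when its distance is at most the threshold, sites with an "N" in either sequence are skipped, and the area under the ROC curve is computed through its rank interpretation instead of by tracing the curve itself.

```python
import math


def snp_distance(seq1, seq2):
    """SNP differences between two aligned consensus sequences.

    Sites where either sequence is 'N' (missing/ambiguous) are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("consensus sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq1, seq2) if x != y and "N" not in (x, y))


def in_cluster(distance, threshold):
    """Assumed rule: a pair clusters if its SNP distance is at most the threshold."""
    return distance <= threshold


def roc_auc(household_scores, unrelated_scores):
    """Area under the ROC curve via its rank (Mann-Whitney) interpretation:
    the chance a random household pair outscores a random unrelated pair,
    counting ties as one half."""
    wins = 0.0
    for h in household_scores:
        for u in unrelated_scores:
            if h > u:
                wins += 1.0
            elif h == u:
                wins += 0.5
    return wins / (len(household_scores) * len(unrelated_scores))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


d = snp_distance("ACGTN", "ACCTA")  # G/C differs; the N site is skipped
print(d, in_cluster(8, 5), in_cluster(8, 12))  # 1 False True
print(roc_auc([2, 0], [1, 0]))  # 0.625
# Illustrative values: fewer shared variants at larger distances gives r < 0.
r = pearson([0, 3, 10, 25], [6, 4, 1, 0])
```

Here the ROC scores are raw shared-variant counts for simplicity; with fitted logistic regression models, the predicted probabilities for each pair would be passed in instead.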
Findings
Overall, three household transmission studies, conducted in Brazil, Canada, and England, were identified. Compared with randomly selected sequence pairs from the population, MTBC sequences from isolates within the same household, or from epidemiologically linked individuals, showed limited variation.
Further, consensus MTBC sequences from epidemiologically linked individuals were phylogenetic nearest neighbors. However, genetic distances between consensus sequences frequently exceeded the commonly used 12-SNP and 5-SNP thresholds: 15.6% and 44.4% of household pairs, respectively, fell outside them. PPE and PE genes harbored a disproportionately high number of minority variants.
Minority variation outside PPE and PE genes differed significantly across studies. Minority variants rarely recurred at the same genomic position, and none was identified in more than five samples. Only 1.3% of minority variants were stop mutations, and 50% were missense variants. Notably, across studies, the five most common variants occurred within intergenic regions.
Median coverage depth was significantly correlated with the number of minority variants outside PPE and PE genes. Minor allele frequency was negatively correlated with site coverage depth. Per-sample minority variation was positively associated with MTBC lineage 2 and negatively associated with lineages 3 and 4.
In addition, isolates from household pairs shared more minority variants outside PPE and PE genes than isolates from randomly selected pairs. The distribution of shared minority variants differed significantly between epidemiologically linked and unlinked pairs. Shared within-host variation outside PPE and PE genes was significantly associated with household membership, as was genomic clustering.
ROC curves showed that including shared within-host variation improved classification accuracy. Among epidemiologically unlinked pairs, the number of shared minority variants declined as the genetic distance between samples increased. For household pairs, however, there was no significant correlation between the genetic distance between consensus sequences and the number of shared minority variants.
Conclusions
Taken together, the study showed that within-host MTBC variation persists in sequence data from cultured isolates, with its magnitude varying within and between studies. In addition, MTBC isolates from epidemiologically linked people share more within-host variation than isolates from unlinked individuals.
These findings suggest that minority variation could provide valuable epidemiological information for transmission inferences. Further work is needed to optimize approaches for acquiring and incorporating within-host variation into automated pipelines for transmission and phylogenetic inference.