Separation and Assembly of RNA virus High Throughput Sequencing Data into Discrete Full Length Sub-Population Genomes
Sequence heterogeneity is a common characteristic of RNA viruses, whose co-existing variant genomes are often referred to as sub-populations or quasispecies. The diversity and dynamic nature of viral quasispecies have a significant impact on the phenotypic traits of human immunodeficiency virus (HIV) and other RNA viruses. In fact, these viruses rely on quasispecies diversity to adapt to environmental changes and to escape selective forces such as antiviral drug treatment. Thus, to better understand the clinical implications and the biological behavior of HIV, one needs to analyze the genomic sequences of the viral quasispecies in a sample.

Traditional techniques for assembling the short sequence reads produced by deep sequencing, such as de novo assemblers, ignore the underlying diversity. In this study, a novel algorithm called HexaHedron was developed, which simultaneously assembles the discrete sequences of the multiple genomes present in a population. The results are presented in the form of a novel graph, termed a nephosome, which is similar to a genome graph. The high sensitivity of the algorithm enables genomic analysis of heterogeneous RNA virus genomes from patient samples and accurate detection of intra-host diversity, supporting both basic research in personalized medicine and accurate diagnostics.

HexaHedron is a deterministic solution to the quasispecies spectrum reconstruction (QSR) problem, which is the task of reconstructing the quasispecies spectrum from a collection of high throughput sequencing (HTS) reads generated from a viral sample. Tools designed to address the QSR problem are already available and were compared to HexaHedron. The results of this benchmark reveal the superior performance of the proposed algorithm. HexaHedron was shown to process hundreds of millions of reads and to accurately detect minor sub-populations with abundances as low as 0.1%.

Finally, a pipeline that examines HIV samples for quasispecies-resolved antiretroviral-resistant mutations (ARMs) was designed as part of this study. All publicly available HIV samples were processed, and the resolved quasispecies sequences were individually scanned for ARMs to assess the effect of quasispecies diversity on antiretroviral resistance. Frequently co-occurring mutations, such as the pairs M41L/T215Y and M184V/L74V, were identified. Additionally, the study demonstrated a correlation between the increasing cardinality of HIV ARM sets and increasing antiretroviral resistance.
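
The co-occurrence analysis described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the function name `cooccurring_pairs`, the `min_count` threshold, and the toy mutation sets are all hypothetical, and the sketch assumes each resolved quasispecies sequence has already been reduced to the set of ARMs it carries.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(mutation_sets, min_count=2):
    """Count unordered mutation pairs that appear together in one sequence.

    mutation_sets: iterable of sets, one per resolved quasispecies sequence
                   (hypothetical input format, assumed for this sketch).
    min_count:     keep only pairs seen in at least this many sequences.
    """
    counts = Counter()
    for muts in mutation_sets:
        # Sort so each pair has one canonical order, e.g. ("M41L", "T215Y").
        for pair in combinations(sorted(set(muts)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Toy data for illustration only; these are not results from the study.
resolved = [
    {"M41L", "T215Y"},
    {"M41L", "T215Y", "M184V"},
    {"M184V", "L74V"},
    {"M184V", "L74V", "M41L"},
    {"K103N"},
]

pairs = cooccurring_pairs(resolved)
for pair, n in sorted(pairs.items()):
    print(pair, n)
```

Because the counting is done per resolved sequence rather than per sample, a pair is reported only when both mutations sit on the same genome, which is the distinction that quasispecies-resolved scanning is meant to capture. The size of each set in `resolved` corresponds to the ARM set cardinality mentioned above.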