Comparison of SNP Calling Tools for RNA Sequencing Data

Author
Title
Comparison of SNP Calling Tools for RNA Sequencing Data [electronic resource].
ISBN
9780355922745
Published
Ann Arbor : ProQuest Dissertations & Theses, 2018.
Physical Description
1 online resource (29 p.)
Local Notes
Access is available to the Yale community.
Notes
Source: Masters Abstracts International, Volume: 57-05.
Includes supplementary digital materials.
Adviser: Xiting Yan.
Access and use
Access restricted by licensing agreement.
Summary
The identification of single nucleotide polymorphisms (SNPs), commonly known as SNP calling, is one of the most prominent applications of high-throughput sequencing data. Although tools for SNP calling on DNA sequencing data are highly developed, relatively few tools focus on SNP calling on RNA sequencing data. Developed in the Data Sciences Platform at the Broad Institute, the Genome Analysis Toolkit (GATK) is one of the most commonly used toolkits for variant discovery in high-throughput sequencing data. However, because it was primarily designed to process exomes and whole genomes, GATK handles RNA sequencing data in largely the same way as DNA sequencing data, without taking into account some of the unique features of RNA-seq, such as extreme allelic imbalance. Another tool, HMM-ASE, was designed specifically for ascertaining cSNP genotypes from RNA sequencing data; it employs hidden Markov models (HMMs) to handle allelic imbalance by exploiting linkage disequilibrium, and it was the first tool to apply HMMs to identify genotypes from RNA sequencing data.
In this paper, we performed SNP calling on 940 sequenced bronchoalveolar lavage (BAL) cell samples collected from a large cohort of sarcoidosis patients enrolled in the Genomic Research in Alpha-1 Antitrypsin Deficiency and Sarcoidosis (GRADS) study, using both GATK and HMM-ASE. We assessed the accuracy of the results using 122 paired technical replicates from 60 individuals, both by visualization and by measuring the reproducibility rate. We also tested the hypothesis that SNPs from lowly expressed genes are less robust between replicates by filtering the SNPs on sequencing depth; the GATK results partially supported this hypothesis. Finally, we compared the results of the two methods and concluded that HMM-ASE performed better than GATK on RNA sequencing data. Next steps are presented at the end.
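The summary does not define how the reproducibility rate was computed, so the following is only an illustrative sketch under stated assumptions: the rate is taken to be the fraction of sites called in both technical replicates that receive the same genotype, and the depth filter is taken to require a minimum read depth in both replicates. The function name `reproducibility_rate`, the `Calls` type, and the `min_depth` parameter are hypothetical and are not taken from the thesis, GATK, or HMM-ASE.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation of one replicate's calls:
# (chromosome, position) -> (genotype string, read depth at the site)
Calls = Dict[Tuple[str, int], Tuple[str, int]]


def reproducibility_rate(
    rep_a: Calls, rep_b: Calls, min_depth: int = 0
) -> Optional[float]:
    """Fraction of shared sites with identical genotypes in both replicates.

    Assumption: a site counts only if it was called in both replicates and
    its depth is at least `min_depth` in each. Returns None when no site
    passes the filter, so the caller can tell "no data" from "0% agreement".
    """
    shared = [
        site
        for site in rep_a.keys() & rep_b.keys()
        if rep_a[site][1] >= min_depth and rep_b[site][1] >= min_depth
    ]
    if not shared:
        return None
    concordant = sum(rep_a[site][0] == rep_b[site][0] for site in shared)
    return concordant / len(shared)
```

Calling the function on the same replicate pair with increasing values of `min_depth` gives one way to examine whether low-depth sites, a rough proxy for lowly expressed genes, agree less often between replicates.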
Format
Books / Online / Dissertations & Theses
Language
English
Added to Catalog
July 30, 2018
Thesis note
Thesis (M.P.H.)--Yale University, 2018.
Also listed under
Yale University. Public Health.
Available from:

Online