Department of Computer Science at UH

University of Houston

Department of Computer Science

In Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy

Meenakshi Sharma

Will defend her dissertation


Novel Algorithms to estimate Genome Coverage using High Throughput Sequencing data

Abstract

Genetic variation can occur as single-base changes called Single Nucleotide Polymorphisms (SNPs) or as large-scale structural alterations called Copy Number Variations (CNVs). Identifying and analyzing CNVs is critical to understanding their association with evolution, health, and disease. Over the past decade, advances in DNA sequencing technologies have fueled the field of genomics and opened new doors for performing Copy Number Analysis (CNA).

To perform CNA, millions of short subsequences, or reads, produced by High Throughput Sequencing (HTS) platforms are aligned to one or more reference genome sequences. The sequence alignment process outputs the total number of reads covering each location in the genome; collectively, these counts are called read coverage.
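As an illustration of what read coverage means, the following minimal sketch computes per-location coverage from a list of aligned reads. The function name and the (start, length) input format are assumptions made for this example; they do not come from the dissertation or from any particular alignment tool.

```python
def coverage_from_alignments(genome_length, alignments):
    """Return the number of reads covering each genome position.

    alignments: iterable of (start, length) pairs with 0-based starts.
    This input format is a simplification chosen for illustration.
    """
    # Difference array: +1 where a read begins, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, length in alignments:
        if start < 0 or start >= genome_length or length <= 0:
            continue  # skip reads that fall outside the reference
        end = min(start + length, genome_length)
        diff[start] += 1
        diff[end] -= 1

    # A running sum of the difference array gives the depth at each position.
    coverage = []
    depth = 0
    for position in range(genome_length):
        depth += diff[position]
        coverage.append(depth)
    return coverage


example = coverage_from_alignments(10, [(0, 4), (2, 4), (8, 5)])
print(example)
```

The difference-array approach touches each read once and each position once, so it stays practical when the number of reads is in the millions.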

The focus of this research is to develop novel algorithms that accurately estimate coverage in the presence of DNA repeats and single nucleotide mutations. The copy number distribution of the reads mapped to the reference sequence would ideally follow a Poisson distribution, assuming that the nucleotide sequence of the genome is random and that the sequencing reads come from random locations in the genome. The coverage data, however, exhibit over-dispersion at the extreme ends of the distribution. Repeated sequences and SNPs contribute to these unexpectedly high coverage frequencies.
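The Poisson assumption and the effect of repeats can be illustrated with a small simulation. This is a sketch under stated assumptions, not the dissertation's procedure: the genome is treated as circular to avoid edge effects, reads start at uniformly random positions, and a repeat is mimicked by piling extra reads onto one short region. All names and parameter values are hypothetical.

```python
import random


def simulate_coverage(genome_length, read_length, n_reads, rng, region=None):
    """Place reads at uniformly random starts on a circular genome.

    If region=(lo, hi) is given, starts are drawn only from [lo, hi),
    which crudely mimics reads from repeat copies piling up in one place.
    """
    coverage = [0] * genome_length
    lo, hi = region if region else (0, genome_length)
    for _ in range(n_reads):
        start = rng.randrange(lo, hi)
        for offset in range(read_length):
            coverage[(start + offset) % genome_length] += 1
    return coverage


def mean_and_variance(values):
    """Return the mean and the population variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


rng = random.Random(0)
G, L, N = 20000, 50, 4000  # expected average coverage is N * L / G = 10

uniform = simulate_coverage(G, L, N, rng)
mean_u, var_u = mean_and_variance(uniform)

# Assumed repeat effect: 400 extra reads land in a 200-base window.
extra = simulate_coverage(G, L, 400, rng, region=(1000, 1200))
with_repeat = [a + b for a, b in zip(uniform, extra)]
mean_r, var_r = mean_and_variance(with_repeat)

print("uniform reads:  mean", mean_u, "variance/mean", var_u / mean_u)
print("with a repeat:  mean", mean_r, "variance/mean", var_r / mean_r)
```

For a Poisson distribution the variance equals the mean, so a variance-to-mean ratio near 1 is consistent with the ideal model, while a ratio well above 1 indicates over-dispersion. In this toy setup the pile-up also inflates the plain average, which is why a naive mean can misestimate the underlying coverage.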

This dissertation presents novel algorithms to estimate the average coverage using a model based on the Poisson distribution. The model was tested on both simulated and real data with different coverage depths, and it recovers the actual model parameters with reasonably good accuracy. The proposed approach improves estimation of average genome coverage, which is central to gene-expression, DNA methylation, and metagenomic studies.
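The abstract does not describe the estimation algorithms themselves. Purely as an illustration of how a Poisson model can yield an average-coverage estimate that is less sensitive to extreme values, the sketch below uses a generic textbook property: for a Poisson distribution with mean lambda, P(k) / P(k-1) = lambda / k. Ratios of neighboring histogram counts near the mode therefore each give an estimate of lambda. This is an assumed stand-in technique, not the method presented in the dissertation, and every name in the code is hypothetical.

```python
import math
import random
from collections import Counter


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def estimate_lambda_from_ratios(depths, window=2):
    """Estimate the Poisson mean from count ratios around the mode.

    Uses lambda ~= k * n_k / n_(k-1), averaged over depths k within
    `window` of the most frequent depth, so extreme depths are ignored.
    """
    counts = Counter(depths)
    mode = counts.most_common(1)[0][0]
    estimates = []
    for k in range(max(1, mode - window), mode + window + 1):
        if counts[k - 1] > 0 and counts[k] > 0:
            estimates.append(k * counts[k] / counts[k - 1])
    if not estimates:
        raise ValueError("not enough data around the mode")
    return sum(estimates) / len(estimates)


rng = random.Random(1)
clean = [sample_poisson(10.0, rng) for _ in range(20000)]

# Assumed contamination: 500 positions with very high depth (e.g. repeats).
contaminated = clean + [80] * 500

naive = sum(contaminated) / len(contaminated)
robust = estimate_lambda_from_ratios(contaminated)
print("naive mean:", naive)
print("ratio-based estimate:", robust)
```

In this toy example the naive mean is pulled upward by the high-depth positions, while the ratio-based estimate depends only on counts near the mode and stays close to the true value of 10.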

 

Date: Wednesday, April 23, 2014
Time: 10:00 AM
Place: HSBC 302

Faculty, students, and the general public are invited.
Advisor: Prof. Ioannis Pavlidis