Department of Computer Science at UH

University of Houston

Department of Computer Science

In Partial Fulfillment of the Requirements for the Degree of
Master of Science

Krushita Shah

Will defend her thesis

New Data Structures and Algorithms to support
Next Generation Sequencing Technology

Abstract

DNA sequencing technologies introduced in this decade have reduced the time needed to sequence a genome to days or weeks. Next-generation sequencing technologies generate a huge amount of data in the form of short DNA sequences called ‘reads’, along with a quality score for each base in the read. Using these reads and quality scores, we can perform genome assembly and mapping, and can distinguish between SNPs (Single Nucleotide Polymorphisms) and sequencing errors. As part of this project, new data structures and algorithms are developed to handle this huge amount of data and to perform analysis on it. The key steps of the design are:
• Developing data structures for handling sequencing reads of arbitrary lengths.
• Developing data structures for quick access to reads along with the quality scores of each base in the read.
• Incorporating the quality scores of the reads into a mapping algorithm that maps sequenced data to reference genomes.
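
The three steps above can be illustrated with a minimal sketch. It is not the design presented in the thesis: the names ReadStore, mismatch_penalty and best_position are hypothetical, the quality scores are assumed to be Phred-style integers between 0 and 255, and the mapping step is a brute-force scan used only to show how quality scores can weight mismatches.

```python
from array import array


class ReadStore:
    """Hypothetical store: reads of any length are concatenated into flat
    buffers, and an offset index gives constant-time access to each read."""

    def __init__(self):
        self.bases = bytearray()        # all bases, one ASCII byte per base
        self.quals = array("B")         # one quality score per base (assumed 0..255)
        self.offsets = array("Q", [0])  # offsets[i] is where read i starts

    def add(self, seq, quals):
        if len(seq) != len(quals):
            raise ValueError("need exactly one quality score per base")
        self.bases.extend(seq.encode("ascii"))
        self.quals.extend(quals)
        self.offsets.append(len(self.bases))
        return len(self.offsets) - 2    # index of the read just added

    def get(self, i):
        start, end = self.offsets[i], self.offsets[i + 1]
        return self.bases[start:end].decode("ascii"), list(self.quals[start:end])

    def __len__(self):
        return len(self.offsets) - 1


def mismatch_penalty(seq, quals, reference, pos):
    """Sum the quality scores of the bases that disagree with the reference
    when the read is placed at pos. A low-quality mismatch costs little,
    because it is more likely to be a sequencing error."""
    window = reference[pos:pos + len(seq)]
    if len(window) < len(seq):
        return None                     # read would run past the reference
    return sum(q for b, r, q in zip(seq, window, quals) if b != r)


def best_position(seq, quals, reference):
    """Brute-force scan (illustration only): return (pos, penalty) with the
    lowest quality-weighted mismatch penalty, or None if the read is longer
    than the reference."""
    best = None
    for pos in range(len(reference) - len(seq) + 1):
        penalty = mismatch_penalty(seq, quals, reference, pos)
        if best is None or penalty < best[1]:
            best = (pos, penalty)
    return best


store = ReadStore()
store.add("ACGT", [30, 30, 2, 30])      # third base has a low quality score
store.add("GATTACA", [40] * 7)          # reads may have different lengths
seq, quals = store.get(0)
print(best_position(seq, quals, "TTACTTAAGTT"))
```

In this example the read has a single mismatch at position 2 and a single mismatch at position 6 of the reference. The scan prefers position 2, because the mismatching base there has a quality score of 2, while the mismatch at position 6 falls on a base with a quality score of 30.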

Date: Friday, February 27, 2009
Time: 11:00 AM
Place: 550-PGH
Faculty, students, and the general public are invited.
Advisor: Dr. Yuriy Fofanov