Department of Computer Science at UH

University of Houston

Department of Computer Science

In Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy

Mark M. Rojas

Will defend his PhD dissertation


Computational Approaches to Detect Pathogens in the Presence of Complex Backgrounds

Abstract

Fast and accurate identification of pathogenic microorganisms in complex clinical and environmental samples is essential for the prevention and treatment of infectious diseases. The most sensitive detection approaches examine the nucleic acid composition of a sample to identify the presence of pathogen DNA/RNA. A large spectrum of nucleic acid-based assays (e.g., PCR, RT-PCR, and oligonucleotide microarrays) is designed to examine a sample for the presence of pre-defined genomic signatures: short pathogen-specific DNA/RNA fragments.

Identifying such signatures, however, presents significant computational challenges. To be pathogen-specific, each signature (or combination of signatures) must be present in all strains of the pathogen and absent from all other organisms, including its close neighbors; it must also have assay-specific biochemical and thermodynamic properties, such as binding energy, melting temperature, and nucleotide composition. All available signature design algorithms rely on heuristics and are known to miss cases in which potential signatures are also present (exactly or with a small number of mismatches) in host and/or non-pathogenic microorganisms, causing false-positive outcomes. An even greater challenge for the design of biochemical platform-specific genomic signatures (probes/primers) is that each type of instrument uses different biochemical protocols to detect signatures, and these protocols must also be considered during the signature design process.
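The presence/absence requirement above can be illustrated with a minimal Python sketch. It is a brute-force, exact-match illustration only, not the algorithms developed in the dissertation; the function names (`kmers`, `candidate_signatures`, `gc_fraction`) and the toy sequences in the usage note are hypothetical, and GC fraction stands in for the nucleotide-composition property mentioned in the text.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_signatures(target_strains, background_genomes, k):
    """Return k-mers present in every target strain and absent
    (by exact match) from every background genome.

    Assumes target_strains is non-empty.
    """
    # Keep only k-mers shared by all strains of the pathogen.
    shared = set.intersection(*(kmers(s, k) for s in target_strains))
    # Collect every k-mer seen in any host or near-neighbor genome.
    background = set().union(*(kmers(g, k) for g in background_genomes))
    return shared - background


def gc_fraction(seq):
    """Fraction of G and C nucleotides, a simple composition measure."""
    return (seq.count("G") + seq.count("C")) / len(seq)
```

For example, with the toy strains `["ACGTACGGA", "TTACGGAC"]`, the toy background `["GGTACGTT"]`, and `k = 4`, the shared k-mers are `TACG`, `ACGG`, and `CGGA`; `TACG` also occurs in the background, so only `ACGG` and `CGGA` remain as candidates. A real design would additionally have to tolerate mismatches and check thermodynamic properties.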

To address these challenges, we have developed novel algorithms and data structures that bring all possible subsequences of a given pathogen genome into the signature design process. Moreover, the developed algorithms make it possible to account for mismatches (insertions, deletions, and substitutions at all positions) during design. We have also developed the concept of ultra-specific genomic islands: genomic regions in which every subsequence is several mismatches away from the closest subsequence that may appear in a host genome and/or in non-pathogenic near neighbors of the targeted pathogen. This concept improves the quality and flexibility of the design of biochemical platform-specific detection tests (i.e., genomic islands can be used to identify thermodynamically acceptable signatures). The developed approach was successfully used to design a variety of tests for Category A, B, and C pathogens, including the H1N1 influenza virus of the 2009 outbreak that originated in Mexico.
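The idea of an ultra-specific genomic island can be sketched in Python as follows. This is a simplified, brute-force illustration under stated assumptions, not the data structures developed in the dissertation: it counts substitutions only (Hamming distance) and ignores insertions and deletions, it uses a single fixed subsequence length `k`, and the names `hamming`, `ultra_specific_islands`, and `min_mismatches` are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def ultra_specific_islands(pathogen, background_genomes, k, min_mismatches):
    """Return half-open (start, end) intervals of `pathogen` in which every
    length-k subsequence differs from every background k-mer by at least
    `min_mismatches` substitutions (assumption: substitutions only)."""
    background = set()
    for g in background_genomes:
        background.update(g[i:i + k] for i in range(len(g) - k + 1))

    # ok[i] is True when the k-mer starting at i is far enough from the background.
    ok = [
        all(hamming(pathogen[i:i + k], b) >= min_mismatches for b in background)
        for i in range(len(pathogen) - k + 1)
    ]

    islands, start = [], None
    for i, flag in enumerate(ok):
        if flag and start is None:
            start = i                           # a new island begins here
        elif not flag and start is not None:
            islands.append((start, i - 1 + k))  # last good k-mer started at i - 1
            start = None
    if start is not None:
        islands.append((start, len(pathogen)))  # island runs to the end of the genome
    return islands
```

With the toy pathogen `"CCCCAAAACCCC"`, the toy background `["AAAATTTT"]`, `k = 4`, and `min_mismatches = 3`, the sketch reports the islands `(0, 5)` and `(7, 12)`; the middle of the sequence is excluded because its 4-mers are within two substitutions of a background 4-mer. The pairwise comparison is quadratic in the number of k-mers, so it is only suitable for small examples.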

 

Date: Tuesday, November 20, 2012
Time: 2:30 PM
Place: 4018-SERC

Faculty, students, and the general public are invited.
Advisor: Dr. Yuriy Fofanov