From Fragments to Function: Computational Approaches for Reconstructing Biological Context in Metagenomic and Exosomal Discovery


Student Name: Sirisha Thippabhotla
Defense Date:
Location: Eaton Hall, Room 2001B
Chair: Cuncong Zhong

Committee Members:
Prasad Kulkarni
Fengjun Li
Zijun Yao
Liang Xu

Abstract:

Advances in high-throughput next-generation sequencing (NGS) technologies have transformed our ability to study biological systems. However, a fundamental gap remains between generating data and interpreting it. Sequencing captures genomes, transcriptomes, and cell-derived signals as millions of short, fragmented sequences, which strips away biological context, specifically the long-range relationships that define genes, structured RNAs, or regulatory signals. This work investigates computational and experimental approaches to improve functional discovery by reconstructing or preserving that context. The work is developed along three interconnected dimensions (sensitivity, scalability, and biological fidelity) and demonstrates that context is lost, and must be recovered or preserved, at two distinct stages of the discovery process: after sequencing and before it.

The first contribution addresses the loss of context that occurs after sequencing. By representing metagenomic sequencing reads as connected paths in an assembly graph and guiding graph traversal with biological models, this work recovers both protein-coding and non-coding signals that conventional fragment-level analyses fail to detect, revealing functional pathways that would otherwise be missed. The second contribution makes this recovery practical at scale: it introduces a faster framework that preserves the sensitivity of graph-based methods while reducing computational cost by more than an order of magnitude, enabling the analysis of large present-day datasets.

The third contribution examines the loss of context prior to sequencing. Using extracellular vesicles as a model system, the findings show that cells cultured in conventional two-dimensional environments generate signals that do not reflect their physiological state, whereas cells cultured in three-dimensional models produce signals that closely resemble those observed in patients. This shows that an accurate biological model is essential for reliable discovery, since computational methods cannot recover signals that are fundamentally distorted at their origin.

Taken together, these contributions establish a set of methods and principles for extracting meaningful biological information from fragmented, high-throughput genomic data, thereby enabling more accurate functional discovery.

Degree: PhD Dissertation Defense (CS)
Degree Type: PhD Dissertation Defense
Degree Field: Computer Science