Computational Microbiome Analysis: Method Development, Integration and Clinical Applications
Esam El-Araby
Bo Luo
Zijun Yao
Mizuki Azuma
Metagenomics is the study of microbial genomes from a common environment. Metagenomic data is derived directly from all microorganisms present in an environmental sample, including those inaccessible through conventional methods such as laboratory culture. It thus offers an unbiased view of microbial communities, enabling researchers to explore not only the taxonomic composition (identifying which microorganisms are present) but also the community’s metabolic functions.
Metagenomic data consists of a huge number of fragmented DNA sequences from diverse microorganisms at different abundances. These characteristics pose challenges to analysis and impede practical applications. First, the sheer size of the data makes it difficult to develop an efficient tool for detecting a specific target. Second, the incompleteness of metagenomic data limits the accuracy of such a tool. Third, because numerous analysis tools are each designed for an individual detection target, and the data contains many such targets, there is a need for comprehensive and scalable integration of existing resources.
In this dissertation, we conducted computational microbiome analysis at three levels: (1) We first developed an assembly graph-based ncRNA search tool, named DRAGoM, to improve ncRNA detection quality in metagenomic data. (2) We then developed a model, named SNAIL, that automatically detects the names of bioinformatics resources in biomedical literature, supporting comprehensive and scalable organization of those resources. We also developed a method to automatically annotate sentences for training SNAIL, which not only improves SNAIL's performance but also allows it to be trained on both manually annotated and machine-annotated data, minimizing the need for extensive manual labeling. (3) We applied different analysis tools to metagenomic datasets from a series of clinical studies and developed models that use human gut microbiome signatures to predict therapeutic benefit from immunotherapy in non-small-cell lung cancer patients.