RNA Structure Annotation Based on Base Pairs Using ML Based Classifiers
David Johnson
Prasad Kulkarni
RNA molecules play a crucial role in the regulation of gene expression and other cellular processes. Understanding the three-dimensional structure of RNA is essential for predicting its function and its interactions with other molecules. One key feature of RNA structure is the presence of base pairs, in which the nucleotides adenine (A), guanine (G), cytosine (C), and uracil (U) form hydrogen bonds with each other. The limited availability of high-quality RNA structural data, combined with atomic coordinate errors in low-resolution structures, presents significant challenges for extracting important geometrical characteristics from RNA's complex three-dimensional structure, particularly in terms of base interactions.
In this study, we propose an approach for annotating base-pairing interactions in low-resolution RNA structures using machine learning (ML) based classifiers, leveraging the more precise structural information available in high-resolution homologs. We first use the DSSR tool to extract annotations of high-resolution RNA structures and the distances between atoms of interacting base pairs. These distances serve as features, and 12 standard annotations are used as labels for our ML models. We then apply different ML classifiers, including support vector machines, neural networks, and random forests, to predict RNA annotations. We evaluate the performance of these classifiers on a benchmark dataset and report their precision, recall, and F1-score. Low-resolution RNA structures are then annotated based on their sequence similarity to high-resolution structures and the corresponding predicted annotations.
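As a rough illustration of two steps named above (turning inter-base atom distances into a feature vector and scoring predictions by precision, recall, and F1), the following minimal sketch may help. It is not the pipeline used in this study: the function names, the coordinate input layout, and the example labels `cWW` and `tHS` are assumptions made for illustration, and no classifier or DSSR call is shown.

```python
import math
from itertools import product


def distance_features(base_a, base_b):
    """Flatten all inter-base atom distances into one feature vector.

    base_a and base_b are lists of (x, y, z) atom coordinates
    (a hypothetical input layout, not DSSR's output format).
    """
    return [math.dist(p, q) for p, q in product(base_a, base_b)]


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy usage with made-up coordinates and illustrative annotation labels.
features = distance_features([(0, 0, 0), (1, 0, 0)], [(0, 3, 4)])
scores = per_class_scores(
    ["cWW", "cWW", "tHS", "tHS"],  # reference annotations (illustrative)
    ["cWW", "tHS", "tHS", "tHS"],  # classifier predictions (illustrative)
)
```

In practice the feature vectors would be passed to a classifier such as a support vector machine or random forest, and the per-class scores would be computed over all annotation classes in the benchmark set.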
In future work, the presented approach can also help explore plausible base-pair interactions to identify conserved motifs in low-resolution structures. The detected interactions, along with their annotations, can aid the study of RNA tertiary structures, which can lead to a better understanding of their functions in the cell.