cAPTured: Neural Reflex Arc-Inspired Fuzzy Continual Learning for Capturing in Silico Aptamer-Target Protein Interactions

Aviral Chharia (1), Runjhun Saran (2), Apurva Narayan (3, 4, 5)
(1) CMU, (2) Waterloo Institute of Nanotechnology, (3) UWO, (4) UBC, (5) UWaterloo

International Joint Conference on Neural Networks (IJCNN), 2023

Abstract

Aptamers are oligonucleotides or peptides with unique binding properties for specific target molecules, and they have shown great potential in diagnostics, therapeutics, and biosensing. However, the current in vitro SELEX-based method for discovering new target-selective aptamers is challenging, time-consuming, and often fails to find high-affinity aptamers. Recently, in silico methods have gained considerable attention. However, because collecting labeled interaction-pair data is expensive and requires highly trained specialists, available data is sparse. Further, since positive-class samples are even harder to acquire, available datasets show severe class imbalance. This makes designing deep learning models especially difficult, as such models require a sufficiently large training set and are biased towards the dominant class. Additionally, current models cannot be updated in real time: end-to-end retraining is necessary for each newly discovered aptamer-target interaction pair. The present work is the first to address both of these challenges. We present cAPTured, a novel fuzzy continual learning method for predicting aptamer-target protein interaction pairs in a continual learning environment. cAPTured continually updates its learned feature space on a non-stationary interaction-pair data stream. We performed extensive evaluation studies and experiments to establish the effectiveness of the proposed approach. cAPTured outperforms existing methods on the benchmark dataset by a significant margin.


cAPTured model architecture illustrating GRK2 protein (in blue) interacting with C13 aptamer (in black).

Summary

Effect of the hyperbox expansion coefficient (θ) and the fuzziness control parameter (γ) on (a) the number of hyperboxes, (b) sample testing time, and (c) model training time.
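θ and γ are the usual names for the two hyperparameters of a fuzzy min-max neural network: θ bounds how large a hyperbox may grow, and γ sets how quickly fuzzy membership decays outside a box. As a hedged illustration, assuming cAPTured builds on a Simpson-style fuzzy min-max classifier (this page does not say which variant it uses), the sketch below shows single-pass, incremental learning in which each new sample either expands an existing same-class hyperbox or creates a new one. The names `FuzzyMinMax`, `Hyperbox`, and `learn_one` are hypothetical, the membership function is one common ramp-threshold form from the fuzzy min-max literature, and the overlap test and contraction step of the full algorithm are omitted for brevity.

```python
from dataclasses import dataclass


def ramp(r: float, gamma: float) -> float:
    """Ramp threshold f(r, gamma) = r * gamma clipped to [0, 1]."""
    return min(1.0, max(0.0, r * gamma))


@dataclass
class Hyperbox:
    v: list[float]  # min point
    w: list[float]  # max point
    label: int

    def membership(self, x: list[float], gamma: float) -> float:
        # 1.0 inside the box; falls off linearly outside it, faster for larger gamma
        return sum(
            1.0 - ramp(xi - wi, gamma) - ramp(vi - xi, gamma)
            for xi, vi, wi in zip(x, self.v, self.w)
        ) / len(x)

    def can_expand(self, x: list[float], theta: float) -> bool:
        # Simpson-style size limit: summed box extent after expansion <= n * theta
        size = sum(max(wi, xi) - min(vi, xi) for xi, vi, wi in zip(x, self.v, self.w))
        return size <= len(x) * theta

    def expand(self, x: list[float]) -> None:
        self.v = [min(vi, xi) for vi, xi in zip(self.v, x)]
        self.w = [max(wi, xi) for wi, xi in zip(self.w, x)]


class FuzzyMinMax:
    """Hypothetical single-pass fuzzy min-max classifier (no overlap contraction)."""

    def __init__(self, theta: float, gamma: float) -> None:
        self.theta = theta  # hyperbox expansion coefficient
        self.gamma = gamma  # fuzziness (sensitivity) parameter
        self.boxes: list[Hyperbox] = []

    def learn_one(self, x: list[float], label: int) -> None:
        # Try same-class boxes from highest to lowest membership; expand the first
        # one that stays within the theta limit, otherwise add a new point-sized box.
        same = [b for b in self.boxes if b.label == label]
        same.sort(key=lambda b: b.membership(x, self.gamma), reverse=True)
        for box in same:
            if box.can_expand(x, self.theta):
                box.expand(x)
                return
        self.boxes.append(Hyperbox(list(x), list(x), label))

    def predict(self, x: list[float]) -> int:
        # Class of the hyperbox with the highest membership
        return max(self.boxes, key=lambda b: b.membership(x, self.gamma)).label


if __name__ == "__main__":
    # Toy stream with inputs already scaled to [0, 1]
    stream = [([0.1, 0.1], 0), ([0.15, 0.2], 0), ([0.8, 0.9], 1),
              ([0.85, 0.8], 1), ([0.3, 0.25], 0)]
    for theta in (0.05, 0.3):
        model = FuzzyMinMax(theta=theta, gamma=4.0)
        for x, y in stream:
            model.learn_one(x, y)
        print(f"theta={theta}: {len(model.boxes)} hyperboxes")
```

On this toy stream the script prints 5 hyperboxes for θ = 0.05 and 2 for θ = 0.3: a larger expansion coefficient lets each box absorb more samples, so fewer boxes are created. Because learning is one sample at a time, new interaction pairs can be added without retraining from scratch, which is the property the continual setting relies on.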

Aptamers, short RNA, DNA, or peptide sequences, have emerged as powerful tools with unique binding properties for specific target molecules. They hold immense potential, especially in diagnostics, therapeutics, and biosensing. However, traditional lab-based methods for discovering aptamers are labor-intensive and costly, and they often fail to yield high-affinity aptamers. The current benchmark, SELEX, can take up to two years to find a single high-affinity aptamer. The goal of this project was not only to develop a faster alternative to SELEX that accelerates aptamer discovery, but also to outperform it in terms of precision.

We developed cAPTured, a fuzzy continual machine learning model for predicting aptamer-target protein interactions. cAPTured uses four distinct feature encodings: k-mer and revck-mer (reverse-complement k-mer) encodings for the aptamer sequence, and AAC (amino acid composition) and PseAAC (pseudo amino acid composition) encodings for the target protein sequence. These encodings extract essential information from the aptamer and protein sequences. cAPTured then fuses the encoded features and embeds them into a low-dimensional latent space that preserves the statistically significant features. The model was designed to be robust to distribution shifts, so that it can adapt and update its learned feature space in real time on a non-stationary interaction-pair data stream.
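The paragraph above names the four encodings but not their exact parameters. As a minimal sketch, assuming the common textbook definitions (normalized k-mer frequencies over {A, C, G, T}, reverse-complement k-mers that merge each k-mer with its reverse complement, and 20-dimensional amino acid composition), the aptamer and protein encodings could look like the code below. The function names and the default k = 2 are hypothetical. PseAAC is left out because it additionally requires physicochemical property scales, and plain concatenation stands in for cAPTured's fusion and low-dimensional embedding step, which is not reproduced here.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_features(seq: str, k: int) -> list[float]:
    """Normalized frequency of each of the 4**k k-mers, in lexicographic order."""
    seq = seq.upper().replace("U", "T")  # treat RNA aptamers as DNA
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def revcomp(kmer: str) -> str:
    return kmer.translate(COMPLEMENT)[::-1]


def revc_kmer_features(seq: str, k: int) -> list[float]:
    """k-mer frequencies with each k-mer merged with its reverse complement."""
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    freq = dict(zip(kmers, kmer_features(seq, k)))
    canonical = sorted({min(m, revcomp(m)) for m in kmers})
    # Palindromic k-mers (e.g. "AT") are their own reverse complement: count once
    return [freq[c] + (freq[revcomp(c)] if revcomp(c) != c else 0.0) for c in canonical]


def aac_features(protein: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue in the sequence."""
    protein = protein.upper()
    n = sum(protein.count(a) for a in AMINO_ACIDS)
    return [protein.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def pair_features(aptamer: str, protein: str, k: int = 2) -> list[float]:
    # Plain concatenation stands in for cAPTured's fusion/embedding step
    return kmer_features(aptamer, k) + revc_kmer_features(aptamer, k) + aac_features(protein)


if __name__ == "__main__":
    print(len(pair_features("GGACUACGGU", "MKTAYIAKQR")))  # 16 + 10 + 20 = 46 features
```

With k = 2 the aptamer contributes 16 k-mer and 10 reverse-complement k-mer features and the protein contributes 20 AAC features. Merging each k-mer with its reverse complement makes the revck-mer features insensitive to which strand a motif appears on, at the cost of fewer, coarser features.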

Our results show that cAPTured cuts the time required for aptamer discovery from around 2-3 years with SELEX, the current benchmark, to just a few minutes. cAPTured outperforms existing benchmarks with a 4% increase in precision when tested on benchmark lab datasets. Notably, owing to its inherent feature engineering and fuzzy neural network-based design, cAPTured excels in limited-data scenarios (with just 400, 300, or even 200 training samples) and remains relevant over time (across 5 pairs of distribution shifts), which prevents the model from becoming outdated as new aptamer-target interaction pairs are discovered. This was further verified with a t-SNE plot. These results underscore the potential of cAPTured as a valuable tool in bioinformatics.

In conclusion, using cAPTured, bioinformatics engineers can reduce the time needed to find a high-affinity aptamer by 700x relative to the current benchmark, while also increasing precision by 4%. cAPTured's ability to adapt to distribution shifts and its robust performance in limited-data scenarios make it a versatile and enduring tool.

t-SNE plots: (a) before, (b) after. The plots indicate that our method aligns the two classes by minimizing the gap between them.

BibTeX

@inproceedings{chharia2023captured,
  title={cAPTured: Neural Reflex Arc-Inspired Fuzzy Continual Learning for Capturing in Silico Aptamer-Target Protein Interactions},
  author={Chharia, Aviral and Saran, Runjhun and Narayan, Apurva},
  booktitle={2023 International Joint Conference on Neural Networks (IJCNN)},
  pages={1--9},
  year={2023},
  organization={IEEE}
}