Machine Learning-Based Classification of DNA Nucleotides from 2D MoS2 Nanopore Ionic Current Signals

ORAL

Abstract

Nanopore-based DNA sequencing enables real-time analysis of single molecules by measuring how nucleotides disrupt ionic currents as they pass through a nanopore. These current shifts encode valuable information but are accompanied by significant noise and variability, making accurate base classification challenging. Deep learning models such as CNNs and LSTMs have achieved high accuracy in this task, but they require large datasets and substantial computational resources. To reduce these requirements, we developed a more efficient approach using an optimized XGBoost classifier. After cleaning the data with statistical outlier removal, we engineered new features from translocation times and current signals. Using these features, our model achieved ~96% accuracy on a small dataset, outperforming traditional classifiers and rivaling deep learning methods. These results demonstrate that gradient-boosted decision trees provide a lightweight, interpretable, and scalable solution for nucleotide classification, particularly well-suited for real-time or resource-constrained environments. Future work will focus on capturing temporal dynamics and validating the model in live nanopore sequencing workflows.
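
As a rough illustration of the preprocessing described in the abstract, the sketch below removes statistical outliers and derives extra features from per-event translocation time and current signal. The abstract does not state which outlier rule or which engineered features were used, so the interquartile-range (Tukey fence) filter, the field names `dwell_time` and `current_drop`, and the two derived features are all assumptions chosen for illustration, not the authors' method.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) Tukey fences for a list of numbers.

    Assumption: an IQR rule stands in for the unspecified
    "statistical outlier removal" step.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(events, keys=("dwell_time", "current_drop")):
    """Keep only events that lie inside the fences for every key.

    `events` is a list of dicts; the key names are hypothetical.
    """
    bounds = {key: iqr_bounds([e[key] for e in events]) for key in keys}
    return [
        e for e in events
        if all(bounds[key][0] <= e[key] <= bounds[key][1] for key in keys)
    ]


def add_features(event):
    """Return a copy of one event with two illustrative derived features."""
    out = dict(event)
    # Hypothetical features combining translocation time and current signal.
    out["drop_per_time"] = event["current_drop"] / event["dwell_time"]
    out["drop_times_time"] = event["current_drop"] * event["dwell_time"]
    return out
```

The cleaned and augmented events could then be converted to a feature matrix and passed to a gradient-boosted classifier such as XGBoost's `XGBClassifier`; the hyperparameter optimization mentioned in the abstract is not reproduced here.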

Presenters

  • Benjamin O Tayo

    University of Central Oklahoma

Authors

  • Benjamin O Tayo

    University of Central Oklahoma

  • Basant Banjara

    University of Central Oklahoma

  • Rameshwar Kumawat

    Northwestern University