Machine Learning-Based Classification of DNA Nucleotides from 2D MoS₂ Nanopore Ionic Current Signals
ORAL
Abstract
Nanopore-based DNA sequencing enables real-time analysis of single molecules by measuring how nucleotides disrupt the ionic current as they pass through a nanopore. These current shifts encode the identity of each base, but they are also noisy and highly variable, which makes accurate base classification challenging. Deep learning models such as CNNs and LSTMs have achieved high accuracy on this task, but they require large datasets and substantial computational resources. To reduce these requirements, we developed a more efficient approach based on an optimized XGBoost classifier. After cleaning the data with statistical outlier removal, we engineered new features from translocation times and current signals. Using these features, our model achieved ~96% accuracy on a small dataset, outperforming traditional classifiers and rivaling deep learning methods. These results demonstrate that gradient-boosted decision trees provide a lightweight, interpretable, and scalable solution for nucleotide classification that is particularly well suited to real-time or resource-constrained environments. Future work will focus on capturing temporal dynamics and validating the model in live nanopore sequencing workflows.
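The preprocessing steps named in the abstract (statistical outlier removal followed by feature engineering on translocation times and current signals) might look roughly like the sketch below. The abstract does not say which outlier rule or which features were used, so everything here is an assumption made for illustration: the IQR rule, the field names dwell_time and blockade, the derived features, and the synthetic events are all hypothetical.

```python
import math
from statistics import quantiles

def iqr_filter(events, key, k=1.5):
    """Keep events whose `key` value lies within [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule is an assumed stand-in for the unspecified
    "statistical outlier removal" step.
    """
    values = [e[key] for e in events]
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [e for e in events if lo <= e[key] <= hi]

def engineer_features(event):
    """Derive illustrative (hypothetical) features from one translocation event."""
    dwell = event["dwell_time"]    # translocation time, assumed in ms
    blockade = event["blockade"]   # current drop vs. open pore, assumed in nA
    return {
        "dwell_time": dwell,
        "blockade": blockade,
        "log_dwell": math.log(dwell),          # compresses long-tailed dwell times
        "blockade_x_dwell": blockade * dwell,  # area-like interaction term
    }

# Synthetic events, invented purely for this example (not measured data).
dwell_times = [1.0, 1.1, 0.9, 1.2, 1.0, 1.05, 0.95, 1.15, 50.0]
events = [
    {"dwell_time": d, "blockade": 0.5 + 0.01 * i, "label": "ACGT"[i % 4]}
    for i, d in enumerate(dwell_times)
]

clean = iqr_filter(events, key="dwell_time")   # drops the 50.0 ms event
rows = [engineer_features(e) for e in clean]   # feature dicts, one per event
labels = [e["label"] for e in clean]

print(len(events), len(clean))
```

Feature rows and labels of this shape could then be handed to a gradient-boosted classifier such as XGBoost's XGBClassifier. The model configuration and tuning behind the reported accuracy are not described in the abstract and are not reproduced here.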
Presenters
- Benjamin O Tayo, University of Central Oklahoma
Authors
- Benjamin O Tayo, University of Central Oklahoma
- Basant Banjara, University of Central Oklahoma
- Rameshwar Kumawat, Northwestern University