TopoStats, a tool to discover the hidden structures and states of biomolecules
ORAL · Invited
Abstract
Atomic force microscopy (AFM) can image single molecules in liquid with sub-molecular resolution, without the need for labelling or averaging. Our high-resolution AFM methods resolve these sub-molecular details as a single biomolecule ‘explores’ its complex conformational space. This ability is especially important when looking at DNA, a molecule made more complex by its innate flexibility, compaction in the nucleus, and processing by essential cellular enzymes[1].
However, the lack of automated analysis tools for AFM and the slow integration of machine learning (ML) pipelines limit the analysis of this powerful single-molecule data. The design and integration of such tools are held back by AFM-specific issues with raw data and by small datasets (compared with, e.g., cryo-EM). Raw AFM images must undergo many “cleaning” steps before molecule identification can occur.
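As an illustration of one such cleaning step, the sketch below aligns scan lines by subtracting each row's median height, a common way to remove line-to-line offsets in AFM images. It is a minimal example on a toy height map, not the TopoStats implementation; the function name `align_rows` and the data are assumptions made for illustration.

```python
from statistics import median


def align_rows(image):
    """Subtract each scan line's median height so all rows share a baseline.

    `image` is a list of rows of height values (hypothetical toy format).
    """
    aligned = []
    for row in image:
        baseline = median(row)  # robust to a few tall molecule pixels
        aligned.append([height - baseline for height in row])
    return aligned


# Toy 3x4 height map (nm): each scan line carries a different offset,
# and one pixel per row in the first two rows sits on a "molecule".
raw = [
    [1.0, 1.0, 3.0, 1.0],
    [5.0, 5.0, 7.0, 5.0],
    [2.0, 2.0, 2.0, 2.0],
]
flattened = align_rows(raw)
for row in flattened:
    print(row)
```

The median is used rather than the mean so that a few tall pixels on a row do not drag the background estimate upwards.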
We have developed TopoStats (www.github.com/AFM-SPM/TopoStats), an open-source Python utility that handles data cleaning/processing and identifies/characterises individual (bio)molecules, from DNA origami to nuclear pore complexes. This enables us to begin using AFM to generate big data on the structure and conformational state of individual (bio)molecules. We have recently refactored TopoStats to make it easier to use, and to support the majority of AFM file formats.
TopoStats currently still relies on a pipeline of thresholding methods to identify molecules of interest (MoIs). These methods are difficult to generalise and struggle with overlapping or more complex structures. We have begun to implement machine learning methodologies, including weakly supervised random forests and a recursive DBSCAN algorithm. We demonstrate that these can identify multiple MoIs in one pass with higher accuracy and less user oversight than the gold-standard software[2,3].
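To make the thresholding approach concrete, the sketch below masks pixels above a height threshold and groups touching pixels into labelled objects. It is a generic illustration in plain Python, not the TopoStats code; `label_molecules`, the threshold value, and the toy height map are all assumptions.

```python
from collections import deque


def label_molecules(image, threshold):
    """Label 4-connected groups of pixels whose height exceeds `threshold`.

    Returns (labels, count), where labels is a grid of integers with
    0 for background and 1..count for each detected object.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or labels[r][c]:
                continue  # background, or already assigned to an object
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and image[ny][nx] > threshold and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Toy flattened height map (nm) containing two separate objects.
heights = [
    [0.1, 2.0, 2.1, 0.0, 0.0],
    [0.0, 1.9, 0.2, 0.0, 2.2],
    [0.0, 0.0, 0.1, 2.0, 2.3],
]
labels, n_found = label_molecules(heights, threshold=1.0)
print(n_found)
for row in labels:
    print(row)
```

Because any two molecules that touch or overlap fall into the same connected region, a purely threshold-based mask assigns them a single label, which is one reason such pipelines struggle with overlapping structures.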
[1] Pyne, A. L. B. et al. Nature Communications 12, 1053 (2021).
[2] Necas, D. & Klapetek, P. Central European Journal of Physics 10, 181–188 (2011).
[3] Beton, J. G. et al. Methods 193, 68–79 (2021).
Publication:
Pyne, A. L. B. et al. Base-pair resolution analysis of the effect of supercoiling on DNA flexibility and major groove recognition by triplex-forming oligonucleotides. Nature Communications 12, 1053 (2021).
Beton, J. G. et al. TopoStats – A program for automated tracing of biomolecules from AFM images. Methods 193, 68–79 (2021).
dos Santos, Á. et al. Autophagy receptor NDP52 alters DNA conformation to modulate RNA Polymerase II transcription. bioRxiv 2022.02.01.478690 (2022).
Presenters
Alice Pyne
University of Sheffield, UK
Authors
Alice Pyne
University of Sheffield, UK