Point-Set Distributions of Gene-Sequences
ORAL
Abstract
There are many biologically relevant questions that one hopes to address through studies of genetic sequences: (1) How is pre-mRNA spliced into messenger RNAs, which are used to generate proteins within cells? (2) Are there signatures of disease in messenger-RNA sequences that can be extracted through sequence analysis? (3) What are the differences between messenger RNAs in different organisms? Large collections of nucleic-acid sequences are publicly accessible, yet it is extremely challenging to use them to answer such questions, in part because there are few quantitative tools for sequence analysis. We introduce a representation that maps each element of a sequence to a unique point in the unit square. The resulting point-sets are amenable to tools developed in statistical mechanics and dynamical systems theory, and they reveal many differences between nucleic-acid sequences in different groups. Through a machine-learning algorithm, they can also be used to differentiate individual sequences.
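The abstract does not spell out the map. One construction with the stated flavor, borrowed from symbolic dynamics, reads the nucleotides downstream of a position as the base-4 digits of x and the nucleotides upstream as the base-4 digits of y. The resulting point-set can then be summarized with a statistical-mechanics-style quantity such as a box-occupancy entropy. The sketch below assumes that construction. The digit assignment, the truncation `depth`, and the names `point_set` and `box_entropy` are hypothetical, and the code is not claimed to be the authors' representation. Because of truncation, points are unique only up to `depth` nucleotides of context on each side.

```python
import math
from collections import Counter

# Hypothetical digit assignment (an assumption, not from the abstract).
# U is treated like T so that RNA sequences can be passed in directly.
DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3, "U": 3}


def _base4_fraction(digits):
    """Interpret digits d1, d2, ... as the base-4 fraction 0.d1 d2 ... ."""
    value = 0.0
    for k, d in enumerate(digits):
        value += d * 4.0 ** -(k + 1)
    return value


def point_set(seq, depth=12):
    """Map each position i of `seq` to a point (x_i, y_i) in [0, 1) x [0, 1).

    x_i encodes seq[i], seq[i+1], ... (downstream context) and y_i encodes
    seq[i-1], seq[i-2], ... (upstream context), each truncated to `depth`
    digits. Near the ends, missing digits act as 0 (i.e. as "A").
    Characters outside DIGIT (e.g. "N") raise KeyError.
    """
    seq = seq.upper()
    points = []
    for i in range(len(seq)):
        x = _base4_fraction(DIGIT[c] for c in seq[i:i + depth])
        y = _base4_fraction(DIGIT[c] for c in reversed(seq[max(0, i - depth):i]))
        points.append((x, y))
    return points


def box_entropy(points, n_boxes=16):
    """Shannon entropy (nats) of point occupancy on an n_boxes x n_boxes grid."""
    counts = Counter(
        (min(int(x * n_boxes), n_boxes - 1), min(int(y * n_boxes), n_boxes - 1))
        for x, y in points
    )
    total = len(points)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


if __name__ == "__main__":
    pts = point_set("ACGT", depth=4)
    print(pts)
    print(box_entropy(pts, n_boxes=4))
```

Comparing `box_entropy` across groups of sequences, or feeding grid-occupancy histograms to a classifier, is one way such point-sets could be used. The abstract does not say which statistics or which learning algorithm the authors actually apply.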
Publication: E. Speakman and G. H. Gunaratne, "On a Kneading Theory of Gene Splicing," CHAOS 34 (2024) [Featured Article].
Presenters
Gemunu Gunaratne, University of Houston
Authors
Gemunu Gunaratne, University of Houston
Ethan Speakman, University of Houston