Essential data science instruction for biophysicists
Invited
Abstract
Biological physicists, along with researchers across nearly all biological sciences, obtain quantitative data in their experiments. Much of our training of students focuses on theoretical modeling of the phenomena underlying the measurements, often with state-of-the-art physics. As a necessary complement to this training, we should teach students how to manage their data sets and perform statistical inference using state-of-the-art techniques from the rapidly developing fields of data science and applied statistics. In this talk, I will discuss what I view as essential data science principles we should teach students. These are:
Software development: Modern principles including version control and test-driven development.
Validation: Ensuring that a data set from the data source meets expectations (format, missing values, etc.).
Wrangling: The process of converting data from the source to a more readily usable format.
Preservation and sharing: Strategies for long-term storage and easy sharing of data.
Visualization: Construction of instructive plots and interactive data displays.
Bespoke statistical modeling: Statistical inference custom-built for the specific experiment, as opposed to off-the-shelf techniques.
I will discuss my approach to teaching these topics using hands-on, team-based analyses of real biological data sets.
–
Presenters
Justin Bois
Division of Biology and Biological Engineering, Caltech
Authors
Justin Bois
Division of Biology and Biological Engineering, Caltech