An Introduction to Cloud-Based Data Science Tools

Invited

Abstract

Teaching methods for programming and data science are evolving faster than ever, driven in large part by the rapidly growing popularity of cloud-based tools. In this talk, I will provide an overview of tools and techniques that can improve both the students’ learning experience and the instructor’s ability to manage the class and its materials. I will discuss best practices for managing and distributing code and data, as well as the platforms used in a data science project. Among the many competing solutions, I use Google products as the primary platform, but the concepts are transferable. Google Colaboratory (Colab) will be introduced as a solution for running and sharing code. Beyond Colab, I will present an end-to-end data science project in a cloud-based ecosystem, using Google Cloud. In addition to the essential elements of Google Cloud, I will cover ways to tackle big data problems using Hadoop and Spark, as well as the use of containerized applications for large-scale parallel processing. I will illustrate how I have used cloud computing in my classes at Boston University and share feedback from the students.

Presenters

  • Mohammad Soltanieh-Ha

    Boston University

Authors

  • Mohammad Soltanieh-Ha

    Boston University