Feature Engineering for Data Driven Applications in Physical Sciences

ORAL

Abstract

In data-driven applications, features are the attributes of a system used to represent the underlying problem to the algorithm. Feature engineering transforms raw data into descriptive and discriminative features. Beyond improving the performance of predictive models, it can also improve their interpretability. Owing to its importance, approaches have been developed for feature construction, selection, and transformation. However, the nature of the data produced in physical science problems makes some of these approaches sub-optimal and can render others misleading.

In this talk, we focus on the applicability of different feature engineering approaches to physical science applications. In particular, we investigate multicollinearity among groups of features of arbitrary size, using illustrative datasets. Finally, we compare the performance of the different approaches on a corpus of data from fluid flow simulations.
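
As a minimal sketch of why group-wise multicollinearity can be missed, the example below builds a purely synthetic dataset in which one feature is almost an exact sum of three others. It is not the method or the data presented in the talk: the feature names, the noise level, and the use of the variance inflation factor (VIF) as a diagnostic are all assumptions made for illustration. No single pairwise correlation looks alarming, yet the VIF, which measures how well each feature is explained by all the others together, exposes the dependence.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X, shape (n_samples, n_features).

    Uses the identity VIF_j = [R^{-1}]_jj, where R is the correlation
    matrix of the features. Assumes R is invertible.
    """
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

# Hypothetical synthetic features (illustration only, not data from the talk).
rng = np.random.default_rng(0)
n = 2000
a = rng.normal(size=n)
b = rng.normal(size=n)
c = rng.normal(size=n)
d = a + b + c + 0.05 * rng.normal(size=n)  # nearly a sum of the group (a, b, c)
e = rng.normal(size=n)                     # unrelated feature, for contrast
X = np.column_stack([a, b, c, d, e])

# Largest off-diagonal entry of the correlation matrix (pairwise view).
corr = np.corrcoef(X, rowvar=False)
max_pairwise = np.max(np.abs(corr - np.eye(X.shape[1])))

# Group-wise view: one VIF per feature.
vif = variance_inflation_factors(X)

print("largest pairwise |correlation|:", round(float(max_pairwise), 3))
for name, value in zip("abcde", vif):
    print(f"VIF({name}) = {value:.1f}")
```

In this construction, the features a, b, c, and d all receive large VIF values even though the pairwise correlations stay moderate, while the unrelated feature e stays close to 1. A screening step based only on pairwise correlation thresholds would therefore keep all five features.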

Presenters

  • Aashwin Mishra

    Stanford University

Authors

  • Aashwin Mishra

    Stanford University

  • Gianluca Iaccarino

    Stanford University