CSDE News & Events

Three principles of data science: predictability, computability, and stability (PCS), Bin Yu (CSSS Seminar, 5/8/2019)

Posted: 5/6/2019 (Local Events)

Bin Yu, Chancellor’s Professor, Departments of Statistics and EECS, University of California, Berkeley, statistics.berkeley.edu/~binyu

Abstract

In this talk, I will discuss the importance of, and the connections among, the three principles of data science named in the title, along with the PCS workflow built on them. The workflow covers the full data science life cycle: problem formulation, data cleaning, exploratory data analysis (EDA), modeling, post-hoc analysis, and drawing conclusions from data. The principles will be demonstrated in the context of two collaborative projects in neuroscience and genomics, with the aim of obtaining interpretable results and generating testable hypotheses. If time allows, I will present a proposed PCS inference framework that includes perturbation intervals and PCS hypothesis testing. PCS inference uses prediction screening and takes into account both data and model perturbations. Finally, PCS documentation is proposed, based on R Markdown, IPython, or Jupyter Notebook, with publicly available, reproducible code and narratives to back up the human choices made throughout a data science life cycle. The PCS workflow and documentation are demonstrated in a genomics case study available on Zenodo.

Links to papers: [1] Three principles of data science: predictability, computability and stability (PCS) (https://arxiv.org/abs/1901.08152). [2] Interpretable machine learning: definitions, methods and applications (https://arxiv.org/abs/1901.04592).


Date: 05/08/2019

Time: 12:30–1:30 PM

Location: Savery (SAV) 409