BD2K Guide to the Fundamentals of Data Science
Posted: 9/12/2016 (Local Events)
The NIH Big Data to Knowledge program is pleased to announce the BD2K Guide to the Fundamentals of Data Science, a series of online lectures by experts from across the country covering a diverse range of data science topics. This course is an introductory overview that assumes no prior knowledge of data science.
The series began Friday, September 9, and runs once per week through May 2017, 9:00-10:00 AM PST. No registration is required.
This is a joint effort of the BD2K Training Coordinating Center (TCC), the BD2K Centers Coordination Center (BD2KCCC), and the NIH Office of the Associate Director for Data Science. For more information about the series and to see archived presentations, visit the main site.
9/9/16: Introduction to big data and the data lifecycle (Mark Musen, Stanford).
9/16/16: SECTION 1: DATA MANAGEMENT OVERVIEW (Bill Hersh, Oregon Health & Science University).
9/23/16: Finding and accessing datasets; indexing and identifiers (Lucila Ohno-Machado, UCSD).
9/30/16: Data curation and version control (Pascale Gaudet, Swiss Institute of Bioinformatics).
10/7/16: Ontologies (Michel Dumontier, Stanford).
10/14/16: Provenance (Zachary Ives, Penn).
10/21/16: Metadata standards (Susanna-Assunta Sansone, Oxford).
10/28/16: SECTION 2: DATA REPRESENTATION OVERVIEW (Anita Bandrowski, UCSD).
11/4/16: Databases and data warehouses; data structures, types, and integration (Chaitan Baru, NSF).
11/11/16: No lecture (Veterans Day).
11/18/16: Social networking data (TBD).
12/2/16: Data wrangling, normalization, preprocessing (Joseph Picone, Temple).
12/9/16: Exploratory data analysis (Brian Caffo, Johns Hopkins).
12/16/16: Natural language processing (Noemie Elhadad, Columbia).
The following topics will be covered in January through May of 2017:
SECTION 3: COMPUTING OVERVIEW
Programming and software engineering; APIs; optimization
Cloud, parallel, and distributed computing; HPC
Commons: lessons learned, current state
SECTION 4: DATA MODELING AND INFERENCE OVERVIEW
Smoothing; unsupervised learning, clustering, and density estimation
Supervised learning, prediction, and machine learning; dimensionality reduction
Algorithms, including optimization
Multiple testing and the false discovery rate
Data issues: bias, confounding, and missing data
Data visualization tools and communication
SECTION 5: ADDITIONAL TOPICS
Data sharing (including social obstacles)
Additional considerations and limitations for clinical data
SUMMARY and NIH context
Time: 9:00-10:00 AM PST