Center for Studies in Demography and Ecology

BD2K Guide to the Fundamentals of Data Science

Posted: 9/12/2016 (Local Events)

The NIH Big Data to Knowledge program is pleased to announce the BD2K Guide to the Fundamentals of Data Science, a series of online lectures given by experts from across the country covering a diverse range of topics in data science. This course is an introductory overview that assumes no prior knowledge or understanding of data science.

The series starts Friday, September 9th, and will run weekly throughout the year, 9:00-10:00 AM PST. No registration is required.

***To join the meeting, view the login information here.
***First GoToMeeting? Try a test session.

This is a joint effort of the BD2K Training Coordinating Center (TCC), the BD2K Centers Coordination Center (BD2KCCC), and the NIH Office of the Associate Director of Data Science. For more information about the series and to see archived presentations, visit the main site.

Schedule
9/9/16: Introduction to big data and the data lifecycle (Mark Musen, Stanford).
9/16/16: SECTION 1: DATA MANAGEMENT OVERVIEW (Bill Hersh, Oregon Health Sciences).
9/23/16: Finding and accessing datasets, Indexing and Identifiers (Lucila Ohno-Machado, UCSD).
9/30/16: Data curation and Version control (Pascale Gaudet, Swiss Institute of Bioinformatics).
10/7/16: Ontologies (Michel Dumontier, Stanford).
10/14/16: Provenance (Zachary Ives, Penn).
10/21/16: Metadata standards (Susanna-Assunta Sansone, Oxford).

10/28/16: SECTION 2: DATA REPRESENTATION OVERVIEW (Anita Bandrowski, UCSD).
11/4/16: Databases and data warehouses, Data: structures, types, integrations (Chaitan Baru, NSF).
11/11/16: No lecture (Veterans Day).
11/18/16: Social networking data (TBD).
12/2/16: Data wrangling, normalization, preprocessing (Joseph Picone, Temple).
12/9/16: Exploratory Data Analysis (Brian Caffo, Johns Hopkins).
12/16/16: Natural Language Processing (Noemie Elhadad, Columbia).

The following topics will be covered in January through May of 2017:

SECTION 3: COMPUTING OVERVIEW
Workflows/pipelines
Programming and software engineering; API; optimization
Cloud, Parallel, Distributed Computing, and HPC
Commons: lessons learned, current state

SECTION 4: DATA MODELING AND INFERENCE OVERVIEW
Smoothing, Unsupervised Learning/Clustering/Density Estimation
Supervised Learning/prediction/ML, dimensionality reduction
Algorithms, incl. Optimization
Multiple testing, false discovery rate
Data issues: Bias, Confounding, and Missing data
Causal inference
Data Visualization tools and communication
Modeling Synthesis

SECTION 5: ADDITIONAL TOPICS
Open science
Data sharing (including social obstacles)
Ethical Issues
Extra considerations/limitations for clinical data
Reproducible Research
SUMMARY and NIH context

Date: 09/16/2016

Time: 9:00-10:00 AM PST