This course is meant to fill a perceived curriculum gap between methods courses that emphasize study design and statistics courses that teach statistical analysis. It focuses on applied methods for data preparation and introduces the following topics: data management and documentation, data cleaning and variable creation, working with demographic data, and reproducibility. It consists of two five-week modules. The first uses an Add Health public data set and Stata to teach documentation, data cleaning, and variable creation. The second uses The Human Mortality Database and R (via RStudio) to teach visual displays in R and reproducibility (via RMarkdown).
The course presumes familiarity with Stata as well as R and RStudio.
- Introductory graduate level research methods and statistics
- Basic knowledge of Stata; see below for details
- Basic knowledge of R; see below for details
- Ability to work in the RStudio environment
Students taking this course are required to have prior experience with Stata, including its graphical user interface, data types, varlists, help pages, and .do files. They need to know how to read and write Stata data files, rename variables, label variables and variable values, assign missing values, and use the Stata commands list and tabulate to check their work. These basics are covered in the CSDE Introduction to Stata Workshop. If you are not very familiar with Stata, we recommend working through the workshop material and exercises. Once you are confident with that material, you are ready to take this course.
There are many resources for learning Stata. A few are listed below, but many others can be found by searching the web.
Students taking this course are required to have prior experience with R. This includes creating objects (variables) and saving them to an R workspace, reading an R workspace (.RData) file, reading data files in ASCII text format, and using functions. Students need to be able to work with vectors, arrays (matrices), and data frames. If you are not very familiar with R, we recommend working through the material and exercises in the CSDE Introduction to R Workshop. Once you are confident with that material, you are ready to take this course. Experience with the RStudio environment is also required, including knowing how to create and use an R script file. If you understand all of the RStudio windows described in this Introduction to RStudio, you are adequately prepared to take this course.
There are many resources for learning R. A few are listed below, but many others can be found by searching the web.