Introduction to the use of R with Relational Databases
Instructor: Phil Hurvitz
Objective
This course will provide a broad overview of using R to interface with relational databases. By the end of the course you will be able to:
- Make connections with PostgreSQL/PostGIS and SQLite/GPKG databases
- Pull tables from the database into R as data frames
- Perform basic SQL operations on tables and return results as R data frames
- Perform spatial queries within the database and return results as R data frames
Rationale
R has become an ubiquitous environment for data management and analysis, powered by its extensible architecture supported by numerous packages designed for a myriad of uses. Despite its utility, base R suffers from a fundamental limitation of requiring data sets to be loaded into RAM. For very large data sets, this can be problematic. One work-around is to store data in relational databases (e.g., PostgreSQL and SQLite), which are efficient at storing and analyzing large tables. Where possible, much of the data wrangling needed for generating analytic tables can be performed within the database, with results converted to data frames for statistical analysis or visualization within R. In this workshop, we will cover basic functionality of the R DBI package, which controls the basic communication between R and SQL databases. We will use PostgreSQL/PostGIS and SQLite as the storage databases. Attention will be paid to (1) using SQL queries to distill large raw database tables into manageable R data frames and (2) creating reports with resultant tables and graphics within the RMarkdown documentation framework.
Missed the workshop? The recording is available here
Prerequisites |
The following prerequisites will be required for you to be able to follow along with the class.
Before class
|
Materials |
Online materials Recorded workshop |