Center for Studies in Demography and Ecology

R and Relational Databases

Instructor: Phil Hurvitz

Date: 4/13/2020
Time: 12:30pm-3:30pm
Location Zoom: https://washington.zoom.us/j/203605351

Register for Workshop


Objective
This course will provide a broad overview of using R to interface with relational databases. By the end of the course you will be able to:

  1. Make connections with PostgreSQL/PostGIS and SQLite/GPKG databases
  2. Pull tables from the database into R as data frames
  3. Perform basic SQL operations on tables and return results as R data frames
  4. Perform spatial queries within the database and return results as R data frames

Rationale
R has become an ubiquitous environment for data management and analysis, powered by its extensible architecture supported by numerous packages designed for a myriad of uses. Despite its utility, base R suffers from a fundamental limitation of requiring data sets to be loaded into RAM. For very large data sets, this can be problematic. One work-around is to store data in relational databases (e.g., PostgreSQL and SQLite), which are efficient at storing and analyzing large tables. Where possible, much of the data wrangling needed for generating analytic tables can be performed within the database, with results converted to data frames for statistical analysis or visualization within R. In this workshop, we will cover basic functionality of the R DBI package, which controls the basic communication between R and SQL databases. We will use PostgreSQL/PostGIS and SQLite as the storage databases. Attention will be paid to (1) using SQL queries to distill large raw database tables into manageable R data frames and (2) creating reports with resultant tables and graphics within the RMarkdown documentation framework.

Prerequisites
The following prerequisites will be required for you to be able to follow along with the class.

  • Your own computer with software pre-installed (see below). You will need your own computer since we will not be meeting in person.
  • A decent internet connection. You will need this in order to follow the Zoom session.
  • Intermediate skills using vectors, variables, and data frames in R, for example:
    • Defining variables (creating, setting values)
    • Subsetting data frames
    • Creating and calculating new columns in data frames
    • Summarizing one column based on group identifiers in another column
  • Beginning to intermediate skills with desktop GIS (e.g., ArcGIS Desktop or QGIS) — with basic knowledge of overlay operations (e.g., intersect, identity)
    This is so you can understand some of the spatial queries we will be performing.
  • Some familiarity with structured query language (SQL) and database structure (tables, columns, views, etc.)

Before class
Make sure to have installed on your computer:

  1. R and RStudio
  2. PostgreSQL (see https://www.postgresql.org/) and PostGIS (https://www.postgis.net/)
    These may help: https://www.postgresqltutorial.com/install-postgresql/ (PC)
    https://thecodersblog.com/PostgreSQL-PostGIS-installation/ (Mac)
  3. QGIS (see https://qgis.org)
  4. SQLite
    https://www.sqlitetutorial.net/download-install-sqlite/ (PC)
    https://www.dev2qa.com/how-to-install-sqlite3-on-mac/ (Pac)

Online materials
A link to the online materials will be posted here.