# key setups
library(sqldf) # to use sql syntax with data frames
library(knitr) # knitr for kable tables
library(kableExtra) # pretty tables
library(sf) # simple features (GIS)
library(leaflet) # nice maps
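library(leaflet.minicharts) # small charts overlaid on leaflet maps
library(tools) # md5sum
library(stringi) # string processing
library(tidyverse) # dplyr, ggplot2, and related packages
library(magrittr) # pipe operators
library(htmltools) # HTML generation helpers
library(gdalUtilities) # wrappers for GDAL command-line utilities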

Introduction

This section of the UW Elections Database presents a methodology and results for estimating the racial composition of each voting precinct in the State of Washington. The source of the racial composition data is the US Census Bureau's American Community Survey (ACS) 5-year rolling surveys. Because precinct boundaries do not match census tract or block group boundaries, geographic information system (GIS) analysis is needed to estimate the counts and proportions of different race groups in each voting precinct.

Methods

Data

Data were obtained from the US Census Bureau and the office of the Washington Secretary of State.

Data were combined and ultimately stored in a PostgreSQL/PostGIS database housed at the University of Washington’s Center for Studies in Demography and Ecology.

Elections data

Precinct data

Precinct shapefiles (GIS data) were obtained from the WA Secretary of State web site and imported to the PostGIS database. Data were stored in partitioned tables stratified by election year. The identifier for precincts is the combination of county, precinct name, and election year.

Census data

Census Citizen Voting Age Population by Race and Ethnicity (CVAP) data were downloaded from the US Census web site. Years 2000-2008 were represented by the year 2000 decennial census. Years from 2009 to the present were represented by the 5-year ACS data ending in the named year. Data were obtained at the smallest geographic unit for which racial information was available, which was either the tract (2009-2012) or the block group (all other years).
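The year-to-source rule described above can be written as a small lookup. This is only an illustrative sketch: the function name `census_source` and its output column names are hypothetical and do not come from the database, and the sketch assumes that every year from 2009 onward has a matching ACS release.

```r
# Hypothetical helper: map an election year to the census source
# described in the text. Names are illustrative, not from the database.
census_source <- function(election_year) {
  stopifnot(is.numeric(election_year), all(election_year >= 2000))
  decennial <- election_year <= 2008
  data.frame(
    election_year = election_year,
    # 2000-2008 use the 2000 decennial census; later years use the
    # 5-year ACS ending in the named year.
    census_year = ifelse(decennial, 2000, election_year),
    product = ifelse(decennial, "decennial", "acs5"),
    # Tracts for 2009-2012, block groups for all other years.
    geography = ifelse(election_year >= 2009 & election_year <= 2012,
                       "tract", "block group"),
    stringsAsFactors = FALSE
  )
}

src <- census_source(c(2004, 2010, 2016))
print(src)
```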

Counts of persons \(\ge\) 18 years of age residing in the census unit were obtained for the following racial categories:

  • total
  • white
  • black
  • aian = American Indian/Alaska Native
  • asian
  • nhpi = Native Hawaiian/Pacific Islander
  • other

The CVAP data for 2000 did not include values for the American Indian/Alaska Native or Native Hawaiian/Pacific Islander racial groups.

Data stratified by Hispanic/Non-Hispanic ethnicity will be added to the dataset in early 2024.

Spatial and tabular data were stored in partitioned tables stratified by census year.

Analysis

Election results files were merged into a single table and imported to the PostGIS database. Records are indexed by county, precinct name, election date, and election race identifier.

Precinct and census data were conflated by the following process:

  1. Precinct polygons and census unit polygons were intersected, producing slivers (the pieces where a precinct overlaps a census unit). The original area of each census unit was saved as an attribute.
  2. An area weighting factor was calculated as \(W = \frac{area_{sliver}}{area_{orig}}\).
  3. Each census racial count was multiplied by the area weighting factor to obtain an estimate of the count of each racial group within each sliver. This assumes that the population is evenly distributed across the census unit.
  4. Racial group counts were aggregated back to the precinct to provide an estimate of the count of each racial group in the precinct.
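Steps 2 through 4 can be illustrated with a toy table of slivers in base R. This is a sketch under stated assumptions, not the database's PostGIS implementation: the data frame, its column names, and the numbers are invented, and the intersection in step 1 is assumed to have been done already.

```r
# Toy output of step 1: one row per sliver (precinct x census unit piece).
# All names and values are invented for illustration.
slivers <- data.frame(
  precinct    = c("P1", "P1", "P2", "P2"),
  census_unit = c("A",  "B",  "A",  "B"),
  area_sliver = c(30,   10,   70,   40),
  area_orig   = c(100,  50,   100,  50),
  total       = c(1000, 400,  1000, 400),
  white       = c(600,  300,  600,  300),
  stringsAsFactors = FALSE
)
group_cols <- c("total", "white")

# Step 2: area weighting factor W = area_sliver / area_orig.
slivers$w <- slivers$area_sliver / slivers$area_orig

# Step 3: scale each census count by W to estimate the count per sliver.
for (col in group_cols) {
  slivers[[col]] <- slivers[[col]] * slivers$w
}

# Step 4: sum the sliver estimates back to the precinct.
precinct_est <- aggregate(
  slivers[group_cols],
  by = list(precinct = slivers$precinct),
  FUN = sum
)
print(precinct_est)
```

In this toy case the two precincts cover both census units completely, so the precinct totals sum back to the original census total (1000 + 400), which is a useful sanity check on the weights.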

Results

Data

Tabular data are available as a CSV file: precinct_census_cvap_agg_votes_nogeog.csv.zip, MD5sum = NA. The format is a hybrid of “long” and “wide” formats. There is one record for each year \(\times\) county \(\times\) precinct \(\times\) election race (the “long” part); within each record, there are columns for the racial group count estimates of that year \(\times\) county \(\times\) precinct combination (the “wide” part). As a result, the racial group counts are repeated across all records that share the same values for year \(\times\) county \(\times\) precinct, so they should not be summed across election races.
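Because the racial group counts repeat on every election race record, summing them directly over the CSV would double-count people. The sketch below shows one way to reduce the table to one row per year, county, and precinct before summing. The column names (`contest`, `votes`, `total`, `white`) and values are assumptions made for illustration; the actual CSV columns may differ.

```r
# Toy rows shaped like the hybrid long/wide table; names are assumed.
votes <- data.frame(
  year     = c(2016, 2016, 2016),
  county   = c("King", "King", "King"),
  precinct = c("SEA 1", "SEA 1", "SEA 2"),
  contest  = c("President", "Governor", "President"),
  votes    = c(210, 205, 180),
  total    = c(380, 380, 1020),
  white    = c(240, 240, 660),
  stringsAsFactors = FALSE
)

key_cols   <- c("year", "county", "precinct")
group_cols <- c("total", "white")

# Keep one row per year x county x precinct with its racial group counts.
precinct_counts <- unique(votes[c(key_cols, group_cols)])

naive_total   <- sum(votes$total)            # counts "SEA 1" twice
deduped_total <- sum(precinct_counts$total)  # counts each precinct once
print(c(naive = naive_total, deduped = deduped_total))
```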

A GeoPackage (GPKG) file, precinct_census_cvap.gpkg, is provided that contains one record per year \(\times\) county \(\times\) precinct. Because the relationship between a given year \(\times\) county \(\times\) precinct record and election races is one-to-many, columns that hold multiple measures are stored as comma-delimited strings. This is inconvenient for analysis, but it reflects a trade-off between storing multiple copies of the same geometry (one “long” record per election race) and storing a single geometry per precinct. To analyze a particular race, strip the election results columns from the GPKG data, keeping only the year \(\times\) county \(\times\) precinct and geometry columns; filter the long tabular data for that race; and then join the filtered records back onto the geometry records.
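The filter-then-join workflow can be sketched in base R as follows. To keep the example runnable without GIS packages, the geometry is held as a plain WKT text column and all table and column names are hypothetical; with the real file, the precinct table would presumably be read with `sf::st_read()` and the same key columns used for the join.

```r
# Stand-in for the GPKG layer: one row per year x county x precinct.
# Column names are assumed; geometry is WKT text for illustration only.
precincts <- data.frame(
  year     = c(2016, 2016),
  county   = c("King", "King"),
  precinct = c("SEA 1", "SEA 2"),
  contests = c("President,Governor", "President"),  # comma-delimited
  geometry = c("POLYGON ((0 0, 1 0, 1 1, 0 0))",
               "POLYGON ((1 0, 2 0, 2 1, 1 0))"),
  stringsAsFactors = FALSE
)

# Stand-in for the long CSV: one row per precinct x election race.
results_long <- data.frame(
  year     = c(2016, 2016, 2016),
  county   = c("King", "King", "King"),
  precinct = c("SEA 1", "SEA 1", "SEA 2"),
  contest  = c("President", "Governor", "President"),
  votes    = c(210, 205, 180),
  stringsAsFactors = FALSE
)

key_cols <- c("year", "county", "precinct")

# 1. Strip the election results columns; keep keys and geometry.
geom_only <- precincts[c(key_cols, "geometry")]

# 2. Filter the long table to the race of interest.
governor <- results_long[results_long$contest == "Governor", ]

# 3. Join the filtered records back onto the geometry records.
governor_geo <- merge(governor, geom_only, by = key_cols)
print(governor_geo[c(key_cols, "contest", "votes")])

# If needed, a comma-delimited column can be split into a list of values.
contest_list <- strsplit(precincts$contests, ",", fixed = TRUE)
```

Filtering before joining keeps the result at one geometry per matching precinct, which avoids duplicating polygons for every election race.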