# install.packages(c("tidycensus", "sf", "tigris"))
library(tidycensus)
library(tigris)
## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.
library(sf)
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(ggplot2)
library(gridExtra)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:gridExtra':
##
## combine
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
sf
and sp
packagesNote: The following text was one of the first published for spatial
data analysis in R
, and relied on the sp
package.
Bivand, R. S., Pebesma, E. J., & Gomez-Rubio, V. (2008). Applied spatial data analysis with R. Springer.
Two years ago, many parts of sp
were be deprecated, and
folks moved to sf
. Bivand and Pebesma have published an
updated eBook here: https://r-spatial.org/book/
sf
cheetsheet here: https://github.com/rstudio/cheatsheets/blob/main/sf.pdf
The online book Geocomputation in
R by Robin Lovelace, Jakob Nowosad, and Jannes Muenchow, covers the
sf
package as well as the terra
,
spData
, and spDataLarge
packages.
The raster
package is helpful for raster or “continuous”
space data.
Packages such as tmap
, leaflet
, and
maptiles
can enhance your spatial data visualization
capabilities, particularly for adding contextual topographic
information, integration with google maps, and the creating of dynamic
map figure files.
Packages such as qgisprocess
facilitate an interaction
between R and other more standard point-and-click GIS software such as
QGIS.
Finally, Python packages such as shapely
(interface to
GEOS), fiona
(interface to GDAL), pyproj
, and
mapclassify
exist for GIS operations and spatial data
analysis.
R
ObjectsSpatial data can come in many forms, which are generalized into the following categories:
sf
handles all of these fairly well, but in
sp
each type of data was given a separate object type in
R
(like the difference between a matrix
and a
data.frame
).
tidycensus
and PolygonsR
Package by Kyle
Walker.
# vignette("tidycensus")
## Load API key ####
source(paste0(my_dir, "CensusAPIKey.R"))
census_api_key(myKey)
## To install your API key for use in future sessions, run this function with `install = TRUE`.
### Option to cache tidycensus shapefiles ####
options(tigris_use_cache = TRUE)
In tidycensus
variables can be identified in the output
of the load_variables()
function by their
name
, label
, and concept
. A
concept
such as SEX BY AGE
below is shared by
many variables each of which are uniquely identified by an alphanumeric
string, or their name
. Variables are also uniquely
identified by the combination of their label
and their
concept
.
In the example below, all variables in the 2021 ACS within the
SEX BY AGE
concept
have a name
that begins with B01001
. These variables contain estimates
and margins of error for population counts by sex (Male, Female) and age
categories (0-5, 5-9, 10-14, 15-17, 18-19, 20, 21, 22-24, 25-29, 30-34,
35-39, 40-44, 45-49, 50-54, 55-59, 60-61, 62-64, 65-66, 67-69, 70-74,
75-79, 80-84, and 85+.) Totals are provided for each sex and the
population overall (e.g. B01001_001
and
B01001_002
below.)
load_variables(year = 2021, dataset = "acs1") %>%
filter(concept == "SEX BY AGE") %>% head()
## # A tibble: 6 × 3
## name label concept
## <chr> <chr> <chr>
## 1 B01001_001 Estimate!!Total: SEX BY AGE
## 2 B01001_002 Estimate!!Total:!!Male: SEX BY AGE
## 3 B01001_003 Estimate!!Total:!!Male:!!Under 5 years SEX BY AGE
## 4 B01001_004 Estimate!!Total:!!Male:!!5 to 9 years SEX BY AGE
## 5 B01001_005 Estimate!!Total:!!Male:!!10 to 14 years SEX BY AGE
## 6 B01001_006 Estimate!!Total:!!Male:!!15 to 17 years SEX BY AGE
A variable’s name
is passed as a character string to the
variables
argument of the get_acs()
or
get_decennial()
functions,
e.g. variables = "B01001_001"
. A character vector can be
passed if you would like to select more than one variable in the same
API query,
e.g. variables = c("B01001_001", "B01001_002")
.
There is also a table
argument that allows you to select
all variables from the same concept
. For example, to select
all variables from the SEX BY AGE
concept, one would
specify table = "B01001"
and not specify anything for the
variable
argument.
If you are not sure which variable you want, it’s
helpful to save the output of the load_variables()
function
for searching and filtering concepts and labels.
There are only 6 unique concepts in the 2020 decennial census, so looking through the variables by hand is possible.
dec_vars_2020 <- load_variables(year = 2020, dataset = "pl")
## Unique 2020 decennial concepts
unique(dec_vars_2020$concept)
## [1] "OCCUPANCY STATUS"
## [2] "RACE"
## [3] "HISPANIC OR LATINO, AND NOT HISPANIC OR LATINO BY RACE"
## [4] "RACE FOR THE POPULATION 18 YEARS AND OVER"
## [5] "HISPANIC OR LATINO, AND NOT HISPANIC OR LATINO BY RACE FOR THE POPULATION 18 YEARS AND OVER"
## [6] "GROUP QUARTERS POPULATION BY MAJOR GROUP QUARTERS TYPE"
However, there are 1,200 unique variable concepts in the 2021 ACS data. This cannot be easily searched through by hand.
acs_vars_2021 <- load_variables(year = 2021, dataset = "acs1")
## Number of unique ACS 2021 concepts
length(unique(acs_vars_2021$concept))
## [1] 1200
To find a variable of interest in the ACS, you could begin by using code below to scroll through some of the concepts by hand and get an idea of what types of words are used to describe variable concepts.
acs_vars_2021 %>%
select(concept) %>%
unique() %>% View()
Once you know some keywords that may be a part of your concept of interest, you can use regular expression and string manipulation functions to filter down the concepts. For example, scrolling through the above output will show you that migration questions almost all contain the phrase “GEOGRAPHICAL MOBILITY”.
acs_vars_2021 %>%
## Filter down to concepts containing the string
## "GEOGRAPHICAL MOBILITY"
filter(grepl("GEOGRAPHICAL MOBILITY", concept)) %>%
select(concept) %>%
## Remeber concepts cover many variables,
## so we just need to see the unique values to find
## the one we want
unique()
## # A tibble: 86 × 1
## concept
## <chr>
## 1 GEOGRAPHICAL MOBILITY IN THE PAST YEAR BY AGE FOR CURRENT RESIDENCE IN PUERT…
## 2 GEOGRAPHICAL MOBILITY IN THE PAST YEAR BY AGE FOR CURRENT RESIDENCE IN THE U…
## 3 MEDIAN AGE BY GEOGRAPHICAL MOBILITY IN THE PAST YEAR FOR CURRENT RESIDENCE I…
## 4 MEDIAN AGE BY GEOGRAPHICAL MOBILITY IN THE PAST YEAR FOR CURRENT RESIDENCE I…
## 5 GEOGRAPHICAL MOBILITY IN THE PAST YEAR BY SEX FOR CURRENT RESIDENCE IN PUERT…
## 6 GEOGRAPHICAL MOBILITY IN THE PAST YEAR BY SEX FOR CURRENT RESIDENCE IN THE U…
## 7 GEOGRAPHICAL MOBILITY IN THE PAST YEAR (WHITE ALONE) FOR CURRENT RESIDENCE I…
## 8 GEOGRAPHICAL MOBILITY IN THE PAST YEAR (WHITE ALONE) FOR CURRENT RESIDENCE I…
## 9 GEOGRAPHICAL MOBILITY IN THE PAST YEAR (BLACK OR AFRICAN AMERICAN ALONE) FOR…
## 10 GEOGRAPHICAL MOBILITY IN THE PAST YEAR (BLACK OR AFRICAN AMERICAN ALONE) FOR…
## # ℹ 76 more rows
If we know we are interested in breakdowns by age, we might try the following:
acs_vars_2021 %>%
## Filter down to concepts containing the string
## "GEOGRAPHICAL MOBILITY" AND "AGE"
filter(grepl("GEOGRAPHICAL MOBILITY", concept) &
grepl("AGE", concept)) %>%
select(concept) %>%
## Remeber concepts cover many variables,
## so we just need to see the unique values to find
## the one we want
unique()
## # A tibble: 8 × 1
## concept
## <chr>
## 1 GEOGRAPHICAL MOBILITY IN THE PAST YEAR BY AGE FOR CURRENT RESIDENCE IN PUERTO…
## 2 GEOGRAPHICAL MOBILITY IN THE PAST YEAR BY AGE FOR CURRENT RESIDENCE IN THE UN…
## 3 MEDIAN AGE BY GEOGRAPHICAL MOBILITY IN THE PAST YEAR FOR CURRENT RESIDENCE IN…
## 4 MEDIAN AGE BY GEOGRAPHICAL MOBILITY IN THE PAST YEAR FOR CURRENT RESIDENCE IN…
## 5 GEOGRAPHICAL MOBILITY IN THE PAST YEAR BY AGE FOR RESIDENCE 1 YEAR AGO IN PUE…
## 6 GEOGRAPHICAL MOBILITY IN THE PAST YEAR BY AGE FOR RESIDENCE 1 YEAR AGO IN THE…
## 7 MEDIAN AGE BY GEOGRAPHICAL MOBILITY IN THE PAST YEAR FOR RESIDENCE 1 YEAR AGO…
## 8 MEDIAN AGE BY GEOGRAPHICAL MOBILITY IN THE PAST YEAR FOR RESIDENCE 1 YEAR AGO…
geometry = FALSE
.geography
argument has different options for
specifications based on what year and data type you are requesting.kingco_pop <- get_decennial(geography = "tract",
# variables = "P1_001N",
table = "P1",
state = "WA",
county = "King",
year = 2020,
sumfile = "pl",
geometry = TRUE)
## Getting data from the 2020 decennial Census
## Using the PL 94-171 Redistricting Data Summary File
## Using the PL 94-171 Redistricting Data Summary File
## Note: 2020 decennial Census data use differential privacy, a technique that
## introduces errors into data to preserve respondent confidentiality.
## ℹ Small counts should be interpreted with caution.
## ℹ See https://www.census.gov/library/fact-sheets/2021/protecting-the-confidentiality-of-the-2020-census-redistricting-data.html for additional guidance.
## This message is displayed once per session.
summary(kingco_pop)
## GEOID NAME variable value
## Length:35145 Length:35145 Length:35145 Min. : 0.0
## Class :character Class :character Class :character 1st Qu.: 0.0
## Mode :character Mode :character Mode :character Median : 0.0
## Mean : 200.4
## 3rd Qu.: 8.0
## Max. :8837.0
## geometry
## MULTIPOLYGON :35145
## epsg:4269 : 0
## +proj=long...: 0
##
##
##
str(kingco_pop)
## sf [35,145 × 5] (S3: sf/tbl_df/tbl/data.frame)
## $ GEOID : chr [1:35145] "53033024400" "53033024400" "53033024400" "53033024400" ...
## $ NAME : chr [1:35145] "Census Tract 244, King County, Washington" "Census Tract 244, King County, Washington" "Census Tract 244, King County, Washington" "Census Tract 244, King County, Washington" ...
## $ variable: chr [1:35145] "P1_001N" "P1_002N" "P1_003N" "P1_004N" ...
## $ value : num [1:35145] 3053 2722 1891 53 9 ...
## $ geometry:sfc_MULTIPOLYGON of length 35145; first list element: List of 1
## ..$ :List of 1
## .. ..$ : num [1:31, 1:2] -122 -122 -122 -122 -122 ...
## ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA
## ..- attr(*, "names")= chr [1:4] "GEOID" "NAME" "variable" "value"
unique(kingco_pop$variable)
## [1] "P1_001N" "P1_002N" "P1_003N" "P1_004N" "P1_005N" "P1_006N" "P1_007N"
## [8] "P1_008N" "P1_009N" "P1_010N" "P1_011N" "P1_012N" "P1_013N" "P1_014N"
## [15] "P1_015N" "P1_016N" "P1_017N" "P1_018N" "P1_019N" "P1_020N" "P1_021N"
## [22] "P1_022N" "P1_023N" "P1_024N" "P1_025N" "P1_026N" "P1_027N" "P1_028N"
## [29] "P1_029N" "P1_030N" "P1_031N" "P1_032N" "P1_033N" "P1_034N" "P1_035N"
## [36] "P1_036N" "P1_037N" "P1_038N" "P1_039N" "P1_040N" "P1_041N" "P1_042N"
## [43] "P1_043N" "P1_044N" "P1_045N" "P1_046N" "P1_047N" "P1_048N" "P1_049N"
## [50] "P1_050N" "P1_051N" "P1_052N" "P1_053N" "P1_054N" "P1_055N" "P1_056N"
## [57] "P1_057N" "P1_058N" "P1_059N" "P1_060N" "P1_061N" "P1_062N" "P1_063N"
## [64] "P1_064N" "P1_065N" "P1_066N" "P1_067N" "P1_068N" "P1_069N" "P1_070N"
## [71] "P1_071N"
GEOID
sThe GEOID
column of output includes the standard Census
numeric code for a particular geography. States and counties are
determined by their Federal
Information Processing Series(FIPS) codes.
The state code is 2 digits long, Washington’s FIPS
code is 53, and we see each GEOID
above begins with those
digits.
The county code within each state is 3 digits long.
Adams County’s unique code within Washington is 001. So, the full
GEOID
for Adams County, Washington is 53001.
The tract code that uniquely identifies a Census
tract within a county is 6 digits long. The resulting GEOID
for a tract in Adams County, Washington is then 53001TTTTTT, where
TTTTTT represents a 6-digit tract code.
The block group code that uniquely identifies each
Census block group within a certain Census tract, TTTTTT, within Adams
County, Washington is 1 digit long. The resulting GEOID
for
a block group within tract TTTTTT in Adams County, Washington is
53001TTTTTTG, where G represents a 1 digit tract code.
The block code that uniquely identifies each Census
block within a block group, G, within a certain Census tract, TTTTTT,
within Adams County, Washington is 3 digits long. The resulting
GEOID
for a block group within tract TTTTTT and block group
G in Adams County, Washington is 53001TTTTTTGBBB, where BBB represents a
3 digit tract code.
Now that we’ve walked through querying the Census API for the
decennial Census and the ACS via tidycensus
we need to
cover some basics to make this data more usable. First, and foremost, no
one will ever be able to remember what every alphanumeric Census
variable name
is by heart. But, if we use the output of
load_variables()
coupled with some data-wrangling, we can
get neat string labels in no time!
Step 1: Save the output of
load_variables()
dec_vars_2020 <- load_variables(year = 2020, dataset = "pl")
Step 2: Join load_variables()
output with
get_*()
output Below, we join
dec_vars_2020
onto kingco_pop
by the
respective variable name column in each dataset. This still leaves the
pesky issue of the “!!” and “:” in the label strings.
kingco_pop <- kingco_pop %>%
left_join(dec_vars_2020 %>%
select(name, label),
by = c("variable" = "name")) %>%
relocate("label", .after = "variable")
head(kingco_pop)
## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -122.2305 ymin: 47.57434 xmax: -122.2014 ymax: 47.58905
## Geodetic CRS: NAD83
## # A tibble: 6 × 6
## GEOID NAME variable label value geometry
## <chr> <chr> <chr> <chr> <dbl> <MULTIPOLYGON [°]>
## 1 53033024400 Census Tract 244, … P1_001N " !!… 3053 (((-122.2299 47.58662, -…
## 2 53033024400 Census Tract 244, … P1_002N " !!… 2722 (((-122.2299 47.58662, -…
## 3 53033024400 Census Tract 244, … P1_003N " !!… 1891 (((-122.2299 47.58662, -…
## 4 53033024400 Census Tract 244, … P1_004N " !!… 53 (((-122.2299 47.58662, -…
## 5 53033024400 Census Tract 244, … P1_005N " !!… 9 (((-122.2299 47.58662, -…
## 6 53033024400 Census Tract 244, … P1_006N " !!… 711 (((-122.2299 47.58662, -…
Step 3: Clean up the variable labels How you want to
do this will vary for each concept, this specific example is appropriate
for the 2020 decennial “RACE” concept. The result of this code is an
extra race
column added to the data making it easier to
plot and analyze.
kingco_pop <- kingco_pop %>%
## Replace the starting "!!Total"
## with an empty string everywhere
mutate(label = gsub("!!Total", "", label),
## Replace all remaining !!
label = gsub("!!", "", label)) %>%
## The only special characters left now are the :
## between sex and age groups
## we will split the string in each row on the :
## and retain the first element for sex and the second for label
group_by(variable) %>%
mutate(race_cat = unlist(strsplit(label, ":"))[2],
race_alone = unlist(strsplit(label, ":"))[3],
race_two = unlist(strsplit(label, ":"))[4]) %>%
ungroup() %>%
## Fill in "Total" strings where sex or age are missing
mutate(race_cat = ifelse(race_cat == " ", "Total", race_cat),
race_alone = ifelse(race_alone == " ", "Total", race_alone),
race_two = ifelse(race_two == " ", "Total", race_two)) %>%
relocate(c("race_cat", "race_alone", "race_two"), .after = "label")
## Check our work
kingco_pop %>%
select(variable, label, contains("race_")) %>%
head()
## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -122.2305 ymin: 47.57434 xmax: -122.2014 ymax: 47.58905
## Geodetic CRS: NAD83
## # A tibble: 6 × 6
## variable label race_cat race_alone race_two geometry
## <chr> <chr> <chr> <chr> <chr> <MULTIPOLYGON [°]>
## 1 P1_001N " :" Total Total Total (((-122.2299 47.58662, -…
## 2 P1_002N " :Population… Populat… Total Populat… (((-122.2299 47.58662, -…
## 3 P1_003N " :Population… Populat… White alo… Total (((-122.2299 47.58662, -…
## 4 P1_004N " :Population… Populat… Black or … Total (((-122.2299 47.58662, -…
## 5 P1_005N " :Population… Populat… American … Total (((-122.2299 47.58662, -…
## 6 P1_006N " :Population… Populat… Asian alo… Total (((-122.2299 47.58662, -…
ggplot2
Any plot of spatial data in ggplot2
will require a
combination of ggplot()
+ geom_sf()
.
The fill
argument inside the aesthetic function
(aes()
) specifies the variable whose numeric or categorical
information will determine each polygon’s color.
kingco_pop %>%
## We have many vars in long form data,
## i.e. each polygon has multiple rows in the data,
## let's filter down to total population,
## i.e. race_cat == "Total"
filter(race_cat == "Total") %>%
ggplot() +
geom_sf(aes(fill = value))
Additional arguments can improve visualizations. * size
adjusts lines used to draw polygon boundaries *
scale_fill_viridis_c
is a non-default color palette for
continuous variables
kingco_pop %>%
filter(race_cat == "Total") %>%
ggplot() +
## size makes polygon lines smaller
geom_sf(aes(fill = value), size = .25) +
## Nice palette, and reverse direction
## so light is small, dark is large
scale_fill_viridis_c(direction = -1,
name = "Population") +
ggtitle("King County Census Tracts (2020)") +
## git rid of long lat axes
theme_void()
The facet_wrap
function tells ggplot
to
make a multi-panel plot according to the unique values of the variables
in the facet_wrap
formula.
By default, facet_wrap
will put all plots on the same
legend.
kingco_pop %>%
## Filter down to Population of Single race + those with two or more
filter((race_cat == "Population of one race" & race_alone != "Total") |
(race_cat == "Population of two or more races" &
race_alone == "Total")) %>%
## Make plot-ready labels
mutate(Race = case_when(grepl("two or", race_cat) ~ "2+",
grepl("Black", race_alone) ~ "Black",
grepl("Alaska", race_alone) ~ "AI/AN",
grepl("Hawaiian", race_alone) ~ "NHOPI",
grepl("Some", race_alone) ~ "Other",
TRUE ~ gsub(" alone", "", race_alone)),
## Create a factor variable to control order of panels
## Otherwise, will be alphabetical
Race = factor(Race, levels = c("AI/AN", "Asian", "Black",
"NHOPI", "Other", "2+", "White"))) %>%
ggplot() +
geom_sf(aes(fill = value), size = .25) +
scale_fill_viridis_c(direction = -1,
name = "Population") +
facet_wrap(~Race) +
ggtitle("King County Census Tracts (2020)") +
theme_void()
kingco_pop_wide <- kingco_pop %>%
## Filter down to Population of Single race + those with two or more
filter((race_cat == "Population of one race" & race_alone != "Total") |
(race_cat == "Population of two or more races" &
race_alone == "Total")) %>%
## Make plot-ready labels
mutate(Race = case_when(grepl("two or", race_cat) ~ "2+",
grepl("Black", race_alone) ~ "Black",
grepl("Alaska", race_alone) ~ "AI/AN",
grepl("Hawaiian", race_alone) ~ "NHOPI",
grepl("Some", race_alone) ~ "Other",
TRUE ~ gsub(" alone", "", race_alone)),
## Create a factor variable to control order of panels
## Otherwise, will be alphabetical
Race = factor(Race, levels = c("AI/AN", "Asian", "Black",
"NHOPI", "Other", "2+", "White"))) %>%
## Create one column per Race variable with pop size inside
pivot_wider(id_cols = c("GEOID", "NAME", "geometry"),
names_from = "Race",
values_from = "value")
## What is the range of values for all tracts?
kingco_pop_wide %>%
st_drop_geometry() %>%
select(-GEOID, -NAME) %>%
range()
## [1] 0 6272
## Check quintiles
kingco_pop_wide %>%
st_drop_geometry() %>%
select(-GEOID, -NAME) %>%
as.matrix() %>%
quantile(probs = seq(0, 1, .1))
## 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
## 0.0 13.0 32.0 68.0 129.0 272.0 414.0 592.8 949.2 2136.4 6272.0
## White alone
pop_white <- kingco_pop_wide %>%
## Drop water tracts (tigris fn)
erase_water() %>%
ggplot() +
geom_sf(aes(fill = White), size = 0.25) +
scale_fill_viridis_c(direction = -1,
name = "Population",
breaks = c(0, 50,
100, 250,
500, 1000,
2000, 4000,
6000, 6300),
limits = c(0, 6300)) +
ggtitle("White alone") +
theme_void()
## Fetching area water data for your dataset's location...
## Erasing water area...
## If this is slow, try a larger area threshold value.
## White alone
pop_asian <- kingco_pop_wide %>%
## Drop water tracts (tigris fn)
## use larger area_threshold arg to ignore smaller bodies of water
erase_water(area_threshold = 0.99) %>%
ggplot() +
geom_sf(aes(fill = Asian), size = 0.25) +
scale_fill_viridis_c(direction = -1,
name = "Population",
breaks = c(0, 50,
100, 250,
500, 1000,
2000, 4000,
6000, 6300),
limits = c(0, 6300)) +
ggtitle("Asian alone") +
theme_void()
## Fetching area water data for your dataset's location...
## Erasing water area...
## If this is slow, try a larger area threshold value.
## 2 or more
pop_twoplus <- kingco_pop_wide %>%
erase_water(area_threshold = 0.99) %>%
ggplot() +
geom_sf(aes(fill = `2+`), size = 0.25) +
scale_fill_viridis_c(direction = -1,
name = "Population",
breaks = c(0, 50,
100, 250,
500, 1000,
2000, 4000,
6000, 6300),
limits = c(0, 6300)) +
ggtitle("Two or more races") +
theme_void()
## Fetching area water data for your dataset's location...
## Erasing water area...
## If this is slow, try a larger area threshold value.
grid.arrange(pop_white, pop_asian, pop_twoplus, nrow = 2)
This may be a personal preference, but unless you are treating space continuously, discrete legends are easier for a reader or audience member to digest.
If you are familiar with RColorBrewer
, there are a suite
of functions designed to work with ggplot
that allow you to
call upon their palettes.
## To see RColorBrewer palettes
# library(RColorBrewer)
# display.brewer.all()
## Using a 3 color palette, similar to viridis
kingco_pop_racealone <- kingco_pop %>%
## Filter down to Population of Single race + those with two or more
filter((race_cat == "Population of one race" & race_alone != "Total") |
(race_cat == "Population of two or more races" &
race_alone == "Total")) %>%
## Make plot-ready labels
mutate(Race = case_when(grepl("two or", race_cat) ~ "2+",
grepl("Black", race_alone) ~ "Black",
grepl("Alaska", race_alone) ~ "AI/AN",
grepl("Hawaiian", race_alone) ~ "NHOPI",
grepl("Some", race_alone) ~ "Other",
TRUE ~ gsub(" alone", "", race_alone)),
## Create a factor variable to control order of panels
## Otherwise, will be alphabetical
Race = factor(Race, levels = c("AI/AN", "Asian", "Black",
"NHOPI", "Other", "2+", "White"))) %>%
## Remove water
erase_water(area_threshold = 0.99)
## Fetching area water data for your dataset's location...
## Erasing water area...
## If this is slow, try a larger area threshold value.
kingco_pop_racealone %>%
ggplot() +
geom_sf(aes(fill = value), color = "grey80", size = .05) +
scale_fill_fermenter(direction = 1, palette = "YlGnBu",
name = "Population",
breaks = c(0, 50,
100, 250,
500, 1000,
2000, 4000,
6000, 6300),
limits = c(0, 6300)) +
facet_wrap(~Race) +
ggtitle("King County Census Tracts (2020)") +
theme_void()
pop_white <- kingco_pop_wide %>%
erase_water(area_threshold = 0.99) %>%
ggplot() +
geom_sf(aes(fill = White), size = 0.25, show.legend = FALSE) +
scale_fill_fermenter(direction = 1, palette = "YlGnBu",
name = "Population",
breaks = c(0, 50,
100, 250,
500, 1000,
2000, 4000,
6000, 6300),
limits = c(0, 6300)) +
ggtitle("White alone") +
theme_void()
## Fetching area water data for your dataset's location...
## Erasing water area...
## If this is slow, try a larger area threshold value.
pop_asian <- kingco_pop_wide %>%
erase_water(area_threshold = 0.99) %>%
ggplot() +
geom_sf(aes(fill = Asian), size = 0.25) +
scale_fill_fermenter(direction = 1, palette = "YlGnBu",
name = "Population",
breaks = c(0, 50,
100, 250,
500, 1000,
2000, 4000,
6000, 6300),
limits = c(0, 6300)) +
ggtitle("Asian alone") +
theme_void()
## Fetching area water data for your dataset's location...
## Erasing water area...
## If this is slow, try a larger area threshold value.
pop_twoplus <- kingco_pop_wide %>%
erase_water(area_threshold = 0.99) %>%
ggplot() +
geom_sf(aes(fill = `2+`), size = 0.25, show.legend = FALSE) +
scale_fill_fermenter(direction = 1, palette = "YlGnBu",
name = "Population",
breaks = c(0, 50,
100, 250,
500, 1000,
2000, 4000,
6000, 6300),
limits = c(0, 6300)) +
ggtitle("Two or more") +
theme_void()
## Fetching area water data for your dataset's location...
## Erasing water area...
## If this is slow, try a larger area threshold value.
grid.arrange(pop_white, pop_asian, pop_twoplus, nrow = 2)
Now, for the first time, we will load a shapefile using the
read_sf
function in R.
.shp
contains the geography of each shape. The second file
shx
is an index file which contains record offsets. The
third file .dbf
contains feature attributes with one record
per feature.Download Sound Transit Link link rail station point location data: https://www.soundtransit.org/help-contacts/business-information/open-transit-data-otd/otd-downloads.
You’ll notice that the folder, once unzipped, contains many spatial layers. We’re going to load the
link_stns <- read_sf(dsn = "../Data/STPublicData",
layer = "LINKStations")
head(link_stns)
## Simple feature collection with 6 features and 6 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 1269619 ymin: 215537.9 xmax: 1271843 ymax: 226693.4
## Projected CRS: NAD83 / Washington North (ftUS)
## # A tibble: 6 × 7
## LINK_TYPE NAME STATION SEGMENT STATUS STATUSNOTE geometry
## <int> <chr> <chr> <chr> <chr> <chr> <POINT [US_survey_foot]>
## 1 1 Westlak… Westla… C COMPL… <NA> (1269619 226693.4)
## 2 1 Univers… Univer… C COMPL… <NA> (1269773 225331.4)
## 3 1 Pioneer… Pionee… C COMPL… <NA> (1270921 223393.7)
## 4 1 Interna… Intern… C COMPL… <NA> (1271685 221842.8)
## 5 1 Stadium… Stadium C COMPL… <NA> (1271843 219195.9)
## 6 1 SODO St… SODO C COMPL… <NA> (1271706 215537.9)
link_lines <- read_sf(dsn = "../Data/STPublicData",
layer = "LINKLine")
head(link_lines)
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTILINESTRING
## Dimension: XY
## Bounding box: xmin: 1241431 ymin: 91328.47 xmax: 1285395 ymax: 258631.2
## Projected CRS: NAD83 / Washington North (ftUS)
## # A tibble: 6 × 7
## OBJECTID LINK_TYPE DESCRIPTIO STATUS Shape_Leng LINE
## <int> <int> <chr> <chr> <dbl> <chr>
## 1 68 1 Central Link COMPLETE 73476. 1
## 2 101 4 Airport Link COMPLETE 9029. 1
## 3 109 2 Tacoma Link COMPLETE 9382. T
## 4 113 6 University Link COMPLETE 16875. 1
## 5 130 3 Northgate Link COMPLETE 19939. 1
## 6 167 5 S 200th Extension COMPLETE 8998. 1
## # ℹ 1 more variable: geometry <MULTILINESTRING [US_survey_foot]>
st_crs
function will print out the coordinate
reference system for the object passed to the functionst_transform
function allows you to transform the
CRS of an sf
object. You can either pass the CRS numeric
code or the output of an st_crs
function to the
crs
argument.st_crs(link_stns)
## Coordinate Reference System:
## User input: NAD83 / Washington North (ftUS)
## wkt:
## PROJCRS["NAD83 / Washington North (ftUS)",
## BASEGEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]],
## ID["EPSG",6269]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["Degree",0.0174532925199433]]],
## CONVERSION["unnamed",
## METHOD["Lambert Conic Conformal (2SP)",
## ID["EPSG",9802]],
## PARAMETER["Latitude of false origin",47,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8821]],
## PARAMETER["Longitude of false origin",-120.833333333333,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8822]],
## PARAMETER["Latitude of 1st standard parallel",47.5,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8823]],
## PARAMETER["Latitude of 2nd standard parallel",48.7333333333333,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8824]],
## PARAMETER["Easting at false origin",1640416.66666667,
## LENGTHUNIT["US survey foot",0.304800609601219],
## ID["EPSG",8826]],
## PARAMETER["Northing at false origin",0,
## LENGTHUNIT["US survey foot",0.304800609601219],
## ID["EPSG",8827]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["US survey foot",0.304800609601219,
## ID["EPSG",9003]]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["US survey foot",0.304800609601219,
## ID["EPSG",9003]]]]
st_crs(kingco_pop)
## Coordinate Reference System:
## User input: NAD83
## wkt:
## GEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]]
## Transform the link stations to the Census CRS
link_stns <- link_stns %>%
st_transform(crs = st_crs(kingco_pop))
st_crs(link_stns)
## Coordinate Reference System:
## User input: NAD83
## wkt:
## GEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]]
## Transform the link lines to the Census CRS
link_lines <- link_lines %>%
st_transform(crs = st_crs(kingco_pop))
st_crs(link_lines)
## Coordinate Reference System:
## User input: NAD83
## wkt:
## GEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]]
ggplot2
Let’s add the location of LINK stations to our existing plot of King County’s 2020 decennial tract-level Asian alone population
kingco_pop_wide %>%
erase_water(area_threshold = 0.99) %>%
ggplot() +
geom_sf(aes(fill = Asian), size = 0.25) +
scale_fill_fermenter(direction = 1,
palette = "Blues",
name = "Population",
breaks = c(0, 50,
100, 250,
500, 1000,
2000, 4000,
6000, 6300),
limits = c(0, 6300)) +
## Need to specify the data argument when bringing in a second dataset
geom_sf(data = link_stns, color = "white") +
ggtitle("Asian alone Population (2020 Census)",
subtitle = "Overlayed w/ LINK Stations") +
theme_void()
## Fetching area water data for your dataset's location...
## Erasing water area...
## If this is slow, try a larger area threshold value.
Now, Link lines.
names(link_lines)
## [1] "OBJECTID" "LINK_TYPE" "DESCRIPTIO" "STATUS" "Shape_Leng"
## [6] "LINE" "geometry"
link_lines %>%
st_drop_geometry() %>%
group_by(LINE) %>%
tally()
## # A tibble: 4 × 2
## LINE n
## <chr> <int>
## 1 1 6
## 2 2 1
## 3 T 2
## 4 <NA> 4
kingco_pop_wide %>%
erase_water(area_threshold = 0.99) %>%
ggplot() +
geom_sf(aes(fill = Asian), size = 0.05, color = "white") +
scale_fill_fermenter(direction = 1,
palette = "Blues",
name = "Population",
breaks = c(0, 50,
100, 250,
500, 1000,
2000, 4000,
6000, 6300),
limits = c(0, 6300)) +
## Need to specify the data argument when bringing in a second dataset
geom_sf(data = link_lines %>%
filter(LINE %in% 1:2) %>%
mutate(LINE = as.factor(LINE)), aes(color = LINE)) +
scale_color_manual(name = "LINK Line",
values = c("1" = "deeppink", "2" = "orange")) +
ggtitle("King County: Asian alone Population (2020)",
subtitle = "Overlayed w/ LINK Lines") +
theme_void()
## Fetching area water data for your dataset's location...
## Erasing water area...
## If this is slow, try a larger area threshold value.
## Combining & comparing spatial objects in
sf
sf
functions st_intersects
and
st_join
can be used to understand what parts of one spatial
object overlap with anotherst_*()
functions that make other
geographic comparisons between two objects: st_contains
,
st_contains_properly
, st_covered_by
,
st_covers
st_crosses
,
st_disjoint
, st_equals_exact
,
st_equals
, st_is_within_distance
,
st_nearest_feature
, st_overlaps
,
st_touches
, st_within
. See the Figure 5.8 from
the Lovelace textbook for a visual representation of some function’s
purpose.x
(the first object in the pipe or first argument) and y
(object within the st_*
function in the pipe or the second
argument). The two objects x
and y
must have identical coordinate reference systems
(CRS)This example concerns a points (LINK Light rail
stations) to polygon (Census tract) operation, but
other assignment types like polygon-to-polygon or line-to-polygon are
possible. The latter types are slightly more complicated as one must
make decisions about what to do with corresponding area-level data when,
for example, the x
polygon overlaps with two y
polygons.
The easiest way to two assign points-to-polygons is to use the
st_join
function, as it will automatically merge any
polygon (y
) affiliated data onto the point object
(x
).
Note: In this specific case, kingco_pop
has 71 rows per Census tract GEOID
. It is my personal
preference to think about which variables from y
I
want appended to x
, and filter y
before or
during the call to st_join
. However, as we see below the
largest = TRUE
argument is specified, which tells
st_join
to take the data for the feature in y
with the largest overlap with x
. When the size of overlap
is equal (as for many rows attached to the same polygon), it will return
the first row. If largest = FALSE
, the returned object will
have a row for every feature in y
that each feature in
x
lies within or intersects.
## Assign link station to 2020 Census tracts
## Pull ALL columns from *first* appropriate row of kingco_pop onto link_stns
link_join <- st_join(x = link_stns, y = kingco_pop,
largest = TRUE,
suffix = c("_link", "_pop"))
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
head(link_join)
## Simple feature collection with 6 features and 14 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -122.3367 ymin: 47.58108 xmax: -122.3271 ymax: 47.61154
## Geodetic CRS: NAD83
## # A tibble: 6 × 15
## LINK_TYPE NAME_link STATION SEGMENT STATUS STATUSNOTE
## <int> <chr> <chr> <chr> <chr> <chr>
## 1 1 Westlake Station Westlake C COMPL… <NA>
## 2 1 University Street Station University… C COMPL… <NA>
## 3 1 Pioneer Square Station Pioneer Sq… C COMPL… <NA>
## 4 1 International District Station Internatio… C COMPL… <NA>
## 5 1 Stadium Station Stadium C COMPL… <NA>
## 6 1 SODO Station SODO C COMPL… <NA>
## # ℹ 9 more variables: geometry <POINT [°]>, GEOID <chr>, NAME_pop <chr>,
## # variable <chr>, label <chr>, race_cat <chr>, race_alone <chr>,
## # race_two <chr>, value <dbl>
str(link_join)
## sf [51 × 15] (S3: sf/tbl_df/tbl/data.frame)
## $ LINK_TYPE : int [1:51] 1 1 1 1 1 1 1 1 1 1 ...
## $ NAME_link : chr [1:51] "Westlake Station" "University Street Station" "Pioneer Square Station" "International District Station" ...
## $ STATION : chr [1:51] "Westlake" "University Street" "Pioneer Square" "International District / Chinatown" ...
## $ SEGMENT : chr [1:51] "C" "C" "C" "C" ...
## $ STATUS : chr [1:51] "COMPLETE" "COMPLETE" "COMPLETE" "COMPLETE" ...
## $ STATUSNOTE: chr [1:51] NA NA NA NA ...
## $ geometry :sfc_POINT of length 51; first list element: 'XY' num [1:2] -122.3 47.6
## $ GEOID : chr [1:51] "53033008102" "53033008102" "53033008102" "53033009300" ...
## $ NAME_pop : chr [1:51] "Census Tract 81.02, King County, Washington" "Census Tract 81.02, King County, Washington" "Census Tract 81.02, King County, Washington" "Census Tract 93, King County, Washington" ...
## $ variable : chr [1:51] "P1_001N" "P1_001N" "P1_001N" "P1_001N" ...
## $ label : chr [1:51] " :" " :" " :" " :" ...
## $ race_cat : chr [1:51] "Total" "Total" "Total" "Total" ...
## $ race_alone: chr [1:51] "Total" "Total" "Total" "Total" ...
## $ race_two : chr [1:51] "Total" "Total" "Total" "Total" ...
## $ value : num [1:51] 3005 3005 3005 3611 3611 ...
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
## ..- attr(*, "names")= chr [1:14] "LINK_TYPE" "NAME_link" "STATION" "SEGMENT" ...
st_intersects
requires more steps and is a more
“under-the-hood” work horse of spatial joins in R. Like
st_join
the function takes an x
and
y
argument for x
-to-y
assignment,
but returns a list of length nrow(x)
whose elements are ALL
the rows of y
each row of x
lies within or
intersects.
In our example of a points-to-polygons assignment, x
is
link_stns
and y
is kingco_pop
.
Since we have all the 2020 Census population counts for the P1 Race
table for each tract in kingco_pop
, this vector elements of
the list output by st_intersects
is either of length 71
(all variables in the P1 table) or length 0 (the LINK station doesn’t
fall in a King County Census tract boundary.)
## return list of length(link_stns)
## whose elements are vectors with the row numbers
## of kingco_pop each link stations latlong lies within
link_int <- st_intersects(link_stns, kingco_pop)
## To access list elements use double brackets [[]]
head(link_int[[1]])
## [1] 8876 8877 8878 8879 8880 8881
## Compare to join output
link_join %>% slice(1)
## Simple feature collection with 1 feature and 14 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -122.3367 ymin: 47.61154 xmax: -122.3367 ymax: 47.61154
## Geodetic CRS: NAD83
## # A tibble: 1 × 15
## LINK_TYPE NAME_link STATION SEGMENT STATUS STATUSNOTE
## <int> <chr> <chr> <chr> <chr> <chr>
## 1 1 Westlake Station Westlake C COMPLETE <NA>
## # ℹ 9 more variables: geometry <POINT [°]>, GEOID <chr>, NAME_pop <chr>,
## # variable <chr>, label <chr>, race_cat <chr>, race_alone <chr>,
## # race_two <chr>, value <dbl>
link_int[[1]]
## [1] 8876 8877 8878 8879 8880 8881 8882 8883 8884 8885 8886 8887 8888 8889 8890
## [16] 8891 8892 8893 8894 8895 8896 8897 8898 8899 8900 8901 8902 8903 8904 8905
## [31] 8906 8907 8908 8909 8910 8911 8912 8913 8914 8915 8916 8917 8918 8919 8920
## [46] 8921 8922 8923 8924 8925 8926 8927 8928 8929 8930 8931 8932 8933 8934 8935
## [61] 8936 8937 8938 8939 8940 8941 8942 8943 8944 8945 8946
## To access elements of a data.frame use single brackets [,]
## with row index before the column and column index after
## e.g. below will print the row of kingco_pop[row-number, ]
## identified in the first vector element [1]
## of the first list element [[1]]
## of the link_int list object
kingco_pop[link_int[[1]][1], ]
## Simple feature collection with 1 feature and 8 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -122.3423 ymin: 47.60171 xmax: -122.3277 ymax: 47.61268
## Geodetic CRS: NAD83
## # A tibble: 1 × 9
## GEOID NAME variable label race_cat race_alone race_two value
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 53033008102 Census Tract 81… P1_001N " :" Total Total Total 3005
## # ℹ 1 more variable: geometry <MULTIPOLYGON [°]>
## Now we need to create a vector of length nrow(link_stns)
## whose elements are **single** row-indices of the intersecting polygon
## instead of repeated for each of the 71 variables in the P1 table
link_tract_rows <- rep(NA, nrow(link_stns))
for(stn in 1:nrow(link_stns)){
## Take first match (Total Pop) for stations that fall within a tract
if(length(link_int[[stn]]) > 0){
link_tract_rows[stn] <- link_int[[stn]][1]
}else{
link_tract_rows[stn] <- NA
}
}
link_tract_rows
## [1] 8876 8876 8876 31241 31241 31241 25490 26839 4190 33655 12852 11929
## [13] 19313 6249 15905 9586 NA NA NA NA NA NA 14911 31028
## [25] 20449 21088 33158 NA NA 11645 3622 30957 4829 33229 12568 12568
## [37] 28756 8521 8521 8521 11432 11432 31312 10154 NA NA NA NA
## [49] NA NA NA
## Create new dataset combining link stations to total population of tract
link_pop_df <- link_stns
## Create tract and pop columns
link_pop_df$GEOID <- kingco_pop$GEOID[link_tract_rows]
link_pop_df$value <- kingco_pop$value[link_tract_rows]
head(link_pop_df)
## Simple feature collection with 6 features and 8 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -122.3367 ymin: 47.58108 xmax: -122.3271 ymax: 47.61154
## Geodetic CRS: NAD83
## # A tibble: 6 × 9
## LINK_TYPE NAME STATION SEGMENT STATUS STATUSNOTE geometry
## <int> <chr> <chr> <chr> <chr> <chr> <POINT [°]>
## 1 1 Westlak… Westla… C COMPL… <NA> (-122.3367 47.61154)
## 2 1 Univers… Univer… C COMPL… <NA> (-122.336 47.60782)
## 3 1 Pioneer… Pionee… C COMPL… <NA> (-122.3312 47.60257)
## 4 1 Interna… Intern… C COMPL… <NA> (-122.3279 47.59836)
## 5 1 Stadium… Stadium C COMPL… <NA> (-122.3271 47.59111)
## 6 1 SODO St… SODO C COMPL… <NA> (-122.3274 47.58108)
## # ℹ 2 more variables: GEOID <chr>, value <dbl>
names(link_pop_df)
## [1] "LINK_TYPE" "NAME" "STATION" "SEGMENT" "STATUS"
## [6] "STATUSNOTE" "geometry" "GEOID" "value"
## Create flag in kingco_pop that shows which tracts have stations in them
## unlist() turns link_int into one long vector of the unique
## rows in kingco_pop corresponding to tracts that contain a light rail station
unique_stn_tract_rows <- link_int %>% unlist() %>% unique()
kingco_pop$has_lightrail <- 0
kingco_pop$has_lightrail[unique_stn_tract_rows] <- 1
table(kingco_pop$has_lightrail, useNA = "ifany")
##
## 0 1
## 33157 1988
## Plot the tracts with a light rail
kingco_pop %>%
## Filter down to just one row per tract
filter(race_cat == "Total") %>%
erase_water(area_threshold = 0.99) %>%
mutate(has_lightrail = factor(has_lightrail, levels = 0:1,
labels = c("No", "Yes"))) %>%
ggplot() +
geom_sf(aes(fill = has_lightrail), color = "grey80", size = 0.2) +
scale_fill_manual(name = "Contains Lightrail",
values = c("No" = "white", "Yes" = "firebrick")) +
ggtitle("2020 King County Census Tracts",
subtitle = "LINK Light rail Stations") +
theme_void()
## Fetching area water data for your dataset's location...
## Erasing water area...
## If this is slow, try a larger area threshold value.
Suppose we want to identify all Census tracts within a certain distance of a LINK Light rail station. While the question seems straight forward, in GIS analysis, it requires clear definitions of what we mean by distance.
Distance between spatial features: * points-to-polygons: distance to the polygon centroid? minimum distance to the boundary? * polygon-to-polygon: centroid to centroid? are they neighbors? minimum number of polygons traversed to get from polygon x to polygon y? * point-to-line: minimum distance? maximum distance? road distance?
Type of distance: * Are we calculating distance on a globe? On a 2-dimensional Mercator (UTM) projection? * Driving distance? Shortest distance without crossing over your route?
Our example is relatively simple, and we will take a simple definition of tracts within a certain distance of a light rail station by calculating distances between station GPS locations and tract polygon centroids.
First, we need to get a spatial object with polygon centroids.
st_centroid
is one of a suite of sf
functions
the return manipulations of polygon objects. See
?st_centroid
for more information.
## st_centroid retains variable info about the tracts
kingco_pts <- st_centroid(kingco_pop)
## Warning: st_centroid assumes attributes are constant over geometries
str(kingco_pts)
## sf [35,145 × 10] (S3: sf/tbl_df/tbl/data.frame)
## $ GEOID : chr [1:35145] "53033024400" "53033024400" "53033024400" "53033024400" ...
## $ NAME : chr [1:35145] "Census Tract 244, King County, Washington" "Census Tract 244, King County, Washington" "Census Tract 244, King County, Washington" "Census Tract 244, King County, Washington" ...
## $ variable : chr [1:35145] "P1_001N" "P1_002N" "P1_003N" "P1_004N" ...
## $ label : chr [1:35145] " :" " :Population of one race:" " :Population of one race:White alone" " :Population of one race:Black or African American alone" ...
## $ race_cat : chr [1:35145] "Total" "Population of one race" "Population of one race" "Population of one race" ...
## $ race_alone : chr [1:35145] "Total" "Total" "White alone" "Black or African American alone" ...
## $ race_two : chr [1:35145] "Total" "Population of one race" "Total" "Total" ...
## $ value : num [1:35145] 3053 2722 1891 53 9 ...
## $ geometry :sfc_POINT of length 35145; first list element: 'XY' num [1:2] -122.2 47.6
## $ has_lightrail: num [1:35145] 0 0 0 0 0 0 0 0 0 0 ...
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
## ..- attr(*, "names")= chr [1:9] "GEOID" "NAME" "variable" "label" ...
Now, we can calculate distances between kingco_pts
and
link_stns
using the st_distance
function.
tract_to_stn <- st_distance(x = kingco_pts, y = link_stns)
## Output is nrow(kingco_pts) x nrow(link_stns) matrix
## with elements that are distances in meters
head(tract_to_stn)
## Units: [m]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 9568.917 9377.761 8861.334 8510.782 8301.559 8244.696 7073.274 6035.596
## [2,] 9568.917 9377.761 8861.334 8510.782 8301.559 8244.696 7073.274 6035.596
## [3,] 9568.917 9377.761 8861.334 8510.782 8301.559 8244.696 7073.274 6035.596
## [4,] 9568.917 9377.761 8861.334 8510.782 8301.559 8244.696 7073.274 6035.596
## [5,] 9568.917 9377.761 8861.334 8510.782 8301.559 8244.696 7073.274 6035.596
## [6,] 9568.917 9377.761 8861.334 8510.782 8301.559 8244.696 7073.274 6035.596
## [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
## [1,] 6116.759 6775.303 7994.486 14037.46 15914.9 12949.97 11437.25 8790.705
## [2,] 6116.759 6775.303 7994.486 14037.46 15914.9 12949.97 11437.25 8790.705
## [3,] 6116.759 6775.303 7994.486 14037.46 15914.9 12949.97 11437.25 8790.705
## [4,] 6116.759 6775.303 7994.486 14037.46 15914.9 12949.97 11437.25 8790.705
## [5,] 6116.759 6775.303 7994.486 14037.46 15914.9 12949.97 11437.25 8790.705
## [6,] 6116.759 6775.303 7994.486 14037.46 15914.9 12949.97 11437.25 8790.705
## [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## [1,] 40090.46 39800.18 40430.95 40842.37 41105.92 41359.35 10043.1 16206.18
## [2,] 40090.46 39800.18 40430.95 40842.37 41105.92 41359.35 10043.1 16206.18
## [3,] 40090.46 39800.18 40430.95 40842.37 41105.92 41359.35 10043.1 16206.18
## [4,] 40090.46 39800.18 40430.95 40842.37 41105.92 41359.35 10043.1 16206.18
## [5,] 40090.46 39800.18 40430.95 40842.37 41105.92 41359.35 10043.1 16206.18
## [6,] 40090.46 39800.18 40430.95 40842.37 41105.92 41359.35 10043.1 16206.18
## [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32]
## [1,] 18596.57 19050.6 21842.55 23834.61 26733.54 25356.13 30180.05 22056.49
## [2,] 18596.57 19050.6 21842.55 23834.61 26733.54 25356.13 30180.05 22056.49
## [3,] 18596.57 19050.6 21842.55 23834.61 26733.54 25356.13 30180.05 22056.49
## [4,] 18596.57 19050.6 21842.55 23834.61 26733.54 25356.13 30180.05 22056.49
## [5,] 18596.57 19050.6 21842.55 23834.61 26733.54 25356.13 30180.05 22056.49
## [6,] 18596.57 19050.6 21842.55 23834.61 26733.54 25356.13 30180.05 22056.49
## [,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40]
## [1,] 1435.076 6612.954 2123.449 3617.767 4263.031 4832.928 5586.573 6206.654
## [2,] 1435.076 6612.954 2123.449 3617.767 4263.031 4832.928 5586.573 6206.654
## [3,] 1435.076 6612.954 2123.449 3617.767 4263.031 4832.928 5586.573 6206.654
## [4,] 1435.076 6612.954 2123.449 3617.767 4263.031 4832.928 5586.573 6206.654
## [5,] 1435.076 6612.954 2123.449 3617.767 4263.031 4832.928 5586.573 6206.654
## [6,] 1435.076 6612.954 2123.449 3617.767 4263.031 4832.928 5586.573 6206.654
## [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## [1,] 8520.565 9487.346 12521.86 12541.15 39678.95 39387.11 39229.85 41276.94
## [2,] 8520.565 9487.346 12521.86 12541.15 39678.95 39387.11 39229.85 41276.94
## [3,] 8520.565 9487.346 12521.86 12541.15 39678.95 39387.11 39229.85 41276.94
## [4,] 8520.565 9487.346 12521.86 12541.15 39678.95 39387.11 39229.85 41276.94
## [5,] 8520.565 9487.346 12521.86 12541.15 39678.95 39387.11 39229.85 41276.94
## [6,] 8520.565 9487.346 12521.86 12541.15 39678.95 39387.11 39229.85 41276.94
## [,49] [,50] [,51]
## [1,] 40599.56 40123.87 39838.37
## [2,] 40599.56 40123.87 39838.37
## [3,] 40599.56 40123.87 39838.37
## [4,] 40599.56 40123.87 39838.37
## [5,] 40599.56 40123.87 39838.37
## [6,] 40599.56 40123.87 39838.37
dim(tract_to_stn)
## [1] 35145 51
If our threshhold for distance is 5 miles, we can create a vector of
length nrow(kingco_pts)
that is a binary indicator of
whether that tract is within 5 miles of any of the
nrow(link_stns)
light rail stations.
## 1 mile = 1.609344 km
## 1 km = 1000 m
dist_thresh <- 5*1.609344*1000
## apply will apply a function row-wise (MARGIN = 1)
## or column-wise (MARGIN = 2) to a matrix
tract_within_stn <- apply(tract_to_stn, MARGIN = 1, function(row){
ifelse(sum(row < dist_thresh) >= 1, 1, 0)
})
table(tract_within_stn, useNA = "ifany")
## tract_within_stn
## 0 1 <NA>
## 7313 27761 71
## Add indicator of within 5 miles to
kingco_pop$stn_within5 <- tract_within_stn
kingco_pop %>%
## Filter down to just one row per tract
filter(race_cat == "Total") %>%
erase_water(area_threshold = 0.99) %>%
mutate(lightrail = case_when(has_lightrail == 1 ~ "Yes",
stn_within5 == 1 ~ "Within 5mi",
TRUE ~ "Neither"),
lightrail = factor(lightrail, levels = c("Yes", "Within 5mi",
"Neither"))) %>%
ggplot() +
geom_sf(aes(fill = lightrail), color = "grey80", size = 0.2) +
scale_fill_manual(name = "Contains Lightrail",
values = c("Yes" = "dodgerblue",
"Within 5mi" = alpha("deeppink", 0.65),
"Neither" = "white")) +
ggtitle("2020 King County Census Tracts",
subtitle = "LINK Light rail Stations") +
theme_void()
## Fetching area water data for your dataset's location...
## Erasing water area...
## If this is slow, try a larger area threshold value.