Chapter 4 and Chapter 5 have introduced the fundamentals of spatial data manipulation and map-making, particularly, producing single-variable thematic maps. In this chapter, we will learn to:
Under the course folder, please create a folder called “lab6”. Next, in the lab6 folder, please create two sub-folders that one is called “data” and another one is “plot”.
This chapter continues to explore spatial data sources other than US Census Bureau. We will use the 311 Service Requests in 2023 and Parks, Medians, and Parkway Trees.
311 data contains pertinent information related to inquiries and service requests for the City and County of Denver.
Please follow the steps below to download data, unzip it and move the data to the required folder.
If there you have any questions about the above-mentioned steps, please refer to Chapter 3.2.3 for detailed instructions.
Again, we would like to start a new project from scratch with a clean R Script. Please do the following steps. If you have any questions about these steps, please refer to the relevant chapters for help.
Heads-Up!
All scripts are non-copyable.
Load libraries.
### Library
library(dplyr) # data manipulation
library(tidyverse) # tidy data
library(reshape2) # one-hot encoding
library(sf) # simple features
library(geojsonsf) # read geojson
library(tmap) # mapping
library(ggplot2) # plot
library(viridis) # viz - color
library(cartogram) # map - cartogram
library(pals) # bivariate color palette
require(classInt) # classIntervals, findCols
require(lattice) # levelplot
require(grid) # viewport
Input data
# Denver neighborhood (geojson --> simple features)
sf_den_ngbr <- geojson_sf("data/Denver_neighborhood.geojson")
# Denver 311 service requests (csv -- > data frame)
df_den_311 <- read.csv("data/2023_denver_311_service_request_cleaned.csv")
# Denver Trees (around 1 minute to read the data)
sf_den_tree <- geojson_sf("data/denver_trees.geojson")
In the previous chapter, we have played with AirBnb data.
This week, we will play with 311 Service Requests in Denver. 311 is a special telephone number supported in many communities in the United States that provides access to non-emergency municipal services like reporting abandoned vehicles, noise complaints, and graffiti.
Before any analysis and visualization tasks, the first step is to learn about the data and check what are the attributes (or columns) in the data.
[Q1] In your opinion, which column in
df_den_311
is most interesting and briefly explain why.
# open data
View(df_den_311)
And the distributions of the variables in, e.g., the
Agency
column.
# contingency table of the attribute 'Agency'
tb_311_agency_freq <- table(df_den_311$Agency)
View(tb_311_agency_freq)
# plot: contingency table
barplot(tb_311_agency_freq, main="311 Requests - Agency", las=2, cex.names = 0.5)
Check data type of each column. [Q2] What is the
data type of the column Case_Created_dttm
.
# data type
str(df_den_311)
The column Case_Created_dttm
records at what date and
time the “311 request” was received. Let’s convert this column to a
particular data type as ** date **. Read more
about the datetime format in R.
# convert to date
df_den_311$Case_Created_dttm # Fri, 29 Dec 2023 19:20:08 GMT
df_den_311$case_create_date <- as.Date(df_den_311$Case_Created_dttm,
format = "%a, %d %b %Y %H:%M:%S")
# extract week numbers
df_den_311$case_create_wk <- strftime(df_den_311$case_create_date,
format = "%V") %>% as.numeric()
[Q3] After transformation, What is the data type of
the column case_create_date
?
# Check data type again.
str(df_den_311)
The column Case_Summary
categorizes people’s 311
requests into different types. Let’s check what issues that people have
reported in 2023 and their frequency.
[Q4] What are the top 5 categories of 311 service requests?
# Case summary - contingency table - ordered by frequency
df_311_summary <- table(df_den_311$Case_Summary) %>%
as.data.frame() %>% arrange(-Freq)
# check
View(df_311_summary)
OK, there are over 1000 categories of service requests. Let’s only
focus on the top 5 frequently mentioned category and save it as a new
data df_den_311_top
.
# extract 311 requests in the most mentioned categories
df_den_311_top <- df_den_311 %>%
filter(Case_Summary %in% df_311_summary$Var1[1:5]) %>%
select(case_create_wk, Case_Summary) %>%
mutate(n=1)
Count the number of 311 service requests in the top 5 categories by week.
# (one-hot encoding)
df_den_311_top <- dcast(df_den_311_top,
case_create_wk ~ Case_Summary,
fill = 0, value.var = "n",
fun.aggregate = length)
View(df_den_311_top)
From here, we are going to change the structure of the data
df_den_311_top
and reshape (or tidy) it in a specific
way so that:
The process if tidying is illustrated in Figure 6.1.
[Q5] Please compare df_den_311_top
before and after, and try to explain how the “tidy” process changes the
data structure. If you need to look at the previous data, please go back
to Step
6.4.3 and rerun the script from there.
# Tidy data using the function pivot_longer()
df_den_311_top <- df_den_311_top %>%
pivot_longer(cols = 2:ncol(.), names_to = "category", values_to ="count")
Let’s make a basic plot exploring how the top 5 categories of 311 requests change over times (by week).
# viz: time series
p1_311 <- ggplot(df_den_311_top, aes(x=case_create_wk, y=count, fill=category)) +
geom_area(alpha=0.6 , linewidth=.5, colour="white")
# have a look
p1_311
Improve the plot by using the viridis color palette
from the viridis
library.
# viz: improve
p2_311 <- ggplot(df_den_311_top, aes(x=case_create_wk, y=count, fill=category)) +
geom_area(alpha=0.6 , linewidth=.5, colour="white") +
scale_fill_viridis(discrete = T) +
theme_bw() +
xlab("Weeks in 2023") +
ylab(" Count of 311 Service Requests")
# export
pdf("plot/plot2_311_service_cat_week.pdf", width = 10, height = 6)
print(p2_311)
dev.off()
Okay, while the plot looks better now, the week number is very confusing. Let’s add some labels by connecting week numbers to seasons as suggested by Figure 6.2.
[Q6] Paste the screenshot of “plot3_311_service_cat_week_reference.pdf” to your report. And briefly discuss your findings from this analysis (failed to provide discussion will result in 2 points deduction).
# create a df showing when the four seasons start
df_311_season_label <- data.frame(
x_season = c(5,18,31,44),
y_season = 2000,
label = c("Winter","Spring", "Summer", "Fall")
)
# viz: improve by adding annotation & reference lines
p3_311 <- ggplot(df_den_311_top, aes(x=case_create_wk, y=count, fill=category)) +
geom_area(alpha=0.6 , size=.5, colour="white") +
scale_fill_viridis(discrete = T) +
theme_bw() +
geom_vline(xintercept = c(10,23,36,49),colour="black", linetype = "dotted") +
geom_label(data=df_311_season_label, aes(x = x_season, y = y_season, label = label), fill = "white", size=4) +
xlab("Weeks in 2023") +
ylab(" Count of 311 Service Requests")
# export
pdf("plot/plot3_311_service_cat_week_reference.pdf", width = 10, height = 6)
print(p3_311)
dev.off()
This section moves from time-series to spatial visualization of the 311 data.
[Q7] Please open the data frame
df_den_311
and report which columns contain spatial
information of 311 requests and could be used to convert “df_den_311” to
spatial data?
# open
View(df_den_311)
# list all column names
names(df_den_311)
Create simple features of 311 service requests and save the data as
sf_den_311
.
# Convert to spatial points
sf_den_311 <- st_as_sf(df_den_311, coords = c("Longitude", "Latitude"), crs = 4326)
Let’s check the CRS (coordinate reference system) of all simple features we have and project them to the 2D CRS, NAD83 / Colorado North (ftUS), which has its centroid in Colorado.
The value 2231 is the unique index for the CRS NAD83 / Colorado North (ftUS). More CRS could be found here https://epsg.io/.
# four spatial data (simple features). Let's check their coordination systems
sf_den_311$geometry
sf_den_tree$geometry
sf_den_ngbr$geometry
# project the data to NAD83 / Colorado North (ftUS)
# https://epsg.io/2231 - NAD83 / Colorado North (ftUS)
sf_den_311 <- st_transform(sf_den_311, 2231)
sf_den_tree <- st_transform(sf_den_tree, 2231)
sf_den_ngbr <- st_transform(sf_den_ngbr, 2231)
We will agregate 311 calls to neighborhoods. Have a look at the
sf_den_ngbr
data and create a new column called
label
for showing their names on maps.
## create a column "label" for mapping
sf_den_ngbr$label <- sf_den_ngbr$neighbourhood
sf_den_ngbr$label <- gsub(" ", "\n", sf_den_ngbr$label)
Count the number of 311 service requests with respect to different
neighborhoods by using the st_intersects()
function.
# Count 311 (2023) in each neighborhood
sf_den_ngbr_311 <- sf_den_ngbr
sf_den_ngbr_311["n_311"] <- lengths(st_intersects(sf_den_ngbr_311, sf_den_311))
Quick plot of 311 calls by neighborhood.
[Q8] Have a look at the file “map1_ngbr_311_requests_in_2023.pdf” and report which neighborhood has received the highest number of complaints in 2023?
# set mode to static mapping
tmap_mode("plot")
map1_ngbr_311 <- tm_shape(sf_den_ngbr_311) +
tm_polygons(col = "n_311", title = "311 Calls in 2023", palette = "Blues") +
tm_text(text = "label", size=0.4) +
tm_compass(position = c("left", "top")) +
tm_scale_bar(width = 0.1, position = c("left", "top"))
# export
pdf("plot/map1_ngbr_311_requests_in_2023.pdf", width = 10, height = 8)
print(map1_ngbr_311)
dev.off()
This and the following sections use the same dataset
sf_den_ngbr_311
and variable n_311
to explore
additional mapping techniques, i.e., cartogram.
A cartogram is a thematic map of a set of features, in which their geographic size is altered to be directly proportional to a selected variable.
# cartogram 1
# Construct a continuous area cartogram by a rubber sheet distortion algorithm
sf_den_ngbr_311_carto <- cartogram_cont(sf_den_ngbr_311, "n_311", itermax = 10)
map2_ngbr_311_c1 <- tm_shape(sf_den_ngbr_311_carto) +
tm_polygons(col = "n_311", title = "311 Calls in 2023", palette = "Blues") +
tm_text(text = "label", size=0.4) +
tm_compass(position = c("left", "top")) +
tm_scale_bar(width = 0.1, position = c("left", "top"))
pdf("plot/map2_ngbr_311_requests_in_2023_carto1.pdf", width = 10, height = 8)
print(map2_ngbr_311_c1)
dev.off()
# cartogram 2
# Construct a cartogram which represents each geographic region as non-overlapping circles (Dorling 1996).
sf_den_ngbr_311_carto_dorling <- cartogram_dorling(sf_den_ngbr_311, weight = "n_311", itermax = 10)
map3_ngbr_311_c2 <- tm_shape(sf_den_ngbr_311_carto_dorling) +
tm_polygons(col = "n_311", title = "311 Calls in 2023", palette = "Blues") +
tm_text(text = "label", size="n_311") +
tm_compass(position = c("left", "top")) +
tm_scale_bar(width = 0.1, position = c("left", "top"))
pdf("plot/map3_ngbr_311_requests_in_2023_carto2_dorling.pdf", width = 10, height = 8)
print(map3_ngbr_311_c2)
dev.off()
[Q9] Please paste the screenshots of the three files map1_ngbr_311_requests_in_2023.pdf, map2_ngbr_311_requests_in_2023_carto1.pdf, and map3_ngbr_311_requests_in_2023_carto2_dorling.pdf. Briefly discuss which map do you like the most and why (failed to provide discussion will result in 2 points deduction)?
This section uses the Parks, Medians, and Parkway Trees in Denver to produce a bivariate choropleth map.
Again, have a look at the sf_den_tree
data and explore
what columns are in it.
[Q10] Which column in sf_den_tree
do
you like the most and why?
View(sf_den_tree)
Join the two datasets sf_den_tree
and
sf_den_ngbr
by using st_join()
.
## spatial join to get neighobrhood information
sf_den_tree_ngbr <- st_join(sf_den_tree, sf_den_ngbr["neighbourhood"],
left = T, join = st_intersects)
## have a look at the data after joining
View(sf_den_tree_ngbr)
The column SPECIES_COMMON
tells about the species of
trees. Let’s check what are & how many trees of different
species.
# count tree species
df_tree_spec <- table(sf_den_tree_ngbr$SPECIES_COMMON) %>%
as.data.frame() %>% arrange(-Freq)
# open data
View(df_tree_spec)
Count trees of different species in each neighborhood using One-hot encoding.
# neighborhood tree species count
df_ngbr_tree <- sf_den_tree_ngbr %>%
select(neighbourhood, SPECIES_COMMON) %>%
mutate(n=1)
# remove geometry
st_geometry(df_ngbr_tree) <- NULL
# one-hot encoding
df_ngbr_tree <- dcast(df_ngbr_tree, neighbourhood ~ SPECIES_COMMON,
fill = 0, value.var = "n", fun.aggregate = length)
# open the final data
View(df_ngbr_tree)
Join the data df_ngbr_tree
to the spatial boundaries of
neighborhood sf_den_ngbr
.
# left join to simple features
sf_den_ngbr_tree <- left_join(sf_den_ngbr, df_ngbr_tree, by = "neighbourhood")
Let’s say we are interested in the two species “Honey Locust” and “Maple, Silver”, and display the counts of two species on a single map.
# two tree species
lt_tree_specs_lbl <- c("Honey Locust", "Maple, Silver")
At the final map, for each species, we want to divide the values into three categories using the “quantile” classification methods. You could try other classification methods that we have seen in Chapter 5.6.6.
# number of bins
nbins <- 3
# categorize count of a particular tree species into 3 bins using "quantile
brkhl <- classIntervals(sf_den_ngbr_tree$`Honey Locust`, n=nbins, style='quantile')
brkms <- classIntervals(sf_den_ngbr_tree$`Maple, Silver`, n=nbins, style='quantile')
class_hl <- findCols(brkhl)
class_ms <- findCols(brkms)
# convert x,y classes into a joint class x+3(y-1)
sf_den_ngbr_tree$class_hl_ms <- class_hl + nbins*(class_ms-1)
# bivariate color palette
col_biv <- stevens.bluered()
Preparing for mapping and legend.
map4_ngbr_trees_biv <- tm_shape(sf_den_ngbr_tree) +
tm_polygons("class_hl_ms", palette =col_biv, n =9 ) +
tm_text(text = "label", size=0.5) +
tm_compass(position = c("left", "top")) +
tm_scale_bar(width = 0.1, position = c("left", "top")) +
tm_layout(legend.show = F)
# add the color legend
map_legend <- levelplot(matrix(1:(nbins*nbins), nrow=nbins),
axes=FALSE, col.regions=col_biv,
xlab=list("Honey Locust -->", cex= 0.7),
ylab=list("Maple, Silver -->", cex=0.7), cuts=8, colorkey=FALSE,
scales=list(draw=0))
vp <- viewport(x=.75, y=.25, width=.15, height=.15)
Export.
[Q11] Please paste the screenshot of the file “map4_neighborhood_bivariate_trees.pdf” to your report.
# export
pdf("plot/map4_neighborhood_bivariate_trees.pdf", width = 12, height = 12)
print(map4_ngbr_trees_biv)
pushViewport(vp)
print(map_legend, newpage=FALSE)
popViewport()
dev.off()
Congratulations!! You have completed the entire tutorial and learnt new spatial analysis and visualization techniques!! Excellent work.
Please go “File”–> “Close Project” – a pop window asking “Do you want to save these changes” –> “Yes”.
Don’t forget to submit the lab6 report and your script to Canvas.