6.1 Lab Goals

Chapter 4 and Chapter 5 have introduced the fundamentals of spatial data manipulation and map-making, particularly, producing single-variable thematic maps. In this chapter, we will learn to:

  • visualize time series data
  • create cartograms
  • make bivariate choropleth map

6.2 Good Practice

6.2.1 Organizing Folders & Sub-folders

Under the course folder, please create a folder called “lab6”. Next, in the lab6 folder, please create two sub-folders that one is called “data” and another one is “plot”.

6.2.2 Data

This chapter continues to explore spatial data sources other than US Census Bureau. We will use the 311 Service Requests in 2023 and Parks, Medians, and Parkway Trees.

311 data contains pertinent information related to inquiries and service requests for the City and County of Denver.

Please follow the steps below to download data, unzip it and move the data to the required folder.

If there you have any questions about the above-mentioned steps, please refer to Chapter 3.2.3 for detailed instructions.

6.2.3 Launching R Studio

Again, we would like to start a new project from scratch with a clean R Script. Please do the following steps. If you have any questions about these steps, please refer to the relevant chapters for help.

  • Step 1: Make sure all existing R projects are properly closed.
    • If not, please close it by going to File –> Close Project –> Save changes (see Chapter 2.5).
  • Step 2: Create a New Project using Existing Directory, navigate to lab6, click open, then Create Project. (see Chapter 1.3).
  • Step 3: Create a New Script by go to File –> New File –> R Script. Save the script by giving it a proper name.

6.2.4 Before Start

Heads-Up!

All scripts are non-copyable.


6.3 Libraries & Input Data

Load libraries.

### Library
library(dplyr)        # data manipulation
library(tidyverse)    # tidy data 
library(reshape2)     # one-hot encoding 

library(sf)           # simple features     
library(geojsonsf)    # read geojson

library(tmap)         # mapping 
library(ggplot2)      # plot 

library(viridis)      # viz - color

library(cartogram)    # map - cartogram
library(pals)         # bivariate color palette
require(classInt)     # classIntervals, findCols
require(lattice)      # levelplot
require(grid)         # viewport

Input data

# Denver neighborhood (geojson --> simple features)
sf_den_ngbr <- geojson_sf("data/Denver_neighborhood.geojson")

# Denver 311 service requests (csv -- > data frame)
df_den_311 <- read.csv("data/2023_denver_311_service_request_cleaned.csv")

# Denver Trees (around 1 minute to read the data)
sf_den_tree <- geojson_sf("data/denver_trees.geojson")

6.4 Viz: 311 Service Requests by Week

In the previous chapter, we have played with AirBnb data.

This week, we will play with 311 Service Requests in Denver. 311 is a special telephone number supported in many communities in the United States that provides access to non-emergency municipal services like reporting abandoned vehicles, noise complaints, and graffiti.

6.4.1 Exploring 311 data

Before any analysis and visualization tasks, the first step is to learn about the data and check what are the attributes (or columns) in the data.

[Q1] In your opinion, which column in df_den_311 is most interesting and briefly explain why.

# open data
View(df_den_311)

And the distributions of the variables in, e.g., the Agency column.

# contingency table of the attribute 'Agency'
tb_311_agency_freq <- table(df_den_311$Agency)
View(tb_311_agency_freq)

# plot: contingency table
barplot(tb_311_agency_freq, main="311 Requests - Agency", las=2, cex.names = 0.5)

6.4.2 Data Type

Check data type of each column. [Q2] What is the data type of the column Case_Created_dttm.

# data type
str(df_den_311)

The column Case_Created_dttm records at what date and time the “311 request” was received. Let’s convert this column to a particular data type as ** date **. Read more about the datetime format in R.

# convert to date 
df_den_311$Case_Created_dttm   # Fri, 29 Dec 2023 19:20:08 GMT
df_den_311$case_create_date <- as.Date(df_den_311$Case_Created_dttm, 
                                       format = "%a, %d %b %Y %H:%M:%S")

# extract week numbers 
df_den_311$case_create_wk <- strftime(df_den_311$case_create_date, 
                                      format = "%V") %>% as.numeric()

[Q3] After transformation, What is the data type of the column case_create_date?

# Check data type again. 
str(df_den_311)

6.4.3 Top 5 Service Reuqests Categories

The column Case_Summary categorizes people’s 311 requests into different types. Let’s check what issues that people have reported in 2023 and their frequency.

[Q4] What are the top 5 categories of 311 service requests?

# Case summary - contingency table - ordered by frequency 
df_311_summary <- table(df_den_311$Case_Summary) %>% 
  as.data.frame() %>% arrange(-Freq)

# check
View(df_311_summary)

OK, there are over 1000 categories of service requests. Let’s only focus on the top 5 frequently mentioned category and save it as a new data df_den_311_top.

# extract 311 requests in the most mentioned categories 
df_den_311_top <- df_den_311 %>% 
  filter(Case_Summary %in% df_311_summary$Var1[1:5]) %>% 
  select(case_create_wk, Case_Summary) %>% 
  mutate(n=1)

Count the number of 311 service requests in the top 5 categories by week.

#  (one-hot encoding)
df_den_311_top <- dcast(df_den_311_top, 
                        case_create_wk ~ Case_Summary, 
                        fill = 0, value.var = "n",
                        fun.aggregate = length)

View(df_den_311_top)

From here, we are going to change the structure of the data df_den_311_top and reshape (or tidy) it in a specific way so that:

  • Each variable must have its own column.
  • Each observation must have its own row.
  • Each value must have its own cell.

The process if tidying is illustrated in Figure 6.1.

[Q5] Please compare df_den_311_top before and after, and try to explain how the “tidy” process changes the data structure. If you need to look at the previous data, please go back to Step 6.4.3 and rerun the script from there.

Figure 6.1: Diagram of data tidying.

# Tidy data using the function pivot_longer()
df_den_311_top <- df_den_311_top %>% 
  pivot_longer(cols = 2:ncol(.), names_to = "category", values_to ="count")

6.4.4 Time Series Plot

Let’s make a basic plot exploring how the top 5 categories of 311 requests change over times (by week).

# viz: time series 
p1_311 <- ggplot(df_den_311_top, aes(x=case_create_wk, y=count, fill=category)) +
  geom_area(alpha=0.6 , linewidth=.5, colour="white")

# have a look
p1_311

Improve the plot by using the viridis color palette from the viridis library.

# viz: improve 
p2_311 <- ggplot(df_den_311_top, aes(x=case_create_wk, y=count, fill=category)) +
  geom_area(alpha=0.6 , linewidth=.5, colour="white") + 
  scale_fill_viridis(discrete = T) + 
  theme_bw() +
  xlab("Weeks in 2023") + 
  ylab(" Count of 311 Service Requests")
  
  
# export 
pdf("plot/plot2_311_service_cat_week.pdf", width = 10, height = 6)
print(p2_311)
dev.off()

Okay, while the plot looks better now, the week number is very confusing. Let’s add some labels by connecting week numbers to seasons as suggested by Figure 6.2.

[Q6] Paste the screenshot of “plot3_311_service_cat_week_reference.pdf” to your report. And briefly discuss your findings from this analysis (failed to provide discussion will result in 2 points deduction).

Figure 6.2: When four seasons start in 2024.
Figure 6.2: When four seasons start in 2024.


# create a df showing when the four seasons start 
df_311_season_label <- data.frame(
  x_season = c(5,18,31,44),
  y_season = 2000, 
  label = c("Winter","Spring", "Summer", "Fall")
)

# viz: improve by adding annotation & reference lines  
p3_311 <- ggplot(df_den_311_top, aes(x=case_create_wk, y=count, fill=category)) +
  geom_area(alpha=0.6 , size=.5, colour="white") + 
  scale_fill_viridis(discrete = T) + 
  theme_bw() +
  geom_vline(xintercept = c(10,23,36,49),colour="black", linetype = "dotted") + 
  geom_label(data=df_311_season_label, aes(x = x_season, y = y_season, label = label), fill = "white", size=4)  + 
  xlab("Weeks in 2023") + 
  ylab(" Count of 311 Service Requests") 


# export 
pdf("plot/plot3_311_service_cat_week_reference.pdf", width = 10, height = 6)
print(p3_311)
dev.off()

6.5 Viz: Mapping 311 Requests

This section moves from time-series to spatial visualization of the 311 data.

[Q7] Please open the data frame df_den_311 and report which columns contain spatial information of 311 requests and could be used to convert “df_den_311” to spatial data?

# open
View(df_den_311)

# list all column names 
names(df_den_311)

6.5.1 From Dataframe to Simple Features

Create simple features of 311 service requests and save the data as sf_den_311.

# Convert to spatial points 
sf_den_311 <- st_as_sf(df_den_311, coords = c("Longitude", "Latitude"), crs = 4326)

Let’s check the CRS (coordinate reference system) of all simple features we have and project them to the 2D CRS, NAD83 / Colorado North (ftUS), which has its centroid in Colorado.

The value 2231 is the unique index for the CRS NAD83 / Colorado North (ftUS). More CRS could be found here https://epsg.io/.

# four spatial data (simple features). Let's check their coordination systems 
sf_den_311$geometry
sf_den_tree$geometry
sf_den_ngbr$geometry

# project the data to NAD83 / Colorado North (ftUS)
# https://epsg.io/2231 - NAD83 / Colorado North (ftUS)
sf_den_311 <- st_transform(sf_den_311, 2231)
sf_den_tree <- st_transform(sf_den_tree, 2231)
sf_den_ngbr <- st_transform(sf_den_ngbr, 2231)

6.5.2 Mapping 311 Calls by Neighborhood

We will agregate 311 calls to neighborhoods. Have a look at the sf_den_ngbr data and create a new column called label for showing their names on maps.

## create a column "label" for mapping 
sf_den_ngbr$label <- sf_den_ngbr$neighbourhood
sf_den_ngbr$label <- gsub(" ", "\n", sf_den_ngbr$label)

Count the number of 311 service requests with respect to different neighborhoods by using the st_intersects() function.

# Count 311 (2023) in each neighborhood  
sf_den_ngbr_311 <- sf_den_ngbr
sf_den_ngbr_311["n_311"] <- lengths(st_intersects(sf_den_ngbr_311, sf_den_311))

Quick plot of 311 calls by neighborhood.

[Q8] Have a look at the file “map1_ngbr_311_requests_in_2023.pdf” and report which neighborhood has received the highest number of complaints in 2023?

# set mode to static mapping
tmap_mode("plot")

map1_ngbr_311 <- tm_shape(sf_den_ngbr_311) + 
  tm_polygons(col = "n_311", title = "311 Calls in 2023", palette = "Blues") + 
  tm_text(text = "label", size=0.4) +
  tm_compass(position = c("left", "top")) +
  tm_scale_bar(width = 0.1, position = c("left", "top"))

# export 
pdf("plot/map1_ngbr_311_requests_in_2023.pdf", width = 10, height = 8)
print(map1_ngbr_311)
dev.off()

6.5.3 Cartograms

This and the following sections use the same dataset sf_den_ngbr_311 and variable n_311 to explore additional mapping techniques, i.e., cartogram.

A cartogram is a thematic map of a set of features, in which their geographic size is altered to be directly proportional to a selected variable.

# cartogram 1
# Construct a continuous area cartogram by a rubber sheet distortion algorithm
sf_den_ngbr_311_carto <- cartogram_cont(sf_den_ngbr_311, "n_311", itermax = 10)

map2_ngbr_311_c1 <- tm_shape(sf_den_ngbr_311_carto) + 
  tm_polygons(col = "n_311", title = "311 Calls in 2023", palette = "Blues") + 
  tm_text(text = "label", size=0.4) +
  tm_compass(position = c("left", "top")) +
  tm_scale_bar(width = 0.1, position = c("left", "top"))


pdf("plot/map2_ngbr_311_requests_in_2023_carto1.pdf", width = 10, height = 8)
print(map2_ngbr_311_c1)
dev.off()

6.5.4 Cartogram - Dorling

# cartogram 2
# Construct a cartogram which represents each geographic region as non-overlapping circles (Dorling 1996).
sf_den_ngbr_311_carto_dorling <- cartogram_dorling(sf_den_ngbr_311, weight = "n_311", itermax = 10)

map3_ngbr_311_c2 <- tm_shape(sf_den_ngbr_311_carto_dorling) + 
  tm_polygons(col = "n_311",  title = "311 Calls in 2023", palette = "Blues") + 
  tm_text(text = "label", size="n_311") +
  tm_compass(position = c("left", "top")) +
  tm_scale_bar(width = 0.1, position = c("left", "top"))

pdf("plot/map3_ngbr_311_requests_in_2023_carto2_dorling.pdf", width = 10, height = 8)
print(map3_ngbr_311_c2)
dev.off()

[Q9] Please paste the screenshots of the three files map1_ngbr_311_requests_in_2023.pdf, map2_ngbr_311_requests_in_2023_carto1.pdf, and map3_ngbr_311_requests_in_2023_carto2_dorling.pdf. Briefly discuss which map do you like the most and why (failed to provide discussion will result in 2 points deduction)?

6.6 Viz: Tree Maps

This section uses the Parks, Medians, and Parkway Trees in Denver to produce a bivariate choropleth map.

Again, have a look at the sf_den_tree data and explore what columns are in it.

[Q10] Which column in sf_den_tree do you like the most and why?

View(sf_den_tree)

6.6.1 Spatial Data Manipulation

Join the two datasets sf_den_tree and sf_den_ngbr by using st_join() .

## spatial join to get neighobrhood information 
sf_den_tree_ngbr <- st_join(sf_den_tree, sf_den_ngbr["neighbourhood"], 
                            left = T, join = st_intersects)

## have a look at the data after joining
View(sf_den_tree_ngbr)

The column SPECIES_COMMON tells about the species of trees. Let’s check what are & how many trees of different species.

# count tree species 
df_tree_spec <- table(sf_den_tree_ngbr$SPECIES_COMMON) %>% 
  as.data.frame() %>% arrange(-Freq)

# open data
View(df_tree_spec)

Count trees of different species in each neighborhood using One-hot encoding.

# neighborhood tree species count 
df_ngbr_tree <- sf_den_tree_ngbr %>% 
  select(neighbourhood, SPECIES_COMMON) %>% 
  mutate(n=1)

# remove geometry 
st_geometry(df_ngbr_tree) <- NULL

# one-hot encoding 
df_ngbr_tree <- dcast(df_ngbr_tree, neighbourhood ~ SPECIES_COMMON, 
                      fill = 0, value.var = "n", fun.aggregate = length)

# open the final data 
View(df_ngbr_tree)

Join the data df_ngbr_tree to the spatial boundaries of neighborhood sf_den_ngbr.

# left join to simple features 
sf_den_ngbr_tree <- left_join(sf_den_ngbr, df_ngbr_tree, by = "neighbourhood")

6.6.2 Bivariate Mapping

Let’s say we are interested in the two species “Honey Locust” and “Maple, Silver”, and display the counts of two species on a single map.

# two tree species 
lt_tree_specs_lbl <- c("Honey Locust", "Maple, Silver")

At the final map, for each species, we want to divide the values into three categories using the “quantile” classification methods. You could try other classification methods that we have seen in Chapter 5.6.6.

# number of bins 
nbins <- 3

# categorize count of a particular tree species into 3 bins using "quantile
brkhl <- classIntervals(sf_den_ngbr_tree$`Honey Locust`, n=nbins, style='quantile')
brkms <- classIntervals(sf_den_ngbr_tree$`Maple, Silver`, n=nbins, style='quantile')

class_hl <- findCols(brkhl)
class_ms <- findCols(brkms)

# convert x,y classes into a joint class x+3(y-1)
sf_den_ngbr_tree$class_hl_ms <- class_hl + nbins*(class_ms-1)

# bivariate color palette 
col_biv <- stevens.bluered()

Preparing for mapping and legend.

map4_ngbr_trees_biv <- tm_shape(sf_den_ngbr_tree) + 
  tm_polygons("class_hl_ms", palette =col_biv, n =9 ) +
  tm_text(text = "label", size=0.5) +
  tm_compass(position = c("left", "top")) +
  tm_scale_bar(width = 0.1, position = c("left", "top")) +
  tm_layout(legend.show = F)

# add the color legend
map_legend <- levelplot(matrix(1:(nbins*nbins), nrow=nbins), 
                        axes=FALSE, col.regions=col_biv,
                        xlab=list("Honey Locust -->", cex= 0.7), 
                        ylab=list("Maple, Silver -->", cex=0.7), cuts=8, colorkey=FALSE,
                        scales=list(draw=0))

vp <- viewport(x=.75, y=.25, width=.15, height=.15)

Export.

[Q11] Please paste the screenshot of the file “map4_neighborhood_bivariate_trees.pdf” to your report.

# export 
pdf("plot/map4_neighborhood_bivariate_trees.pdf", width = 12, height = 12)

print(map4_ngbr_trees_biv)
pushViewport(vp)
print(map_legend, newpage=FALSE)
popViewport()

dev.off()

6.7 Close & Exit

Congratulations!! You have completed the entire tutorial and learnt new spatial analysis and visualization techniques!! Excellent work.

Please go “File”–> “Close Project” – a pop window asking “Do you want to save these changes” –> “Yes”.

Don’t forget to submit the lab6 report and your script to Canvas.