Chapter 7: Network Visualization (1) Flow & Mobility

7.1 Lab Goals

In this chapter, you will learn

clean & organize network data
create network based on nodes and edges
map network

7.2 Good Practice

7.2.1 Organizing Folders & Sub-folders

Under the course folder, please create a folder called “lab7”. Next, in the lab7 folder, please create two sub-folders that one is called “data” and another one is “plot”.

7.2.2 Data

This chapter explores the Longitudinal Employer-Household Dynamics data from US Census. Particularly, we will play with the Job-to-Job Flows (J2J).

Job-to-Job Flows (J2J) is a set of statistics on job mobility in the United States. J2J include statistics on:

the job-to-job transition rate,

hires and separations to and from employment,

earnings changes due to job change, and

characteristics of origin and destination jobs for job-to-job transitions.

These statistics are available at the national, state, and metropolitan area levels and by worker and firm characteristics.

Heads-Up!

You would need to use Job-to-Job Flows (J2J) for Assignment 3.

Please follow the steps below to download data, unzip it and move the data to the required folder.

Go to https://github.com/fuzhen-yin/uccs_geoviz/blob/main/data/lab7_data.zip
Download the file “lab7_data.zip”
Unzip folder “lab7_data.zip”
Move all files from the “lab7_data” folder to the “data” folder under “lab7” see Step 7.2.1

If there you have any questions about the above-mentioned steps, please refer to Chapter 3.2.3 for detailed instructions.

7.2.3 Launching R Studio

Again, we would like to start a new project from scratch with a clean R Script. Please do the following steps. If you have any questions about these steps, please refer to the relevant chapters for help.

Step 1: Make sure all existing R projects are properly closed.
- If not, please close it by going to File –> Close Project –> Save changes (see Chapter 2.5).
Step 2: Create a New Project using Existing Directory, navigate to lab7, click open, then Create Project. (see Chapter 1.3).
Step 3: Create a New Script by go to File –> New File –> R Script. Save the script by giving it a proper name.

7.2.4 Before Start

Heads-Up!

All scripts are non-copyable. Lab questions are at the end of this tutorial.

7.3 Library & Data

7.3.1 Load Libraries

# Library ----
library(dplyr)
library(data.table)
library(sf)
library(tidyverse)
library(forcats)    # Reorder factor levels
library(igraph)     # network object
library(randomcoloR) # random color generator

7.3.2 Data

# US Census Cartographic Boundaries: US States (simple features)
sf_us_state <- st_read("data/cb_2023_us_state/cb_2023_us_state_5m.shp")

## Reading layer `cb_2023_us_state_5m' from data source 
##   `D:\OneDrive\UCCS\Teaching\GES4070\Labs\01_git_wk\uccs_geoviz\data\cb_2023_us_state\cb_2023_us_state_5m.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 56 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -179.1473 ymin: -14.55255 xmax: 179.7785 ymax: 71.35256
## Geodetic CRS:  NAD83

# Job-to-job flow (j2j) ----
### metadata: request info
j2j_meta_info <- read.csv("data/job_to_job_FROM_Colorado/request_info.csv")
### metadata: state
j2j_meta_state <- read.csv("data/job_to_job_FROM_Colorado/labels_state.csv")
### metadata: North American Industry Classification System (NAICS) code 
j2j_meta_naics <- read.csv("data/job_to_job_FROM_Colorado/labels_naics2_orig.csv")

### flows: separations from COLORADO 
j2j_from_co <- read.csv("data/job_to_job_FROM_Colorado/table.csv")
### flows: hires to COLORADO
j2j_to_co <- read.csv("data/job_to_job_TO_Colorado/table.csv")

7.4 J2J Viz: J2J Flow by Industry

7.4.1 Metadata

### metadata of the request info. 
### scroll down. flags, Year/Quarter: 2023 Q1; 2023 Q2; 2023 Q3
View(j2j_meta_info)
### metadata of state
View(j2j_meta_state)
### metadata of NAICS Sectors 
View(j2j_meta_naics)

7.4.2 J2J Job Outflow

### j2j_from_co: What states do workers move to? search '.flag'
View(j2j_from_co)
### data frame shape: 20 obs. 101 variables
str(j2j_from_co)

7.4.2.1 Tidy Data

### tidy the data 
j2j_from_co_cln <- j2j_from_co %>% 
  pivot_longer(cols = 2:ncol(j2j_from_co), 
               names_to = "to_state", values_to = "avg_n_job")

#### remove the record with ".flag". "!";    %like% operator.  find names containing the pattern ".flag"
j2j_from_co_cln <- j2j_from_co_cln %>% 
  filter(!to_state %like% ".flag")

#### remove missing values (na - not available) 
j2j_from_co_cln <- j2j_from_co_cln %>% na.omit()

#### update first column name from "X" to "naics_sector"
setnames(j2j_from_co_cln, old="X", new="naics_sector")

#### add a new column indicating the origin state of job flow
j2j_from_co_cln$from_state <- "Colorado"

#### add a new column showing flow director 
j2j_from_co_cln$direction <- "outflow_from_co"

#### correct state names by replacing dot "." with space " "
j2j_from_co_cln$to_state <- gsub("\\.", " " ,j2j_from_co_cln$to_state )

#### re-order columns
j2j_from_co_cln <- j2j_from_co_cln[c("from_state", "to_state", 
                                     "direction", "naics_sector", "avg_n_job")]

7.4.2.2 Quick Plot

plot: job outflow from Colorado by industry

##### data preparation
data <- j2j_from_co_cln 

#### box plot, reorder by median
ggplot(data = data, aes(x = reorder(naics_sector, avg_n_job), 
                        y=avg_n_job, fill =naics_sector )) + 
  geom_boxplot() + 
  xlab("NAICS Sector") +
  ylab("Job Counts") + 
  ggtitle("2023 Q1-Q3 Quaterly Average Job Outflow from Colorado") +
  theme(legend.position = "none", 
        axis.text.x = element_text(angle = 30, vjust = 1, hjust=1))

7.4.3 J2J Job Inflow

7.4.3.1 Tidy & Cleaning

Use the pipe operator %>% to chain functions together.

### tidy data; remove records with ".flag" suffix, remove na
j2j_to_co_cln <-  j2j_to_co %>% 
  pivot_longer(cols = 2:ncol(.), 
               names_to = "naics_sector", values_to = "avg_n_job" ) %>% 
  filter(!naics_sector %like% ".flag") %>% 
  na.omit() %>% 
  setnames(old="X", new="from_state") %>% 
  mutate(to_state = "Colorado", direction="inflow_to_co") %>% 
  select(from_state, to_state, direction, naics_sector, avg_n_job)

Correct naics names. Hint: the code below is copyable.

j2j_to_co_cln$naics_sector <- j2j_to_co_cln$naics_sector %>% 
  gsub("\\.\\.", ", ", .) %>% 
  gsub("\\.", " ", .) %>% 
  gsub("Other Services, except Public Administration", 
       "Other Services (except Public Administration)", .) %>% 
  gsub("^\\s+|\\s+$", "", .)

7.4.3.2 Box Plot of Job Inflow by Industry

##### data preparation
data <- j2j_to_co_cln 

#### box plot, reorder by median
ggplot(data = data, 
       aes(x = reorder(naics_sector, avg_n_job), 
           y=avg_n_job, 
           fill =naics_sector )) + 
  geom_boxplot() + 
  xlab("NAICS Sector") +
  ylab("Job Counts") + 
  ggtitle("2023 Q1-Q3 Quaterly Average Job Inflow to Colorado") +
  theme(legend.position = "none", 
        axis.text.x = element_text(angle = 30, vjust = 1, hjust=1))

7.5 J2J Viz: Network

7.5.1 Data Preparation

### combine inflow & outflow dataframe together
j2j_co_bysector <- rbind(j2j_from_co_cln, j2j_to_co_cln)
### combine all sectors together 
j2j_co_aggsector <- aggregate(avg_n_job ~ from_state + to_state + direction, 
                        data = j2j_co_bysector %>% select(-naics_sector), 
                        FUN = sum)
j2j_co_aggsector$naics_sector <- "all_sectors"


### Final data
j2j_co <- rbind(j2j_co_aggsector, j2j_co_bysector)

### list of unique states in j2j_co
lt_state <- unique(c(j2j_co$from_state, j2j_co$to_state))
### list of unique naics sectors in j2j_co
lt_sector <- unique(j2j_co$naics_sector)

### visualize job inflow network
data <- j2j_co %>% filter(naics_sector == "all_sectors", 
                          direction == "inflow_to_co")

# Create a network object 
network <- graph_from_data_frame(d=data, directed=F) 

# Network summary 
summary(network)

## IGRAPH 76ffeb2 UN-- 47 46 -- 
## + attr: name (v/c), direction (e/c), avg_n_job (e/n), naics_sector
## | (e/c)

7.5.2 Network Visualization

Quick plot of network

plot(network)

Improve network viz.

plot(network, vertex.size = 12, 
     edge.arrow.size = 0.5,
     edge.width=E(network)$avg_n_job * 0.002, 
     edge.curved = TRUE, main = "Job inflow to Colorado")

7.6 Viz: Mapping J2J Network

7.6.1 Data Preparation

#### project to sf_us_state to CRS NAD83 / Colorado North (ftUS)
sf_us_state_j2j <- sf_us_state %>%
  filter(NAME %in% lt_state) %>% st_transform(., 2231)

#### get state centroid  
df_node <- sf_us_state_j2j %>% select(NAME) %>% st_centroid()
#### split point data into lat & long 
df_node <- df_node %>% mutate( long = unlist(map(df_node$geometry, 1)),
                               lat = unlist(map(df_node$geometry, 2)))
st_geometry(df_node) <- NULL

#### join coordinates to edge 
j2j_co_xy <- j2j_co %>% 
  left_join(., df_node, by=c("from_state"="NAME")) %>% 
  setnames(old=c("long","lat"), new=c("long_start","lat_start")) %>% 
  left_join(., df_node, by=c("to_state"="NAME")) %>% 
  setnames(old=c("long","lat"), new=c("long_end","lat_end"))

7.6.2 Mapping Job Inflow Network

#### filter data 
data <- j2j_co_xy %>% filter(direction == "inflow_to_co", 
                             naics_sector == "all_sectors")

#### plot 
ggplot(sf_us_state_j2j) + 
  geom_sf() + 
  geom_segment(data = data,
               aes(x = long_start, y = lat_start, 
                   xend = long_end, yend = lat_end, 
                   size = avg_n_job, alpha = 0.5)) + 
  scale_size_continuous(range = c(0.1, 3)) + 
  geom_point(data =df_node , aes(x = long, y = lat), 
             shape = 18, fill = "white",
             color = 'black', stroke = 0.5) + 
  geom_text(data =df_node, 
            aes(x = long, y = lat, label = NAME), 
            size=2) + 
  ggtitle("Job Inflow to Colorado") + 
    ylab("Latitude") + 
    xlab("Longitude")

7.6.2 Function: Job Outflow Network

### create a function to plot job outflow by sectors  
map_j2j <- function(idx, full_data, state_boundary, state_point, 
                    c_direction, c_sector, edge_color){
  data <- full_data %>% filter(direction == as.character(c_direction),
                               naics_sector == as.character(c_sector))
  title <- paste("Job-to-job", as.character(c_direction), 
                 as.character(c_sector), sep=" ")
  p <- ggplot(state_boundary) + 
    geom_sf() + 
    geom_segment(data = data,
                 aes(x = long_start, y = lat_start, 
                     xend = long_end, yend = lat_end, 
                     size = avg_n_job, alpha = 0.5), 
                 colour = edge_color) + 
    scale_size_continuous(range = c(0.01, 4)) + 
    geom_point(data =state_point , aes(x = long, y = lat), 
               shape = 18, fill = "white",
               color = 'black', stroke = 0.5) + 
    geom_text(data =state_point, 
              aes(x = long, y = lat, label = NAME), 
              size=2) + 
    ggtitle(title) + 
    theme_bw() + 
    ylab("Latitude") + 
    xlab("Longitude")
  
  pdf(sprintf("plot/p%s_%s.pdf", idx, title),  width = 10, height = 5)
  print(p)
  dev.off()
}

Mapping job outflow by industry

### random color set 
n <- length(lt_sector)
col_random <- randomColor(n)


### map 
for (i in 1:length(lt_sector)) {
  edge_color = col_random[[i]]
  sector_toviz <- lt_sector[[i]]
  
  map_j2j(i, j2j_co_xy, sf_us_state_j2j, df_node, "outflow_from_co", 
          sector_toviz, edge_color)
}

7.7 Lab Questions

[Q1] Please list 3-5 functions that you have learnt from this tutorial and explain when and how to use them. Hint: a function is an object followed by (). For example, left_join(), str() are functions.

[Q2] Please open the plot folder, you should be able to see 21 plots. Please pick 2 plots to discuss your findings. Hint: the data frame j2j_co contains the original data of the plots.

[Q3] Have a look at the map p1_Job-to-job outflow_from_co all_sectors.pdf, how would you like to improve it?

7.7 Close & Exit

Congratulations!! You have completed the entire tutorial and learnt the intro to network visualization!! Excellent work.

Please go “File”–> “Close Project” – a pop window asking “Do you want to save these changes” –> “Yes”.

Don’t forget to submit the lab7 report and your script to Canvas.