This chapter continues to explore the techniques and analyses related to map-making. In Chapter 4, we have played with census data (e.g., tables and cartographic boundaries) to make choropleth maps. Today, we will use Airbnb data as an example to learn more advanced spatial analysis skills including:
Under the course folder, please create a folder called “lab5”. Next, in the lab5 folder, please create two sub-folders that one is called “data” and another one is “plot”.
In this lab, we will explore data sources other than US Census Bureau. Particularly, we will play with Airbnb data for the city of Denver (CO).
AirBnb has published datasets related to listings, reviews, and neighborhoods in major cities or tourist destinations all over the world. Here you can find the data dictionary of Airbnb data.
Please follow the steps below to download data, unzip it and move the data to the required folder.
If there you have any questions about the above-mentioned steps, please refer to Chapter 3.2.3 for detailed instructions.
Again, we would like to start a new project from scratch with a clean R Script. Please do the following steps. If you have any questions about these steps, please refer to the relevant chapters for help.
Heads-Up!
All scripts are non-copyable. Here are some coding standards and best practices to follow:
We start with mapping Airbnb listings. Unfortunately, there is no Airbnb data for Colorado Springs. But it has data for [Denver]{https://insideairbnb.com/denver/}.
Load libraries. We have seen most of these libraries except for
geojsonsf
which is often used to read spatial data in
.geojson format and convert it to simple features.
## install packages
install.packages("geojsonsf")
install.packages("reshape2")
install.packages("corrplot")
# library
library(sf) # simple features
library(tmap) # mapping
library(ggplot2) # plot
library(dplyr) # data manipulation
library(geojsonsf) # geojson
library(corrplot) # correlation matrix
library(reshape2) # one-hot encoding
Read data
# US Cartographic Boundaries: Cities & towns in Colorado
co_place <- st_read("data/cb_2023_08_CO_place_500k/cb_2023_08_place_500k.shp")
# Airbnb data
ab_listing <- read.csv("data/airbnb_denver/listings.csv")
ab_nghbr <- geojson_sf("data/airbnb_denver/neighborhood.geojson")
Examine data. Please have a look at these tables, and report:
[Q1]: How many records in “co_place”?
[Q2]: Which columns could be used to extract “Denver”
from “co_place”?
[Q3]: In “ab_nghbr”, how many
records (listings)?
[Q4]: In “ab_nghbr”, which two
columns contain spatial information and could be used for mapping?
View(co_place)
View(ab_nghbr)
View(ab_listing)
Check columns information in “ab_nghbr”
# column info
str(ab_listing)
## 'data.frame': 5016 obs. of 16 variables:
## $ id : num 360 364 590 592 1940 ...
## $ name : chr "Sit in the Peaceful Garden of the Chickadee Cottage in LoHi" "Lodo / RiNo LOFT via airport train" "Comfortable - and a great value!" "private" ...
## $ host_id : int 666 783 933 933 2150 990 135298 666 745200 923646 ...
## $ host_name : chr "Jennifer & Giovanni" "Jason" "Jill" "Jill" ...
## $ latitude : num 39.8 39.8 39.8 39.8 39.7 ...
## $ longitude : num -105 -105 -105 -105 -105 ...
## $ room_type : chr "Entire home/apt" "Entire home/apt" "Private room" "Private room" ...
## $ price : int 161 179 70 64 88 43 108 137 169 36 ...
## $ minimum_nights : int 30 185 1 15 1 30 4 3 29 29 ...
## $ number_of_reviews : int 181 87 726 175 183 236 169 699 11 133 ...
## $ last_review : chr "2024-03-02" "2016-04-26" "2024-05-22" "2024-06-02" ...
## $ reviews_per_month : num 2.53 0.47 3.91 0.94 2.02 1.36 1.01 4.13 0.07 0.85 ...
## $ calculated_host_listings_count: int 2 1 2 2 1 1 1 2 4 1 ...
## $ availability_365 : int 110 358 236 219 231 320 318 65 302 150 ...
## $ number_of_reviews_ltm : int 3 0 24 7 51 2 19 42 0 4 ...
## $ license : chr "2017-BFN-0002177" "" "2021-BFN-0000578" "2021-BFN-0000578" ...
For all listings in Denver, count different room types. The
function table()
builds a contingency table.
[Q5]: Report the number of listings of different
room types.
# count room types
table(ab_listing$room_type)
Create a barplot to show the number of different room types.
# quick plot of room types
table(ab_listing$room_type) %>% barplot()
Descriptive statistics listings’ prices.
# statistics
summary(ab_listing$price)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 17.0 95.0 138.0 382.7 219.0 99713.0 378
There are 378 NA (missing values) Let’s remove all rows with NA values
# remove NA values in the column "price"
ab_listing_cln <- ab_listing[!is.na(ab_listing$price),]
Extract two columns room_type, price and save them as a new dataframe called data_price
data_price <- ab_listing_cln[c('room_type', 'price')]
head(data_price)
## room_type price
## 1 Entire home/apt 161
## 2 Entire home/apt 179
## 3 Private room 70
## 4 Private room 64
## 5 Entire home/apt 88
## 6 Private room 43
Boxplot to explore the distribution of prices based on room type
ggplot(data_price, aes(x=as.factor(room_type), y=price)) +
stat_boxplot(fill="slateblue", alpha=0.2) +
xlab("price")
Zoom in to the price range [0, 1500].
[Q6]: You will see a Warning Message: Removed 25 rows containing non-finite outside the scale range. What does this mean?
ggplot(data_price, aes(x=as.factor(room_type), y=price)) +
stat_boxplot(fill="slateblue", alpha=0.2) +
xlab("listing room type") +
scale_y_continuous(limits = c(0, 1500))
## Warning: Removed 25 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
Correlations between numeric variables
[Q7]: What columns in ab_listing_cln
have numeric values (e.g., integers, floats)?
# check columns in "ab_listing_cln" and identify numeric columns
str(ab_listing_cln)
## 'data.frame': 4638 obs. of 16 variables:
## $ id : num 360 364 590 592 1940 ...
## $ name : chr "Sit in the Peaceful Garden of the Chickadee Cottage in LoHi" "Lodo / RiNo LOFT via airport train" "Comfortable - and a great value!" "private" ...
## $ host_id : int 666 783 933 933 2150 990 135298 666 745200 923646 ...
## $ host_name : chr "Jennifer & Giovanni" "Jason" "Jill" "Jill" ...
## $ latitude : num 39.8 39.8 39.8 39.8 39.7 ...
## $ longitude : num -105 -105 -105 -105 -105 ...
## $ room_type : chr "Entire home/apt" "Entire home/apt" "Private room" "Private room" ...
## $ price : int 161 179 70 64 88 43 108 137 169 36 ...
## $ minimum_nights : int 30 185 1 15 1 30 4 3 29 29 ...
## $ number_of_reviews : int 181 87 726 175 183 236 169 699 11 133 ...
## $ last_review : chr "2024-03-02" "2016-04-26" "2024-05-22" "2024-06-02" ...
## $ reviews_per_month : num 2.53 0.47 3.91 0.94 2.02 1.36 1.01 4.13 0.07 0.85 ...
## $ calculated_host_listings_count: int 2 1 2 2 1 1 1 2 4 1 ...
## $ availability_365 : int 110 358 236 219 231 320 318 65 302 150 ...
## $ number_of_reviews_ltm : int 3 0 24 7 51 2 19 42 0 4 ...
## $ license : chr "2017-BFN-0002177" "" "2021-BFN-0000578" "2021-BFN-0000578" ...
Extract only numeric columns and save as a new dataframe data_num
lt_num_col <- c('price', 'minimum_nights', 'number_of_reviews', 'reviews_per_month', 'availability_365')
data_num <- ab_listing_cln[lt_num_col]
# remove NA
data_num <- na.omit(data_num)
Calculate correlation between numeric columns in data_num
## calculate the correlation matrix
m <- cor(data_num)
print(m)
## price minimum_nights number_of_reviews
## price 1.000000000 -0.02371576 -0.007670987
## minimum_nights -0.023715762 1.00000000 -0.203996638
## number_of_reviews -0.007670987 -0.20399664 1.000000000
## reviews_per_month -0.004795330 -0.38285847 0.629794109
## availability_365 0.034870555 0.16650887 -0.082388843
## reviews_per_month availability_365
## price -0.00479533 0.03487055
## minimum_nights -0.38285847 0.16650887
## number_of_reviews 0.62979411 -0.08238884
## reviews_per_month 1.00000000 -0.13221878
## availability_365 -0.13221878 1.00000000
Correlation plot 1
corrplot(m, method = 'square', type = 'lower', diag = FALSE)
Correlation plot 2
corrplot(m, method = 'ellipse', type = 'upper')
Check object class of “ab_listing”
# class
class(ab_listing_cln)
“ab_listing” is a data frame. The two columns longitude
and latitude
contain spatial information of listings. Let’s
use these two columns to convert “ab_listing” to spatial data (simple
features) and save it as a new variable sf_ab_listing
.
[Q8]: Compared to the data frame “ab_listing”, which new column has been added to “sf_ab_listing”?
# table to simple features
sf_ab_listing <- st_as_sf(ab_listing_cln,
coords = c("longitude", "latitude"),
crs=4326)
# class of sf_ab_listing
class(sf_ab_listing)
## [1] "sf" "data.frame"
# open data
head(sf_ab_listing)
## Simple feature collection with 6 features and 14 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -105.0021 ymin: 39.72021 xmax: -104.9111 ymax: 39.76672
## Geodetic CRS: WGS 84
## id name host_id
## 1 360 Sit in the Peaceful Garden of the Chickadee Cottage in LoHi 666
## 2 364 Lodo / RiNo LOFT via airport train 783
## 3 590 Comfortable - and a great value! 933
## 4 592 private 933
## 5 1940 Baker Studio: Private-full kitchen-W/D-walkable 2150
## 6 21745 Alexandra's King Bed Room Long-term Only 990
## host_name room_type price minimum_nights number_of_reviews
## 1 Jennifer & Giovanni Entire home/apt 161 30 181
## 2 Jason Entire home/apt 179 185 87
## 3 Jill Private room 70 1 726
## 4 Jill Private room 64 15 175
## 5 Joanne Entire home/apt 88 1 183
## 6 Alexandra Private room 43 30 236
## last_review reviews_per_month calculated_host_listings_count availability_365
## 1 2024-03-02 2.53 2 110
## 2 2016-04-26 0.47 1 358
## 3 2024-05-22 3.91 2 236
## 4 2024-06-02 0.94 2 219
## 5 2024-06-27 2.02 1 231
## 6 2024-04-13 1.36 1 320
## number_of_reviews_ltm license geometry
## 1 3 2017-BFN-0002177 POINT (-105.0021 39.76641)
## 2 0 POINT (-104.9791 39.76672)
## 3 24 2021-BFN-0000578 POINT (-104.9111 39.75511)
## 4 7 2021-BFN-0000578 POINT (-104.9111 39.75481)
## 5 51 2018-BFN-0002596 POINT (-104.9975 39.72021)
## 6 2 POINT (-104.9771 39.7449)
Quick Map of three datasets:
plot(ab_nghbr$geometry)
plot(co_place$geometry)
plot(sf_ab_listing$geometry)
Now we have three simple feature objects including
co_place
collected from US Census, ab_nghbr
and sf_ab_listing
from Airbnb data. Let’s
check their Coordinate Reference System (CRS) and geometry types.
[Q9]: Please report the CRS and geometry types of:
co_place
ab_nghbr
sf_ab_listing
# Check CRS & geometry types
co_place$geometry
ab_nghbr$geometry
sf_ab_listing$geometry
The World Geodetic System 1984 (WGS 84) is a global reference system used for determining positions on the Earth’s surface. North American Datum of 1983 (NAD 83) is the horizontal and geometric control datum for the United States, Canada, Mexico, and Central America.
Let’s use NAD 83 as the main CRS for this lab and convert WGS 84 to NAD 83.
# convert "ab_nghbr" and "sf_ab_listing" from WGS84 to NAD83
ab_nghbr <- st_transform(ab_nghbr, st_crs(co_place))
sf_ab_listing <- st_transform(sf_ab_listing, st_crs(co_place))
# check their coordinate systems again, they all share the same CRS which is NAD83
ab_nghbr$geometry
sf_ab_listing$geometry
MAP1: Quick interactive mapping of Airbnb listings in Denver.
# set mode to "view" for interactive mapping
tmap_mode("view")
# MAP1: basic map of Airbnb listing in denver
tm_shape(sf_ab_listing) + tm_dots()
[Q10]: Run the chunk of code below and try to explain the purposes of the following commands:
tm_dots()
size = "price"
fill = "room_type"
fill_alpha=0.5
fill.scale = tm_scale_categorical(values = 'cols4all.friendly9')
# improve map1
tm_shape(sf_ab_listing) + tm_dots(size = 'price',
fill = "room_type", fill_alpha=0.5,
fill.scale = tm_scale_categorical(values = 'cols4all.friendly9'),
size.scale = tm_scale_continuous(trans='log1p', values.scale = 0.7))
We have loaded the spatial data about neighborhoods in Denver in Step 5.3.1. .
Have a look at the neighborhood data ab_nghbr
.
head(ab_nghbr)
## Simple feature collection with 6 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -105.0253 ymin: 39.66038 xmax: -104.9594 ymax: 39.79832
## Geodetic CRS: NAD83
## neighbourhood geometry
## 1 Chaffee Park MULTIPOLYGON (((-104.9981 3...
## 2 Sunnyside MULTIPOLYGON (((-104.9982 3...
## 3 Highland MULTIPOLYGON (((-104.9982 3...
## 4 University MULTIPOLYGON (((-104.9594 3...
## 5 Globeville MULTIPOLYGON (((-104.9688 3...
## 6 Jefferson Park MULTIPOLYGON (((-105.0095 3...
Let’s do a quick map of it. Hint: the following command will produce an error message (no worries, errors are a natural and inevitable part of coding).
The error message “Error: Loop 0 is not valid: Edge 44 is degenerate (duplicate vertex)” suggests that one geometry in the ab_nghbr is invalid.
# quick map
tm_shape(ab_nghbr) + tm_polygons(fill_alpha = 0.5, lwd = 1) + tm_text(text = "neighbourhood", size = 0.3)
It seems that the data ab_nghbr
contains some invalid
polygons. Let’s make them valid using the function
st_make_valid()
.
# make valid
ab_nghbr <- st_make_valid(ab_nghbr)
MAP2: neighborhoods in Denver
# change mapping mode to 'plot' for static map
tmap_mode("plot")
## ℹ tmap modes "plot" - "view"
## ℹ toggle with `tmap::ttm()`
#map
map2_den_nghbr <- tm_shape(ab_nghbr) + tm_polygons(fill_alpha = 0.5, lwd = 1) +
tm_compass() + tm_scalebar(width = 10) +
tm_text(text = "neighbourhood", size = 0.3)
map2_den_nghbr
pdf("plot/map2_denver_neighborhood.pdf", height = 10, width = 10)
print(map2_den_nghbr)
dev.off()
MAP3: Cities and towns in Colorado
tmap_mode("view")
tm_shape(co_place) + tm_polygons(fill="#81D8D0", fill_alpha = 0.3, lwd = 1)
# Let's extract the row of Denver
den_city <- co_place %>% filter(NAME == "Denver")
tm_shape(den_city) + tm_polygons(fill="pink", fill_alpha = 0.3, lwd = 2)
MAP4: Overlap three maps together (listing, neighborhoods, Denver city boundary)
[Q11]: Provide a screenshot of “map3_denver_airbnb_listing_point.pdf”.
# plot
tmap_mode("plot")
map_den_listing <- tm_shape(den_city) + tm_polygons(fill_alpha = 0, lwd = 2) +
tm_shape(ab_nghbr) + tm_polygons(fill_alpha = 0, lwd = 1) +
tm_shape(sf_ab_listing) + tm_dots(size = 'price',
fill = "room_type",fill_alpha=0.3,
fill.scale = tm_scale_categorical(values = 'cols4all.friendly9'),
size.scale = tm_scale_continuous(trans='log1p', values.scale = 0.5)) +
tm_compass() + tm_scalebar(width = 15) +
tm_layout(legend.position = c("left", "top"), frame = F)
map_den_listing
pdf("plot/map3_denver_airbnb_listing_point.pdf", width = 12, height = 8)
print(map_den_listing)
dev.off()
While the data sf_ab_listing
tells us where the listings
are located, and the data ab_nghbr
shows the boundaries of
neighborhoods. But they are not enough. Let’s say we want to know more
about the spatial distribution of those listings in
terms of:
how many listings are located in each neighborhoods and also their mean prices?
This section performs analysis and produces visualizations of Airbnb listings at the neighborhood level.
Examine the sf_ab_listing
data.
# count columns in "sf_ab_listing"
ncol(sf_ab_listing)
## [1] 15
Spatial joining two data sf_ab_listing
and
ab_nghbr
using the function st_join()
as long
as the geometries intersect. Save the joined data as a new variable
sf_ab_listing_nghbr
.
# spatial join to the left of the object
sf_ab_listing_nghbr <- st_join(sf_ab_listing, ab_nghbr,
left = T, join = st_intersects)
After spatial joining, the new data sf_ab_listing_nghbr
has a new column compared to sf_ab_listing
. Please compare
those two data and answer [Q12]: What is the new column
has been added to sf_ab_listing_nghbr
?
# count columns in the new variable "sf_ab_listing_nghbr"
ncol(sf_ab_listing_nghbr)
## [1] 16
# open data
head(sf_ab_listing_nghbr)
## Simple feature collection with 6 features and 15 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -105.0021 ymin: 39.72021 xmax: -104.9111 ymax: 39.76672
## Geodetic CRS: NAD83
## id name host_id
## 1 360 Sit in the Peaceful Garden of the Chickadee Cottage in LoHi 666
## 2 364 Lodo / RiNo LOFT via airport train 783
## 3 590 Comfortable - and a great value! 933
## 4 592 private 933
## 5 1940 Baker Studio: Private-full kitchen-W/D-walkable 2150
## 6 21745 Alexandra's King Bed Room Long-term Only 990
## host_name room_type price minimum_nights number_of_reviews
## 1 Jennifer & Giovanni Entire home/apt 161 30 181
## 2 Jason Entire home/apt 179 185 87
## 3 Jill Private room 70 1 726
## 4 Jill Private room 64 15 175
## 5 Joanne Entire home/apt 88 1 183
## 6 Alexandra Private room 43 30 236
## last_review reviews_per_month calculated_host_listings_count availability_365
## 1 2024-03-02 2.53 2 110
## 2 2016-04-26 0.47 1 358
## 3 2024-05-22 3.91 2 236
## 4 2024-06-02 0.94 2 219
## 5 2024-06-27 2.02 1 231
## 6 2024-04-13 1.36 1 320
## number_of_reviews_ltm license neighbourhood
## 1 3 2017-BFN-0002177 Highland
## 2 0 Five Points
## 3 24 2021-BFN-0000578 North Park Hill
## 4 7 2021-BFN-0000578 North Park Hill
## 5 51 2018-BFN-0002596 Baker
## 6 2 North Capitol Hill
## geometry
## 1 POINT (-105.0021 39.76641)
## 2 POINT (-104.9791 39.76672)
## 3 POINT (-104.9111 39.75511)
## 4 POINT (-104.9111 39.75481)
## 5 POINT (-104.9975 39.72021)
## 6 POINT (-104.9771 39.7449)
Extract columns from sf_ab_listing_nghbr
and save it as
a new variable df_listing_nghbr
for analysis.
# columns for analysis
lt_airbnb_cols <- c("neighbourhood", "room_type", "price")
# extract and save as a new variable
df_listing_nghbr <- sf_ab_listing_nghbr[lt_airbnb_cols]
# remove geometry for more efficient calculation
st_geometry(df_listing_nghbr) <- NULL
# each row is ONE airbnb lising
df_listing_nghbr["n_listing"] <- 1
Count room types in each neighborhood using one-hot encoding.
One-hot encoding is a commonly used technique in machine learning to convert categorical variables into numerical values.
Let’s borrow this concept and apply it for data cleaning and processing.
df_nghbr <- reshape2::dcast(df_listing_nghbr,
neighbourhood ~ room_type,
fill = 0, value.var = "n_listing",
fun.aggregate = length)
[Q13]: Have a look at the table “df_nghbr”, please briefly explain what this table is about?
View(df_nghbr)
Calculate the statistics of listings’ prices at neighborhood level in terms of mean, median, max, and min prices.
# statistics of prices at neighborhood level
df_nghbr_2 <- df_listing_nghbr %>% group_by(neighbourhood) %>%
summarise(mean_price = mean(price),
median_price = median(price),
max_price = max(price),
min_price = min(price),
n_ttl = sum(n_listing))
Join the two data frames df_nghbr
and
df_nghbr_2
.
# join two neighborhood tables
df_nghbr_2 <- left_join(df_nghbr_2, df_nghbr, by=c("neighbourhood"="neighbourhood"))
# see the first 5 rows
head(df_nghbr_2, 5)
## # A tibble: 5 × 10
## neighbourhood mean_price median_price max_price min_price n_ttl
## <chr> <dbl> <dbl> <int> <int> <dbl>
## 1 Athmar Park 152. 128 688 28 53
## 2 Auraria 186. 175 293 99 4
## 3 Baker 944. 110 99713 40 123
## 4 Barnum 154. 116. 421 63 32
## 5 Barnum West 139. 115 277 45 23
## # ℹ 4 more variables: `Entire home/apt` <dbl>, `Hotel room` <dbl>,
## # `Private room` <dbl>, `Shared room` <dbl>
Calculate percentage of listings of different room types.
## calculate percentage for columns 7:10
df_nghbr_2[, 7:10]
## # A tibble: 77 × 4
## `Entire home/apt` `Hotel room` `Private room` `Shared room`
## <dbl> <dbl> <dbl> <dbl>
## 1 40 0 13 0
## 2 4 0 0 0
## 3 109 0 14 0
## 4 30 0 2 0
## 5 19 0 4 0
## 6 3 0 3 0
## 7 9 0 1 0
## 8 160 0 6 0
## 9 121 0 2 0
## 10 103 2 17 4
## # ℹ 67 more rows
df_nghbr_2[, 7:10] <- 100 * df_nghbr_2[, 7:10] / df_nghbr_2$n_ttl
# check results
print(df_nghbr_2[, 7:10])
## # A tibble: 77 × 4
## `Entire home/apt` `Hotel room` `Private room` `Shared room`
## <dbl> <dbl> <dbl> <dbl>
## 1 75.5 0 24.5 0
## 2 100 0 0 0
## 3 88.6 0 11.4 0
## 4 93.8 0 6.25 0
## 5 82.6 0 17.4 0
## 6 50 0 50 0
## 7 90 0 10 0
## 8 96.4 0 3.61 0
## 9 98.4 0 1.63 0
## 10 81.7 1.59 13.5 3.17
## # ℹ 67 more rows
Join the attribute table df_nghbr_2
to the spatial data
ab_nghbr
.
## join "df_nghbr_2" with the simple feature "ab_nghbr"
ab_nghbr_var <- left_join(ab_nghbr, df_nghbr_2, by="neighbourhood" )
## check the results
head(ab_nghbr_var)
## Simple feature collection with 6 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -105.0253 ymin: 39.66038 xmax: -104.9594 ymax: 39.79832
## Geodetic CRS: NAD83
## neighbourhood mean_price median_price max_price min_price n_ttl
## 1 Chaffee Park 209.6571 152 535 37 35
## 2 Sunnyside 879.7308 166 85428 55 130
## 3 Highland 619.3210 167 99570 40 271
## 4 University 182.0606 109 1126 45 33
## 5 Globeville 116.5000 96 299 59 38
## 6 Jefferson Park 305.4182 158 9999 41 110
## Entire home/apt Hotel room Private room Shared room
## 1 91.42857 0 8.571429 0
## 2 93.07692 0 6.923077 0
## 3 95.20295 0 4.797048 0
## 4 75.75758 0 24.242424 0
## 5 89.47368 0 10.526316 0
## 6 84.54545 0 15.454545 0
## geometry
## 1 MULTIPOLYGON (((-104.9981 3...
## 2 MULTIPOLYGON (((-104.9984 3...
## 3 MULTIPOLYGON (((-104.9984 3...
## 4 MULTIPOLYGON (((-104.96 39....
## 5 MULTIPOLYGON (((-104.9688 3...
## 6 MULTIPOLYGON (((-105.0101 3...
MAP4: Quick map for all variables in ab_nghbr_var
.
[Q14]: paste the screenshot of “map4_quick_plot_neighborhoods_airbnb.pdf”.
# quick map & export
plot(ab_nghbr_var)
In mapping, Data Classification refers to the
process of breaking data into different groups. tmap
provides different ways to classify data
such as:
pretty
(default) rounds breaks into whole
numbers where possible and spaces them
evenly.jenks
that minimize the within-group
variance and maximize the between-group
variance.equal
divides the range of values into n
equal-sized intervals.quantile
creates several classes with exactly the
same number of objects (e.g., spatial features), but
having intervals of various lengths.sd
calculates a standard deviation of
a given variable, and next use this value as the break width.Let’s explore different classification methods and their impacts on
mappings by visualizing two variables mean_price
and
n_ttl
(number of listings in each neighborhood).
Create a function for mapping (it’s okay that you don’t understand the code below).
# write a function
f_map_airbnb <- function(data, column, mapping_style, n_breaks, c_palette){
p <- tm_shape(data) +
tm_polygons(fill = as.character(column),
fill.scale = tm_scale_intervals(style = as.character(mapping_style),
n=n_breaks,
values = as.character(c_palette) )) +
tm_title(text = as.character(mapping_style))
return(p)
}
MAP5: Mean listing price in neighborhoods
# histogram - quick
hist(ab_nghbr_var$mean_price)
[Q15]: In the pidf “map5_denver_neighborhood_meanprice_classifications.pdf”, discuss which data classification method do you like the most (please pay attention both to the map and the legends).
# number of breaks & color palette
n_brk <- 7
color_pal<- "met.hokusai3"
# mapping using different styles
map5_pretty <- f_map_airbnb(ab_nghbr_var, "mean_price", "pretty", n_brk, color_pal)
map5_jenks <- f_map_airbnb(ab_nghbr_var, "mean_price", "jenks", n_brk, color_pal)
map5_equal <- f_map_airbnb(ab_nghbr_var, "mean_price", "equal", n_brk, color_pal)
map5_quantile <- f_map_airbnb(ab_nghbr_var, "mean_price", "quantile", n_brk, color_pal)
map5_sd <- f_map_airbnb(ab_nghbr_var, "mean_price", "sd", n_brk, color_pal)
## combine all maps together
map5_meanprice <- tmap_arrange(map5_pretty,map5_jenks,
map5_equal, map5_quantile,
map5_sd,
ncol = 2)
map5_meanprice
## export
pdf("plot/map5_denver_neighborhood_meanprice_classifications.pdf", width = 10, height = 10)
print(map5_meanprice)
dev.off()
MAP6: Total rooms listing price in neighborhoods.
# histogram
hist(ab_nghbr_var$n_ttl)
# number of breaks
n_brk <- 7
color_pal <- "carto.blu_yl"
map6_pretty <- f_map_airbnb(ab_nghbr_var, "n_ttl", "pretty", n_brk, color_pal)
map6_jenks <- f_map_airbnb(ab_nghbr_var, "n_ttl", "jenks", n_brk, color_pal)
map6_equal <- f_map_airbnb(ab_nghbr_var, "n_ttl", "equal", n_brk, color_pal)
map6_quantile <- f_map_airbnb(ab_nghbr_var, "n_ttl", "quantile", n_brk, color_pal)
map6_sd <- f_map_airbnb(ab_nghbr_var, "n_ttl", "sd", n_brk, color_pal)
## combine all maps together
map6_ttl <- tmap_arrange(map6_pretty, map6_jenks,
map6_equal, map6_quantile,
map6_sd,
ncol = 2)
map6_ttl
## export
pdf("plot/map6_denver_neighborhood_count_total_listings_classifications.pdf", width = 10, height = 10)
print(map6_ttl)
dev.off()
[Q16]: In map6_denver_neighborhood_count_total_listings_classifications.pdf, please discuss which data classification method do you like the most (please pay attention both to the map and the legends).
Congratulations!! You have completed the entire tutorial by typing the script by yourself!! Excellent work.
Please go “File”–> “Close Project” – a pop window asking “Do you want to save these changes” –> “Yes”.
Don’t forget to submit the lab5 report and your script to Canvas.