# Introduction to Spatial Data Programming with R

# Preface

*Last updated: 2020-10-26 15:55:30*

## 0.1 What is R?

**R** is a programming language and environment, originally developed for statistical computing and graphics. As of October 2020, there are ~16,000 R **packages** in the official repository CRAN^{1}.

Notable advantages of R are that it is a full-featured programming language that is nevertheless tailored for working with data, that it is relatively simple, and that it has a huge collection of over 100,000 functions from various areas of interest.

R’s popularity has been steadily increasing in recent years (Figures 0.1–0.3).

Brief overviews of the capabilities and packages for several domains of R use are available in the “CRAN Task Views” (Figure 0.4).

## 0.2 R and analysis of spatial data

### 0.2.1 Introduction

Over time, the number of contributed packages for handling and analyzing spatial data in R has grown steadily. Today, spatial analysis is a major area of functionality in R. As of October 2020, there are at least 185 *packages*^{6} specifically addressing spatial analysis in R.

Some important events in the history of spatial analysis support in R are summarized in Table 0.1.

Year | Event |
---|---|
pre-2003 | Variable and incomplete approaches (`MASS`, `spatstat`, `maptools`, `geoR`, `splancs`, `gstat`, …) |
2003 | Consensus that a package defining standard data structures should be useful; `rgdal` released on CRAN |
2005 | `sp` released on CRAN; `sp` support in `rgdal` |
2008 | Applied Spatial Data Analysis with R, 1^{st} ed. |
2010 | `raster` released on CRAN |
2011 | `rgeos` released on CRAN |
2013 | Applied Spatial Data Analysis with R, 2^{nd} ed. |
2016 | `sf` released on CRAN |
2018 | `stars` released on CRAN |
2019 | Geocomputation with R (https://geocompr.robinlovelace.net/) |
2021(?) | Spatial Data Science (https://www.r-spatial.org/book/) |

The question that arises here is: can R be used as a Geographic Information System (GIS), or as a comprehensive toolbox for doing spatial analysis? The answer is definitely *yes*. Moreover, R has some important advantages over traditional approaches to GIS, i.e., software with graphical user interfaces such as ArcGIS or QGIS.

*General* advantages of Command Line Interface (CLI) software include:

- **Automation**: Doing otherwise unfeasible repetitive tasks
- **Reproducibility**: Precise control of instructions to the computer

Moreover, *specific* strengths of R as a GIS are:

- R capabilities in data **processing** and **visualization**, combined with dedicated **packages** for spatial data
- A **single environment** encompassing all analysis aspects: acquiring data, computation, statistics, visualization, Web, etc.

Nevertheless, there are situations when *other* tools are needed:

- **Interactive** editing or georeferencing (but see the `mapedit` package)
- Unique GIS **algorithms** (3D analysis, label placement, splitting lines at intersections)
- Data that cannot fit in **RAM** (but R can connect to spatial databases^{7} and other software for working with big data)

The following sections (0.2.2–0.2.11) highlight some of the capabilities of spatial data analysis packages in R, through short examples. We are going to elaborate on most of these packages later on in the book, and many of those examples will become clear.

### 0.2.2 Input and output of spatial data

Reading spatial layers from a file into an R data structure, or writing the R data structure into a file, are handled by external libraries:

- **GDAL/OGR** is used for reading/writing vector and raster files, with `sf` and `stars`
- **PROJ** is used for handling CRS, in both `sf` and `stars`
- Working with specialized formats, e.g., **HDF** with `gdalUtils` or **NetCDF** with `ncdf4`
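As a minimal sketch of file-based input and output with `sf`, the following reads a layer and writes it back in another format. It assumes the North Carolina sample Shapefile that ships with `sf` (not a dataset used elsewhere in this section) and writes to a temporary file; the variable names are illustrative only.

```r
library(sf)

# Path to the North Carolina counties Shapefile shipped with 'sf'
# (an assumed sample dataset, used here only for illustration)
src = system.file("shape/nc.shp", package = "sf")

# Read the layer (through GDAL) into an 'sf' data frame
nc = st_read(src, quiet = TRUE)

# Write it to a different format (GeoPackage);
# the output driver is guessed from the file extension
dst = tempfile(fileext = ".gpkg")
st_write(nc, dst, quiet = TRUE)

# Read the new file back and compare the number of features
nc2 = st_read(dst, quiet = TRUE)
nrow(nc) == nrow(nc2)

# The CRS (handled through PROJ) is stored along with the layer
st_crs(nc2)$input
```

The same two functions, `st_read` and `st_write`, cover most vector formats that GDAL supports, so switching formats is usually just a matter of changing the file extension.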

Package `sf` combined with `RPostgreSQL` can be used to read from, and write to, a **PostGIS** spatial database:

```
library(sf)
library(RPostgreSQL)

# Open a connection to the PostGIS database
con = dbConnect(
  PostgreSQL(),
  dbname = "gisdb",
  host = "159.89.13.241",
  port = 5432,
  user = "geobgu",
  password = "*******"
)

# Run an SQL query and read the result as an 'sf' layer
dat = st_read(con, query = "SELECT name_lat, geometry FROM plants LIMIT 5;")
```

```
dat
## Simple feature collection with 5 features and 1 field
## geometry type: POINT
## dimension: XY
## bbox: xmin: 35.1397 ymin: 31.44711 xmax: 35.67976 ymax: 32.77013
## geographic CRS: WGS 84
## name_lat geometry
## 1 Iris haynei POINT (35.67976 32.77013)
## 2 Iris haynei POINT (35.654 32.74137)
## 3 Iris atrofusca POINT (35.19337 31.44711)
## 4 Iris atrofusca POINT (35.18914 31.51475)
## 5 Iris vartanii POINT (35.1397 31.47415)
```

### 0.2.3 `sf`: Processing Vector Layers

**GEOS** is used for geometric operations on **vector layers** with `sf`:

- **Numeric operators**: Area, Length, Distance…
- **Logical operators**: Contains, Within, Within distance, Crosses, Overlaps, Equals, Intersects, Disjoint, Touches…
- **Geometry generating operators**: Centroid, Buffer, Intersection, Union, Difference, Convex-Hull, Simplification…
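The three groups of operators can be illustrated with a small sketch. It uses two made-up overlapping squares in an arbitrary planar coordinate system (no CRS), so the expected values can be checked by hand; the object names `a`, `b` and `i` are illustrative only.

```r
library(sf)

# Two overlapping 2 x 2 squares, in arbitrary planar units (no CRS)
a = st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))
b = st_polygon(list(rbind(c(1, 1), c(3, 1), c(3, 3), c(1, 3), c(1, 1))))
a = st_sfc(a)
b = st_sfc(b)

# Numeric operators
st_area(a)                           # 4
st_distance(a, b)                    # 0, since the squares overlap

# Logical operators
st_intersects(a, b, sparse = FALSE)  # TRUE
st_within(a, b, sparse = FALSE)      # FALSE

# Geometry-generating operators
i = st_intersection(a, b)            # the shared 1 x 1 square
st_area(i)                           # 1
st_centroid(a)                       # POINT (1 1)
st_buffer(a, dist = 0.5)             # 'a' expanded by 0.5 units
```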

### 0.2.4 `stars`: Processing Rasters

Geometric operations on *rasters* can be done with package `stars`:

- **Accessing cell values**: As matrix / array, Extracting to points / lines / polygons
- **Raster algebra**: Arithmetic (`+`, `-`, …), Math (`sqrt`, `log10`, …), logical (`!`, `==`, `>`, …), summary (`mean`, `max`, …), Masking
- **Changing resolution and extent**: Cropping, Mosaic, Resampling, Reprojection
- **Transformations**: Raster <-> Points / Contour lines / Polygons
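A few of these operations can be sketched on a tiny raster. The example assumes a made-up 3 x 3 matrix as input, with dimensions named `x` and `y` so that `stars` treats it as a raster; there is no CRS and the cell size defaults to 1.

```r
library(stars)

# A small 3 x 3 raster built from a matrix; naming the dimensions
# 'x' and 'y' lets 'stars' recognize them as the spatial dimensions
m = matrix(1:9, nrow = 3, ncol = 3)
dim(m) = c(x = 3, y = 3)
r = st_as_stars(m)

# Accessing cell values as a matrix
r[[1]]

# Raster algebra: operators are applied cell by cell
r2 = r * 10      # arithmetic
r3 = sqrt(r)     # math
big = r > 5      # logical

# Summary of all cell values
mean(r[[1]])     # 5

# Transformation: raster to points (one point per cell)
pnt = st_as_sf(r, as_points = TRUE)
nrow(pnt)
```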

### 0.2.5 `geosphere`: Geometric calculations on longitude/latitude

Package `geosphere` implements *spherical* geometry functions for distance- and direction-related calculations on geographic coordinates (lon-lat).
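A minimal sketch of such calculations follows, assuming two made-up points on the equator that are 90 degrees of longitude apart. With these points the distance is simply a quarter of the sphere's circumference, which makes the result easy to verify.

```r
library(geosphere)

# Coordinates are given as c(longitude, latitude), in decimal degrees
a = c(0, 0)    # on the equator, at the prime meridian
b = c(90, 0)   # on the equator, a quarter of the way around the globe

# Great-circle distance, in meters, on a sphere of radius 'r'
d = distHaversine(a, b, r = 6378137)
d              # equal to a quarter of the circumference, pi * r / 2

# Initial direction of travel from 'a' to 'b', in degrees clockwise from north
bearing(a, b)  # 90, i.e., due east
```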

### 0.2.6 `gstat`: Geostatistical Modelling

As mentioned above, R was initially developed for statistical computing (Section 0.1). Accordingly, there is an extensive set of R packages for *spatial statistics*. For example, package `gstat` provides a comprehensive set of functions for univariate and multivariate geostatistics, mainly for the purpose of spatial *interpolation*:

- Variogram modelling
- Ordinary and universal point or block (co)kriging
- Cross-validation
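The first two steps can be sketched as follows. This is only an illustration under several assumptions: it uses the `meuse` sample dataset from package `sp` (not a dataset from this section), the starting values passed to `vgm` are rough guesses, and the prediction location is hypothetical. Depending on the installed version, `gstat` may also require package `stars` to be installed when working with `sf` layers.

```r
library(sf)
library(gstat)

# The 'meuse' sample dataset (heavy metal concentrations in topsoil)
# comes with package 'sp'; it is an assumed example dataset
data(meuse, package = "sp")
pnt = st_as_sf(meuse, coords = c("x", "y"), crs = 28992)

# Empirical variogram of log(zinc), and a spherical model fitted to it;
# the values passed to 'vgm' are only starting guesses for the fit
v = variogram(log(zinc) ~ 1, pnt)
m = fit.variogram(v, vgm(psill = 1, model = "Sph", range = 900, nugget = 1))

# Ordinary kriging prediction at one hypothetical new location
new = st_as_sf(data.frame(x = 180000, y = 332000), coords = c("x", "y"), crs = 28992)
pred = krige(log(zinc) ~ 1, pnt, new, model = m)
pred$var1.pred   # predicted log(zinc)
pred$var1.var    # kriging variance
```

In practice the new locations are usually the cell centers of a regular grid, so that the predictions form an interpolated raster.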

We are going to learn about the `gstat` package in Chapter 12. An introduction to the package can also be found in Chapter 8 of *Applied Spatial Data Analysis with R* (Bivand, Pebesma, and Gomez-Rubio 2013).

### 0.2.7 `spdep`: Spatial dependence modelling

Modelling with spatial weights:

- Building neighbor lists and spatial weights
- Tests for spatial autocorrelation for areal data (e.g., Moran’s I)
- Spatial regression models (e.g., SAR, CAR)
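The first two items can be sketched on artificial data. The example assumes a made-up 10 x 10 grid of square polygons and a variable that simply equals the x-coordinate of each cell, so a strong positive spatial autocorrelation is expected by construction.

```r
library(sf)
library(spdep)

# A made-up 10 x 10 grid of square polygons (planar, no CRS)
bb = st_bbox(c(xmin = 0, ymin = 0, xmax = 10, ymax = 10))
grid = st_make_grid(st_as_sfc(bb), n = c(10, 10))

# Neighbor list based on shared edges ("rook" contiguity)
nb = poly2nb(grid, queen = FALSE)

# Row-standardized spatial weights
lw = nb2listw(nb, style = "W")

# An artificial variable with a clear west-east gradient
x = st_coordinates(st_centroid(grid))[, "X"]

# Moran's I test for spatial autocorrelation
mt = moran.test(x, lw)
mt$estimate[["Moran I statistic"]]   # close to 1 for such a smooth gradient
```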

The `spdep` package is beyond the scope of this book. An introduction to the package can be found in Chapter 9 of *Applied Spatial Data Analysis with R* (Bivand, Pebesma, and Gomez-Rubio 2013).

### 0.2.8 `spatstat`: Spatial point pattern analysis

Package `spatstat` provides a comprehensive collection of techniques for statistical analysis of spatial point patterns, such as:

- Kernel density estimation
- Detection of clustering using Ripley’s K-function
- Spatial logistic regression
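The first two techniques can be sketched on simulated data. The example assumes a completely random (Poisson) pattern in the unit square, generated here only for illustration, so no clustering is expected.

```r
library(spatstat)

# A simulated pattern: on average 100 completely random points
# in the unit square (the default window)
set.seed(1)
X = rpoispp(lambda = 100)

# Kernel density (intensity) estimate, returned as a pixel image
d = density(X)

# Ripley's K-function; for a completely random pattern the estimate
# is expected to stay close to the theoretical curve pi * r^2
K = Kest(X)

plot(d, main = "Kernel density")
plot(K, main = "Ripley's K")
```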

The book *Spatial Point Patterns: Methodology and Applications with R* (Baddeley, Rubak, and Turner 2015) provides a thorough introduction to the subject of point pattern analysis using the `spatstat` package. A briefer introduction can also be found in Chapter 7 of *Applied Spatial Data Analysis with R* (Bivand, Pebesma, and Gomez-Rubio 2013).

### 0.2.9 `osmdata`: Access to OpenStreetMap data

Package `osmdata` gives access to **OpenStreetMap (OSM)** data, the most extensive open-source map database in the world, using the **Overpass API**^{9}.

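```
library(sf)
library(osmdata)

# Build an Overpass query for all features tagged "highway" in Beer-Sheva
q = opq(bbox = "Beer-Sheva, Israel")
q = add_osm_feature(q, key = "highway")

# Download the result as a list of 'sf' layers
dat = osmdata_sf(q)
lines = dat$osm_lines
pol = dat$osm_polygons

# Convert the polygon features to lines and combine them with the line layer
pol = st_cast(pol, "MULTILINESTRING")
pol = st_cast(pol, "LINESTRING")
lines = rbind(lines, pol)

# Keep only the road type, reproject to UTM zone 36N and plot
lines = lines[, "highway"]
lines = st_transform(lines, 32636)
plot(lines, key.pos = 4, key.width = lcm(4), main = "")
```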

### 0.2.10 `ggplot2`: Visualization

The `ggplot2` package is one of the most popular packages in R. It provides advanced visualization methods through a well-designed and consistent syntax. The package supports visualization of both vector layers^{10} and rasters^{11}.

The `ggplot2` package is highly customizable and capable of producing publication-quality figures and maps as well as original and innovative designs (Figure 0.13). One of its strengths is the easy preparation of “small-multiple” figures, or *facets* in the terminology of `ggplot2` (Figure 0.14).
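As a rough sketch of the facet idea applied to a vector layer, the following draws one map panel per year. It assumes the North Carolina sample layer shipped with `sf` (not the data behind Figure 0.14), whose `BIR74` and `BIR79` columns hold birth counts for two periods; the object names are illustrative only.

```r
library(sf)
library(ggplot2)

# North Carolina counties, an assumed sample dataset shipped with 'sf'
nc = st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Reshape to "long" form: one row per county per year
a = st_sf(data.frame(births = nc$BIR74, year = "1974"), geometry = st_geometry(nc))
b = st_sf(data.frame(births = nc$BIR79, year = "1979"), geometry = st_geometry(nc))
long = rbind(a, b)

# One map panel (facet) per year, sharing a common color scale
p = ggplot(long) +
  geom_sf(aes(fill = births), color = NA) +
  scale_fill_viridis_c() +
  facet_wrap(~ year) +
  theme_bw()
p
```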

The `ggplot2` package is beyond the scope of this book. A good place to start is the book *ggplot2: Elegant Graphics for Data Analysis*, by the package author (Wickham 2016). The book is available online^{13}.

### 0.2.11 `leaflet`, `mapview`: Web mapping

Packages `leaflet` and `mapview` provide methods to produce **interactive maps** using the Leaflet JavaScript library.

Package `leaflet` gives more low-level control. Package `mapview` is a wrapper around `leaflet`, automating addition of useful features:

- Commonly used basemaps
- Color scales and legends
- Labels
- Popups

Function `mapview` produces an interactive map given a spatial object. The `zcol` parameter is used to specify the *attribute* used for symbology:

```
library(sf)
library(mapview)
states = st_read("USA_2_GADM_fips.shp")
mapview(states, zcol = "NAME_1")
```
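For comparison, the lower-level `leaflet` interface builds a map step by step, so the basemap and each layer are added explicitly. The following is a minimal sketch; the marker coordinates are approximate values for Beer-Sheva, chosen only for illustration.

```r
library(leaflet)

# Start from an empty map widget
m = leaflet()

# Add the default OpenStreetMap basemap
m = addTiles(m)

# Add one marker with a popup (approximate, illustrative coordinates)
m = addMarkers(m, lng = 34.79, lat = 31.25, popup = "Beer-Sheva")

# Printing the object displays the interactive map
m
```

With `mapview`, the basemap, legend and popups are set up automatically, at the cost of less control over each of them.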

## 0.3 Other materials

### 0.3.1 Books

- *Model-based Geostatistics* (2007)
- *A Practical Guide to Geostatistical Mapping* (2009)
- *Spatial Data Analysis in Ecology and Agriculture using R* (2012)
- *Learning R for Geospatial Analysis* (2014)
- *Applied Spatial Data Analysis with R* (1^{st} ed. 2008, 2^{nd} ed. 2013)
- *Hierarchical Modeling and Analysis for Spatial Data* (1^{st} ed. 2003, 2^{nd} ed. 2014)
- *An Introduction to R for Spatial Analysis and Mapping* (1^{st} ed. 2015, 2^{nd} ed. 2018)
- *Spatial Point Patterns: Methodology and Applications with R* (2015)
- *Displaying Time Series, Spatial, and Space-Time Data with R* (1^{st} ed. 2014, 2^{nd} ed. 2018)
- *Predictive Soil Mapping with R* (2019)
- *Geocomputation with R* (2019)
- *Spatial Data Science* (2021?)

### 0.3.2 Papers

### 0.3.3 Courses and tutorials

#### 0.3.3.1 Courses

- GEOG 4/595: Geographic Data Analysis
- CP6521 Advanced GIS
- ES214 Introduction to GIS and Spatial Analysis
- GEOG 4/590: R for Earth-System Science
- Spatial Data Science with R (Robert J. Hijmans)
- Introduction to Spatial Data Programming with R (this course)
- GISC 422 Spatial Analysis and Modelling
- CASA0005 Geographic Information Systems and Science
- Another list here

#### 0.3.3.2 Tutorials

- Geospatial Data Science with R
- Data Carpentry Workshops
- GIS in R (Nick Eubank)
- NEON Data Tutorials
- Learn Spatial Analysis (University of Chicago)
- WUR Geoscripting
- Mapping in R
- Spatial Analysis notes
- Classifying Satellite Imagery in R
- Fundamentals of Spatial Analysis in R
- Handling and Analyzing Vector and Raster Data Cubes with R

#### 0.3.3.3 Presentations

#### 0.3.3.4 Official materials

### References

Baddeley, Adrian, Ege Rubak, and Rolf Turner. 2015. *Spatial Point Patterns: Methodology and Applications with R*. CRC Press.

Bivand, Roger S., Edzer Pebesma, and Virgilio Gomez-Rubio. 2013. *Applied Spatial Data Analysis with R, Second Edition*. Springer, NY. https://asdar-book.org/.

Wickham, Hadley. 2016. *ggplot2: Elegant Graphics for Data Analysis*. Springer. https://ggplot2-book.org/.

Comprehensive R Archive Network

https://spectrum.ieee.org/computing/software/the-top-programming-languages-2019

https://www.nature.com/news/programming-tools-adventures-with-r-1.16609

https://cran.r-project.org/web/packages/sf/vignettes/sf2.html#reading_and_writing_directly_to_and_from_spatial_databases

http://paulbutler.org/archives/visualizing-facebook-friends/

https://cran.r-project.org/web/packages/sf/vignettes/sf5.html#ggplot2

https://cran.r-project.org/web/packages/stars/vignettes/stars3.html#geom_stars