Spatial Data Analysis Training
National Data Management Center for Health at EPHI and AAU Department of Statistics
December 25-29, 2024
Overview of spatial and non-spatial data analysis framework
Spatial Exploratory Data Analysis
Investigating spatial dependence
Spatial modeling framework
Geostatistical Modeling
GIS (Geographical Information Systems):
Spatial data, also known as geospatial data, refers to information that identifies the geographic location and characteristics of natural or constructed features and boundaries on the Earth.
This data is often represented in terms of Cartesian coordinates (x,y) for two-dimensional maps, but may also include altitude (z) for a three-dimensional representation.
Spatial data is essential for analyzing relationships within a geographical context.
Spatial Data can be collected in either
Let us see a basic way to represent the spatial data.
But there is a variety of data formats to represent the data to suit different applications.
In most cases, spatial data formats are an extension of existing data formats.
Spatial Data Formats Categorized by Type
Type | Spatial Data Formats |
---|---|
Text Files |
|
Binary/Compressed |
|
Images |
|
Databases |
|
Lines connect multiple vertices using paths, forming linear features (e.g., Rivers, roads, and pipelines).
Polygons are formed by connecting vertices in a closed path (e.g., Building footprints, agricultural fields).
Vector data are high accuracy and aesthetically pleasing graphics.
But processing can be intensive due to complex topology rules.
Summary of Vector vs Raster datas
Type Vector Data |
Sub type
|
Raster Data |
|
Additional information (Attribute data): |
|
Commonly used Vector data Formats:
File Type | Description and Extension |
Shapefile |
|
GeoJSON |
|
GML(Geography Markup Language) |
|
Commonly used Raster data Formats:
File Type | Description |
GeoTIFF (.tif , .tiff ) |
|
JPEG2000 (.jp2 , .j2k )
|
|
Erdas Imagine (.img )
|
|
ECW (Enhanced Compressed Wavelet) .ecw
|
|
Geometry: Refers to the coordinates that define the shape of an object.
Topology: Defines spatial relationships like adjacency, connectivity, and containment.
Rules:
Attribute Data: provide detailed information about spatial features.
Geospatial data relies on CRS to define locations on the Earth’s surface.
A CRS contains both a datum and a projection.
Common CRS include WGS84 (GPS coordinates) and UTM (Universal Transverse Mercator).
Numerous formats are used to document a CRS
. Three common formats include:
proj.4
; EPSG
; Well-known Text (WKT
).sf
and stars
These packages provide robust frameworks for handling spatial data in R:
sf (simple features): Designed for handling vector data like points, lines, and polygons.
stars: Focused on handling raster data (e.g., satellite images, grids) and spatio-temporal arrays.
Both packages rely on external libraries for specific tasks:
.shp
), GeoTIFFs, and others.spdep
: Spatial dependence and autocorrelation analysis.
spatstat
: Spatial point pattern analysis.
gstat
: Geostatistical modeling and interpolation.
tmap
: Simplifies creating static and interactive thematic maps, works with sf
objects.leaflet
: Interactive maps for web.mapview
for interactive web mappingsf()
packagesf package (simple features = points, lines, polygons) is the new.
Therefore, we focus on the sf package for the following reasons:
%>%
operator and works well with the tidyverse collection of R packages.st_
)The sf package is a modern alternative to the traditional sp, rgeos, and rgdal packages.
How simple features in R are organized?
The three classes used to represent simple features are:
sf, the table (data.frame) with feature attributes and feature geometries, which contains
sfc, the list-column with the geometries for each feature (record), which is composed of
sfg, the feature geometry of an individual simple feature.
st_*()
functionsCommon functions to manipulate sf
objects include the following:
st_read()
reads a sf
object,st_write()
writes a sf
object,st_crs()
gets or sets a new coordinate reference system (CRS),st_transform()
transforms data to a new CRS,st_intersection()
intersects sf
objects,st_union()
combines several sf
objects into one,st_simplify()
simplifies a sf
object,st_coordinates()
retrieves coordinates of a sf
object,st_as_sf()
converts a foreign object to a sf
object.We can dounload some free online spatial data at
We can read a shapefile or a sf
object with the st_read()
function of sf.
For example, here we read the ET_Admin3_2023.shp
which contains the region, zone and districts of Ethiopia.
sf
dplyr
package.Simple feature collection with 3 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 37.51639 ymin: 13.68366 xmax: 38.56573 ymax: 14.89357
Geodetic CRS: WGS 84
FNID Country Region Zone Woreda Shape_Leng
1 ET2023A3010101 Ethiopia Tigray Northwest Tigray Tahtay Adiyabo 3.926501
2 ET2023A3010103 Ethiopia Tigray Northwest Tigray Zana 1.109448
3 ET2023A3010104 Ethiopia Tigray Northwest Tigray Tahtay Koraro 2.275121
Shape_Area FNID_A1 geometry
1 0.37426931 ET2023A301 MULTIPOLYGON (((37.52639 14...
2 0.04774669 ET2023A301 MULTIPOLYGON (((38.497 13.9...
3 0.06962166 ET2023A301 MULTIPOLYGON (((38.43626 14...
sf
object ethR_shape
is a data.frame
containing a collection with
A sf
object contains the following objects of class sf
, sfc
and sfg
:
sf
(simple feature): each row of the data.frame
is a single simple feature consisting of attributes and geometry.sfc
(simple feature geometry list-column): the geometry
column of the data.frame
is a list-column of class sfc
with the geometry of each simple feature.sfg
(simple feature geometry): each of the rows of the sfc
list-column corresponds to the simple feature geometry (sfg
) of a single simple feature.The sf package stores geometric features in a data frame.
Geographic data is stored in the special geometry column.
[1] "FNID" "Country" "Region" "Zone" "Woreda"
[6] "Shape_Leng" "Shape_Area" "FNID_A1" "geometry"
Region
column in the ethR_shape.shp contains the names of each region. We will use that as our subsetting column. Let’s call it region_4.Reprojection is the transformation of geometry coordinates, from one CRS to another.
It is an important part of spatial analysis workflow, since we often need to:
A vector layer can be reprojected with st_transform
.
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
Simple feature collection with 1141 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -162480.1 ymin: 376090.6 xmax: 1494221 ymax: 1646839
Projected CRS: WGS 84 / UTM zone 37N
First 10 features:
FNID Country Region Zone Woreda
1 ET2023A3010101 Ethiopia Tigray Northwest Tigray Tahtay Adiyabo
2 ET2023A3010103 Ethiopia Tigray Northwest Tigray Zana
3 ET2023A3010104 Ethiopia Tigray Northwest Tigray Tahtay Koraro
4 ET2023A3010105 Ethiopia Tigray Northwest Tigray Asgede
5 ET2023A3010106 Ethiopia Tigray Northwest Tigray Tselemti
6 ET2023A3010107 Ethiopia Tigray Northwest Tigray Sheraro town
7 ET2023A3010108 Ethiopia Tigray Northwest Tigray Shire Endaslasie town
8 ET2023A3010109 Ethiopia Tigray Northwest Tigray Selekleka
9 ET2023A3010110 Ethiopia Tigray Northwest Tigray Seyemti Adyabo
10 ET2023A3010111 Ethiopia Tigray Northwest Tigray Adi Daero
Shape_Leng Shape_Area FNID_A1 geometry
1 3.9265010 0.374269311 ET2023A301 MULTIPOLYGON (((340975.3 15...
2 1.1094484 0.047746694 ET2023A301 MULTIPOLYGON (((445673.4 15...
3 2.2751214 0.069621658 ET2023A301 MULTIPOLYGON (((439235.7 15...
4 2.3274188 0.134341222 ET2023A301 MULTIPOLYGON (((388463.3 15...
5 2.1683607 0.153435050 ET2023A301 MULTIPOLYGON (((381572.5 15...
6 0.4018419 0.006060853 ET2023A301 MULTIPOLYGON (((372705.9 15...
7 0.1477838 0.001221733 ET2023A301 MULTIPOLYGON (((424871 1561...
8 1.3106139 0.033449913 ET2023A301 MULTIPOLYGON (((441545.7 15...
9 1.3794723 0.061695376 ET2023A301 MULTIPOLYGON (((432113.4 16...
10 1.5071146 0.056097753 ET2023A301 MULTIPOLYGON (((428420.1 15...
So what happened?
plot()
function, that particular shapefile will plot not only the shapefile geometry but also the geometry for every column in the dataset (except for the geometry column)
Above are 9 maps. How many columns were there? Check code below:
st_drop_geometry
which returns a data.frame
: FNID Country Region Zone Woreda Shape_Leng
1 ET2023A3010101 Ethiopia Tigray Northwest Tigray Tahtay Adiyabo 3.9265010
2 ET2023A3010103 Ethiopia Tigray Northwest Tigray Zana 1.1094484
3 ET2023A3010104 Ethiopia Tigray Northwest Tigray Tahtay Koraro 2.2751214
4 ET2023A3010105 Ethiopia Tigray Northwest Tigray Asgede 2.3274188
5 ET2023A3010106 Ethiopia Tigray Northwest Tigray Tselemti 2.1683607
6 ET2023A3010107 Ethiopia Tigray Northwest Tigray Sheraro town 0.4018419
Shape_Area FNID_A1
1 0.374269311 ET2023A301
2 0.047746694 ET2023A301
3 0.069621658 ET2023A301
4 0.134341222 ET2023A301
5 0.153435050 ET2023A301
6 0.006060853 ET2023A301
Or use ggplot2
map
.Geometric operations on vector layers can conceptually be divided into three groups according to their output:
A single layer—e.g., centroid, buffer
A pair of layers—e.g., intersection area
There are several functions to calculate numeric geometric properties of vector layers in package sf
:
st_length
—Lengthst_area
—Areast_distance
—Distancest_bbox
—Bounding boxst_dimension
—Dimensions (0 = points, 1 = lines, 2 = polygons)ethR_shape
layer using st_area
.The result is assigned to a new attribute named area
in ethR_shape
:
Simple feature collection with 3 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 339878 ymin: 1512792 xmax: 453079 ymax: 1646839
Projected CRS: WGS 84 / UTM zone 37N
FNID Country Region Zone Woreda Shape_Leng
1 ET2023A3010101 Ethiopia Tigray Northwest Tigray Tahtay Adiyabo 3.926501
2 ET2023A3010103 Ethiopia Tigray Northwest Tigray Zana 1.109448
3 ET2023A3010104 Ethiopia Tigray Northwest Tigray Tahtay Koraro 2.275121
Shape_Area FNID_A1 geometry area
1 0.37426931 ET2023A301 MULTIPOLYGON (((340975.3 15... 2152127259 [m^2]
2 0.04774669 ET2023A301 MULTIPOLYGON (((445673.4 15... 570617621 [m^2]
3 0.06962166 ET2023A301 MULTIPOLYGON (((439235.7 15... 830999755 [m^2]
sf
provides common geometry-generating functions applicable to individual geometries, such as:
st_centroid
—Centroidsst_buffer
—Bufferst_sample
—Random sample pointsst_convex_hull
—Convex hullst_voronoi
—Voronoi polygonsFor example, to calculate the centroids of the “wored” polygons we can use the st_centroid
function as follows:
They can be plotted as follows
stars
: Rasters
Geometric operations on rasters can be done with package stars
:
+
, -
, …), Math (sqrt
, log10
, …), logical (!
, ==
, >
, …), summary (mean
, max
, …), Masking