R – Extracting Raster Cell Values from Gridded World Population Dataset

cell-statisticsextractrrastervector

I am using the Gridded World Population dataset which gives population density values for each cell in a raster. I want to find the population of shapefiles by extracting the population density values from this raster. Data loaded here:

library(sf)
library(dplyr)
library(raster)
library(geodata)
tdir=tempdir()
# Gather Lower 48 / DC USA sf shapefile
stateurl = "https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_state_500k.zip"
non_L48 = c("72","78","15","66","69","60","02")
if(file.exists(paste(tdir,"/cb_2018_us_state_500k.shp",sep=""))==F){
  download.file(stateurl, destfile = file.path(tdir, "States.zip"))
  unzip(file.path(tdir,"States.zip"),exdir=tdir)}
USA = read_sf(paste(tdir,"/cb_2018_us_state_500k.shp",sep="")) %>%
  filter(!c(STATEFP%in%non_L48))

# Convert sf back to sp:
USAsp = as(USA,Class="Spatial")
#Download GWP dataset
geodata::population(year=2015,res=.5,path=tdir)
#load via raster:
popdat = raster(x = paste(tdir,"/pop/gpw_v4_population_density_rev11_2015_30s.tif",sep=""))

#Crop to USA borders
USA_transform = st_transform(USA,4326)
cr_USA = crop(popdat,extent(USA_transform))
USA_popdat = mask(cr_USA,USA_transform)

# Rename:
names(USA_popdat)[1]="pop_dens"

I now have a raster dataset cropped to the Lower 48 US States and DC called USA_popdat. As mentioned, I would like to find the total population within each shapefile by extracting via raster::extract(). So far I have this:

extracted = raster::extract(USA_popdat,
                            USA_transform,
                            fun=sum,
                            na.rm=TRUE,
                            p = T)

However, this only returns the sum of population density values within each cell contained by each respective shapefile. The sum of population density using the above code is 490,010,296, which is much larger than the ~350,000,000 in the U.S. population.

Intuitively, to achieve a total population count rather than a sum of population density, we should sum the population density and divide it by the number of cells. I am unsure about how to enter this operation into the fun= argument within raster::extract()

Further, when dividing the sum of population density (490,010,296) by the total number of cells in the U.S. we achieve 23.67:

sum(extracted)/ncells(USA_popdat)

which leads me to believe I am doing something incorrectly with the gridded world population dataset: https://sedac.ciesin.columbia.edu/data/set/gpw-v4-population-density-rev11

Best Answer

A regular grid in lat-long has cells of varying area. So adding a population density while ignoring the cell area will lead to incorrect values of population.

Luckily there's a handy raster::area function that computes the area of each cell in a lat-long raster.

> USA_area = raster::area(USA_popdat)
> plot(USA_area)

enter image description here

This is varying from 0.75ish to 0.55ish (km^2), which is a substantial range and needs to be considered.

If we multiply the area in km^2 by the density in people/km^2 we get a map of population count per cell:

> USA_count = USA_popdat * USA_area
> plot(USA_count)

enter image description here

And if we sum that up we get:

> sum(USA_count[], na.rm=TRUE)
[1] 330292666

...a population of 330 million which seems believable.

You can then do your extraction (I recommend using the exactextractr package which is fast and considers where polygons clip edge cells) on the population count and if you want density then divide the count by the polygon areas.

exactextractr is really fast. I got fed up with waiting for raster::extract to do its stuff.

> extracts = exact_extract(USA_count, USA_transform, fun="sum")
  |======================================================================| 100%
> head(extracts)
[1]  3079368 10562649  3963717  8755188  1934866  4732980
> USA_transform$pop = extracts
> plot(USA_transform[,"pop"])

enter image description here