R – How to Extract Non-Spatial Data from Imported KML File Using R

Tags: kml, r, rgdal, sp

I have a KML file which was created using Google's My Maps.

The original file can be downloaded here: Google My Maps

Using R, I can import it with the readOGR() function from the rgdal package. This brings the KML file in as a SpatialPointsDataFrame (SPDF), which I am calling asf52.

In this SPDF, the spatial data is held in the @coords slot and is readily extracted into a data frame using code like:

df  <- data.frame(asf52@coords[,1:2])

However, I am struggling to come up with a way to neatly extract the non-spatial data, contained in @data$Description, and turn it into a data frame with a column for each variable.

Best Answer

You don't need to wrap the extract in data.frame(), because the @data slot is already a data.frame. Just do

  df <- asf52@data

to pull out a copy. That said, you may be better served by using the newer sf library for this task:

library(sf)
ob_kml <- file.path(getwd(), 'Outbreaks 56 (OIE).kml')

There is more than one layer in your KML; list them with:

st_layers(ob_kml)

Use read_sf() with the layer argument to select your point data specifically and read it in. read_sf() defaults to stringsAsFactors = FALSE, which may be preferable.

asf_c <- read_sf(ob_kml, layer = 'ASF in China.xlsx')

To get a plain dataframe, just drop the geometry as follows:

asf_c_df <- st_set_geometry(asf_c, NULL)

EDIT: I see your secondary issue now. It looks like neither sf nor sp reads the <ExtendedData> tags that hold the attribute data you want (open the KML in a text editor such as Notepad++ to see what I mean). QGIS does detect them and imports them as separate attribute columns, so @Jella's advice is sound. I'm not sure whether the issue lies with sf/sp or with GDAL, but it may be worth raising an issue on the sf GitHub page.
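To see the layout in question, here is a minimal base-R sketch that pulls name/value pairs out of <ExtendedData> with regular expressions. It assumes each attribute is written as <Data name="..."><value>...</value></Data>; the KML fragment, its values, and the helper extended_data_to_list() are hypothetical. Regex is fragile for XML in general (CDATA sections, escaped characters, nested markup), so treat this as an illustration rather than a parser; a proper XML package would be the safer route for real files.

```r
# Sketch only (hypothetical helper): extract <Data name="..."><value>...</value></Data>
# pairs from a KML string with regex. Not a general XML parser.
extended_data_to_list <- function(kml_text) {
  pattern <- '<Data name="([^"]*)">\\s*<value>([^<]*)</value>\\s*</Data>'
  # every full <Data>...</Data> match in the text
  matches <- regmatches(kml_text, gregexpr(pattern, kml_text, perl = TRUE))[[1]]
  keys   <- sub(pattern, "\\1", matches, perl = TRUE)  # the name="..." attribute
  values <- sub(pattern, "\\2", matches, perl = TRUE)  # the text inside <value>
  as.list(setNames(values, keys))
}

# Hypothetical placemark in the assumed ExtendedData layout
kml_fragment <- '<Placemark>
  <name>Example outbreak</name>
  <ExtendedData>
    <Data name="Province"><value>Example Province</value></Data>
    <Data name="Deaths"><value>12</value></Data>
  </ExtendedData>
</Placemark>'

attrs <- extended_data_to_list(kml_fragment)
str(attrs)
```

All values come back as character strings, so type conversion is still up to you, just as in the pipeline below.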

In the meantime, your instinct to go with tidyr functions is a good one; it's just a little tricky to get a clean separation. The following looks pretty good:

library(dplyr)
library(tidyr)

asf_c_df <- st_set_geometry(asf_c, NULL) %>%
  # collapse runs of repeated <br> tags into a single <br>
  mutate(Description = gsub('(<br>)+', '<br>', Description)) %>%
  # split on <br>, one column per field (assumes the fields always come in this order)
  separate(col = Description,
           into = c('Date', 'Province', 'City', 'County', 'Location',
                    'Total_herd_size', 'Affected_animals', 'Deaths',
                    'Culled', 'Latitude', 'Longitude', 'Source'),
           sep = '<br>') %>%
  # ditch the key: part of key: value
  mutate(across(where(is.character), ~ gsub('^.*: ', '', .x))) %>%
  # data type fixes, selected by column name rather than position
  mutate(across(Total_herd_size:Culled, as.integer),
         across(c(Latitude, Longitude), as.numeric)) %>%
  # bonus points: proper dates. First, fix September, then cast to Date
  # (note: %b month abbreviations depend on your locale)
  mutate(Date = gsub('Sept', 'Sep', Date),
         Date = as.Date(Date, format = '%b %d, %Y')) %>%
  # double bonus! proper NA for missing data
  mutate(across(where(is.character), ~ na_if(.x, '')))
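For comparison, the same steps (collapse repeated <br>, split, strip the key: prefix, fix types, set NA) can be sketched in base R for a single Description string. The sample string and the helpers parse_description() and parse_mdy() are hypothetical. Unlike the pipeline above, this version takes column names from the keys in the text instead of assuming a fixed field order, and it looks the month up in month.abb so the result does not depend on the locale.

```r
# Sketch (hypothetical helper): parse one "Key: value<br>Key: value" string
# into a one-row data frame, naming columns after the keys.
parse_description <- function(desc) {
  # collapse runs of repeated <br> tags, then split into "Key: value" pieces
  parts <- strsplit(gsub("(<br>)+", "<br>", desc), "<br>", fixed = TRUE)[[1]]
  parts <- trimws(parts)
  parts <- parts[nzchar(parts)]
  # key = text before the first colon, value = everything after it
  keys   <- make.names(sub(":.*$", "", parts))
  values <- trimws(sub("^[^:]*:", "", parts))
  values[values == ""] <- NA  # proper NA for missing data
  as.data.frame(as.list(setNames(values, keys)), stringsAsFactors = FALSE)
}

# Sketch (hypothetical helper): "Sept 12, 2018" -> Date without relying on %b,
# whose month names depend on the locale; month.abb is always English.
parse_mdy <- function(x) {
  x     <- gsub("Sept", "Sep", x, fixed = TRUE)
  month <- match(substr(x, 1, 3), month.abb)
  day   <- as.integer(sub("^[A-Za-z]+ ([0-9]+),.*$", "\\1", x))
  year  <- as.integer(sub("^.*, ([0-9]{4})$", "\\1", x))
  as.Date(ISOdate(year, month, day))
}

# Hypothetical Description value with a doubled <br> and an empty Source
desc <- paste0("Date: Sept 12, 2018<br><br>Province: Example Province<br>",
               "Deaths: 12<br>Latitude: 41.5<br>Longitude: 123.4<br>Source: ")

row <- parse_description(desc)
row$Date      <- parse_mdy(row$Date)
row$Deaths    <- as.integer(row$Deaths)
row$Latitude  <- as.numeric(row$Latitude)
row$Longitude <- as.numeric(row$Longitude)
str(row)
```

Keys containing spaces are turned into syntactic names by make.names() (for example "Total herd size" would become Total.herd.size). To process a whole column you could lapply() parse_description() over it and rbind() the results, provided every row yields the same keys.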

Related Solutions

[GIS] Plotting spatial data when two spatial objects have different CRS using R

If you draw the axes (argument axes=TRUE in your plot statements), you can see the different coordinate systems:

library("rgdal")
boros <- readOGR(dsn=".", "nybb")
rats <- read.csv("nycrats_missing_latlong_removed_4.2.14.csv", header=TRUE)
coordinates(rats) <- ~longitude + latitude

op <- par(mfrow=c(1,2))
plot(rats, axes=TRUE)
plot(boros, axes=TRUE)
par(op)

(Figure: plot with axes argument)
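The axes make the mismatch obvious by eye. As a rough scripted check, coordinates in longitude/latitude degrees stay within ±180 and ±90, whereas projected units such as State Plane feet usually fall far outside that range. The helper looks_geographic() and the sample coordinates below are hypothetical, and the heuristic only tells you whether values could be degrees, not which projected CRS is in use.

```r
# Rough heuristic (hypothetical helper): could these coordinates be lon/lat degrees?
looks_geographic <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)  # ignore missing coordinates
  all(abs(x[ok]) <= 180 & abs(y[ok]) <= 90)
}

# Hypothetical lon/lat points near New York
rat_x <- c(-73.95, -73.87)
rat_y <- c(40.78, 40.70)

# Hypothetical projected coordinates in feet, State Plane style
boro_x <- c(987000, 1010000)
boro_y <- c(195000, 210000)

looks_geographic(rat_x, rat_y)
looks_geographic(boro_x, boro_y)
```

With sp objects you could pass the bounding box instead, e.g. looks_geographic(bbox(boros)[1, ], bbox(boros)[2, ]).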

The boros data set uses the NAD 1983 State Plane New York Long Island coordinate reference system (according to @mkennedy's comment above). After assigning a CRS to each object and reprojecting boros with spTransform, we obtain the following result:

crs <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
proj4string(rats) <- crs
proj4string(boros) <- "+proj=lcc +lat_1=40.66666666666666 +lat_2=41.03333333333333 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +datum=NAD83 +units=us-ft +no_defs +ellps=GRS80 +towgs84=0,0,0"

plot(spTransform(boros, CRS(crs)), axes=TRUE)
plot(rats, add=TRUE, col="#FF000050", pch=19, cex=0.3)

(Figure: resulting plot)

[GIS] Merge spatial and non-spatial data and create SpatialPolygonsDataFrame in r

I would recommend reading your shapefile in with rgdal::readOGR. If you run into performance issues, look up how to read and merge spatial data with the sf package and the simple features workflow.

For this to work, I like to give the key columns the same name in both objects before merging. Alternatively, you can specify the key columns with the by.x and by.y arguments of merge.

library(rgdal)
mydf   <- read.csv("myCsv.csv")
myspdf <- readOGR("myShapefile.shp")

## then merge using sp's merge function
mynewspdf <- merge(myspdf, mydf)
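If the key columns have different names, by.x and by.y let merge() match them without renaming. This sketch uses two plain data frames with hypothetical borough_id/id keys so it runs without a shapefile; sp's merge method for Spatial*DataFrame objects accepts by.x and by.y as well (see its help page for the full argument list).

```r
# Stand-in for myspdf@data: one row per polygon (hypothetical columns)
polys_attr <- data.frame(borough_id = c(1, 2, 3),
                         name = c("A", "B", "C"))

# Non-spatial table whose key column has a different name (hypothetical)
stats <- data.frame(id = c(1, 3), cases = c(10, 4))

# Match borough_id (left) to id (right); all.x = TRUE keeps rows with no match,
# which matters when the left-hand side corresponds to geometries
merged <- merge(polys_attr, stats,
                by.x = "borough_id", by.y = "id", all.x = TRUE)
merged
```

Note that base merge() sorts its result by the key by default, so assigning its output straight back into myspdf@data can misalign attributes and geometries. Merging the Spatial object itself, as above, is meant to keep them aligned.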

You may get a "non-unique matches detected" error, in which case you can try:

mynewspdf <- merge(myspdf, mydf, duplicateGeoms = TRUE)
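That error indicates that at least one key value occurs more than once in mydf, so a single geometry would match several attribute rows; duplicateGeoms = TRUE copies the geometry for each match. Before going that route, it can help to see which keys are duplicated. A base-R sketch with a hypothetical id column:

```r
# Hypothetical attribute table in which id 2 appears twice
mydf <- data.frame(id = c(1, 2, 2, 3), value = c(5, 7, 8, 9))

# Key values that occur more than once
dup_keys <- unique(mydf$id[duplicated(mydf$id)])
dup_keys

# All rows involved in the duplication, for inspection
dup_rows <- mydf[mydf$id %in% dup_keys, ]
dup_rows
```

If the duplicates are genuine repeats, summarising them first (for example with aggregate()) keeps one row per geometry instead.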

For more information, see https://www.rdocumentation.org/packages/sp/versions/1.2-5/topics/merge
