R – Convert CSV File with Long and Lat Geometry Column to sf Object in R

Tags: csv, geometry, r

I can't seem to find the answer to this anywhere. How can I read in a CSV file whose geometry column contains long/lat pairs stored as text, and convert it to an sf object? Here is the dput output for the first few rows of the file:

structure(list(date = c("2017-08-04", "2017-08-04", "2017-08-04", "2017-08-04", "2017-08-04", "2017-08-04"),
               is_boarded = c("0", "0", "0", "0", "1", "0"), 
               fire = c("0", "0", "0", "0", "0", "0"), 
               homeless = c("0", "0", "1", "1", "0", "0"), 
               address = c("1231 N harding ave", "5942 S peoria st", "6440 S seeley ave", "6428 S paulina st", "9015 S houston ave", "10917 S buffalo ave"), 
               zip_code = c("60651", "60621", "60636", "60636", "60617", "60617"), 
               ward = c("37", "16", "16", "15", "10", "10"), 
               community_area = c("23", "68", "67", "67", "46", "52"), 
               geometry = c("c(-87.7251002085875, 41.903236038454)", "c(-87.6473828702868, 41.7862165473861)", "c(-87.6750561873273, 41.7767719172303)", "c(-87.6666233031588, 41.7770234233244)", "c(-87.5499059450373, 41.731640678147)", "c(-87.5437832254962, 41.6970145984798)"), 
               PRI_NEIGH = c("Humboldt Park", "Englewood", "Englewood", "Englewood", "South Chicago", "East Side")
               ),
          row.names = c(NA, 6L), 
          class = "data.frame"
         )

Best Answer

Here's one way to approach it:

library(sf)
library(dplyr)

data=structure(list(date = c("2017-08-04", "2017-08-04", "2017-08-04", "2017-08-04", "2017-08-04", "2017-08-04"),
               is_boarded = c("0", "0", "0", "0", "1", "0"), 
               fire = c("0", "0", "0", "0", "0", "0"), 
               homeless = c("0", "0", "1", "1", "0", "0"), 
               address = c("1231 N harding ave", "5942 S peoria st", "6440 S seeley ave", "6428 S paulina st", "9015 S houston ave", "10917 S buffalo ave"), 
               zip_code = c("60651", "60621", "60636", "60636", "60617", "60617"), 
               ward = c("37", "16", "16", "15", "10", "10"), 
               community_area = c("23", "68", "67", "67", "46", "52"), 
               geometry = c("c(-87.7251002085875, 41.903236038454)", "c(-87.6473828702868, 41.7862165473861)", "c(-87.6750561873273, 41.7767719172303)", "c(-87.6666233031588, 41.7770234233244)", "c(-87.5499059450373, 41.731640678147)", "c(-87.5437832254962, 41.6970145984798)"), 
               PRI_NEIGH = c("Humboldt Park", "Englewood", "Englewood", "Englewood", "South Chicago", "East Side")
),
row.names = c(NA, 6L), 
class = "data.frame"
)

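# Strip "c", "(" and ")" from the text, split on the comma, then build point geometries.
# The first number in each pair is the longitude (x), the second the latitude (y).
data_sf <- data %>%
  mutate(geom = gsub(pattern = "(\\))|(\\()|c", replacement = "", geometry)) %>%
  tidyr::separate(geom, into = c("lon", "lat"), sep = ",") %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326)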

The gsub call strips the characters c, ( and ) from each value in the geometry column. The parentheses are escaped with \\ because they are special characters in a regex, and | means "or", so the pattern matches any one of the three. separate then splits the remaining text at the comma into two new columns.
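
To see what the cleaning step does on its own, here is a minimal base R sketch applied to a single value copied from the question. The variable names x, cleaned and parts are only illustrative and are not part of the answer above.

```r
# One value copied from the geometry column in the question
x <- "c(-87.7251002085875, 41.903236038454)"

# Remove "c", "(" and ")" with the same pattern used in the answer
cleaned <- gsub(pattern = "(\\))|(\\()|c", replacement = "", x)
print(cleaned)
# "-87.7251002085875, 41.903236038454"

# Split at the comma and convert to numbers;
# as.numeric() tolerates the leading space left after the comma
parts <- as.numeric(strsplit(cleaned, ",")[[1]])
print(parts)
```

The first element of parts is the longitude and the second is the latitude.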

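st_as_sf then takes the two coordinate columns created in the previous step and converts the data frame to an sf object with point geometry. In each pair the first number is the longitude (x) and the second is the latitude (y), which is the order coords expects. The crs is a guess: 4326 (WGS 84) is the usual choice for longitude/latitude in decimal degrees, but check it against the source of your data.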
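
Since the question starts from a CSV file rather than a data frame, here is a rough end-to-end sketch using only base R and sf. It assumes the geometry column comes back from the file as text in the same "c(x, y)" form shown in the dput. The temporary file is a stand-in for the real CSV, and the names csv_path, raw, xy and pts are made up for illustration. The crs of 4326 is the same guess as above.

```r
library(sf)

# Stand-in for the real CSV: write two rows from the question to a temporary file
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    address  = c("1231 N harding ave", "5942 S peoria st"),
    geometry = c("c(-87.7251002085875, 41.903236038454)",
                 "c(-87.6473828702868, 41.7862165473861)")
  ),
  csv_path,
  row.names = FALSE
)

# Read the file back, keeping the geometry column as character
raw <- read.csv(csv_path, stringsAsFactors = FALSE)

# Drop "c", "(" and ")", split each value at the comma, and stack into a 2-column matrix
xy <- do.call(rbind, lapply(
  strsplit(gsub("[c()]", "", raw$geometry), ","),
  as.numeric
))
raw$lon <- xy[, 1]    # first number: longitude (x)
raw$lat <- xy[, 2]    # second number: latitude (y)
raw$geometry <- NULL  # drop the text column before building real geometries

# Build the sf object; crs = 4326 is an assumption about the data
pts <- st_as_sf(raw, coords = c("lon", "lat"), crs = 4326)

print(pts)
print(st_coordinates(pts))
```

Printing the result or calling st_coordinates is a quick way to confirm that the X values are the negative longitudes and the Y values are the latitudes around 41 to 42.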