[GIS] left_join breaking sf

I'm trying to add a data column to a simple features (sf) point object using dplyr::left_join, but the join appears to drop the sf class:

class(sf)
[1] "sf"        "data.frame"
class(df)
[1] "tbl_df"    "tbl"    "data.frame"

eg <- left_join(sf, df, by = 'common_col')

class(eg)
[1] "data.frame"

The same thing happens when df is a plain data frame. I can fix it easily enough with

eg <- st_as_sf(eg)

but this feels like an unnecessary extra step. Am I making poor function choices for what I'm trying to do, or is this an actual bug?

Best Answer

Until this gets changed, you might want to define a left_join method for sf objects that does the conversion for you:

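library(sf)
library(dplyr)

left_join.sf = function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) {
    # Let the next method (dplyr's data.frame method) do the actual join...
    ret = NextMethod("left_join")
    # ...then rebuild an sf object from the geometry column the join kept
    st_as_sf(ret)
}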

Example, before defining the method:

> nc = st_read(system.file("shape/nc.shp", package="sf"))
> newdata = data.frame(CRESS_ID=1:100,Z=runif(100))
> nc2 = left_join(nc, newdata)
Joining, by = "CRESS_ID"
> class(nc2)
[1] "data.frame"

With left_join.sf defined, the same left_join(...) call returns an sf object:

> nc2 = left_join(nc, newdata)
Joining, by = "CRESS_ID"
> class(nc2)
[1] "sf"         "data.frame"

If you look at the source of left_join for a plain R data frame, you'll see it follows the same pattern: the actual join is done on a tbl version of x, and the result is then converted back with as.data.frame(), which is why an sf input comes back as a plain data.frame:

> dplyr:::left_join.data.frame
function (x, y, by = NULL, copy = FALSE, ...) 
{
    as.data.frame(left_join(tbl_df(x), y, by = by, copy = copy, 
        ...))
}
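The mechanism described above (a data.frame method that rebuilds its result as a plain data frame, and a subclass method that calls NextMethod and then restores the class) can be reproduced in base R without sf or dplyr. This is a minimal sketch: toy_join and the "myclass" class are hypothetical stand-ins, merge() plays the role of the tbl join, and `class<-` does the job that st_as_sf() does in the real left_join.sf method.

```r
# Hypothetical toy generic standing in for dplyr::left_join
toy_join <- function(x, y, by, ...) UseMethod("toy_join")

# data.frame method: mirrors dplyr:::left_join.data.frame by joining plain
# data frames and returning as.data.frame(), which strips any subclass
toy_join.data.frame <- function(x, y, by, ...) {
  as.data.frame(merge(as.data.frame(x), y, by = by, all.x = TRUE))
}

# An object with an extra class in front of "data.frame", like sf
pts <- data.frame(id = 1:3, name = c("a", "b", "c"))
class(pts) <- c("myclass", "data.frame")
extra <- data.frame(id = c(1L, 3L), z = c(10, 30))

# No toy_join.myclass yet: dispatch falls through to the data.frame method
before <- toy_join(pts, extra, by = "id")
class(before)  # "data.frame"

# Subclass method in the style of left_join.sf: let the next method do the
# join, then put the subclass back on the result
toy_join.myclass <- function(x, y, by, ...) {
  ret <- NextMethod("toy_join")
  class(ret) <- class(x)
  ret
}

after <- toy_join(pts, extra, by = "id")
class(after)  # "myclass" "data.frame"
after$z       # 10 NA 30
```

As in the answer's method, the subclass method does not reimplement the join; it only repairs the class of whatever the data.frame method returns.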