R – Handling the "keeping first linestring only" warning from sf::st_cast("LINESTRING")


I have a dataset with a geometry column that contains both linestrings and multilinestrings. I want to keep the linestrings as they are and convert each multilinestring into its component linestrings (which should increase the number of rows of the sf data frame). Unfortunately, when I use sf::st_cast("LINESTRING"), a warning tells me that only the first linestring of each multilinestring is kept. Is there a way to keep all linestrings from the multilinestring when using sf::st_cast? Reproducible example with the warning below:


library(sf)
library(dplyr)

# sample dataframe - creating linestrings
df1 <- data.frame(lon = 1:10, lat = 1:10, var = c(1,1,1,2,2,2,3,3,4,4)) %>%
  st_as_sf(coords = c("lon", "lat"), dim = "XY") %>% group_by(var) %>%
  summarise(geometry = st_union(geometry), do_union = FALSE) %>%
  st_cast("LINESTRING")  # turn each group's points into one linestring

# creating a multilinestring
df2 <- df1[1:2,] %>% mutate(var = c(1,1)) %>% group_by(var) %>%
  summarise(geometry = st_union(geometry), do_union = FALSE) %>%
  st_cast("MULTILINESTRING")  # make sure the union is a multilinestring

# combining the two
df <- rbind(df1, df2)

# trying to convert only the multilinestring to two linestrings,
# not changing the already existing linestrings
df <- df %>% st_cast("LINESTRING")

# Warning message:
# In st_cast.MULTILINESTRING(X[[i]], ...) : keeping first linestring only

I can do it manually by first converting everything to multilinestring and then converting everything to linestring, as in the following:

df <- df %>% st_cast("MULTILINESTRING") %>% st_cast("LINESTRING")

but is there maybe a better way of doing this?

Best Answer

One alternative is to apply st_cast to "LINESTRING" on each row separately and bind the results back together:

> do.call(rbind,lapply(1:nrow(df),function(i){st_cast(df[i,],"LINESTRING")}))
Simple feature collection with 6 features and 1 field
geometry type:  LINESTRING
dimension:      XY
bbox:           xmin: 1 ymin: 1 xmax: 10 ymax: 10
epsg (SRID):    NA
proj4string:    NA
  var                   geometry
1   1 LINESTRING (1 1, 2 2, 3 3)
2   2 LINESTRING (4 4, 5 5, 6 6)
3   3      LINESTRING (7 7, 8 8)
4   4    LINESTRING (9 9, 10 10)
5   1 LINESTRING (1 1, 2 2, 3 3)
6   1 LINESTRING (4 4, 5 5, 6 6)

But it can't really be much better than:

> st_cast(st_cast(df, "MULTILINESTRING"),"LINESTRING")
Simple feature collection with 6 features and 1 field
geometry type:  LINESTRING
dimension:      XY
bbox:           xmin: 1 ymin: 1 xmax: 10 ymax: 10
epsg (SRID):    NA
proj4string:    NA
  var                   geometry
1   1 LINESTRING (1 1, 2 2, 3 3)
2   2 LINESTRING (4 4, 5 5, 6 6)
3   3      LINESTRING (7 7, 8 8)
4   4    LINESTRING (9 9, 10 10)
5   1 LINESTRING (1 1, 2 2, 3 3)
6   1 LINESTRING (4 4, 5 5, 6 6)
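
Here is a rough sketch of why the double cast keeps every part. It rebuilds a small mixed column directly with sf constructors (the object names mixed, multi, lines and direct are made up for illustration, not taken from the question). As far as I understand sf's behaviour, a mixed column is cast one geometry at a time, which is where "keeping first linestring only" comes from, while a column that is uniformly MULTILINESTRING is split into its parts with the attributes repeated.

```r
library(sf)

# Toy geometries: two linestrings and one two-part multilinestring
l1 <- st_linestring(rbind(c(1, 1), c(2, 2), c(3, 3)))
l2 <- st_linestring(rbind(c(4, 4), c(5, 5), c(6, 6)))
ml <- st_multilinestring(list(l1, l2))
mixed <- st_sf(var = c(1, 2, 1), geometry = st_sfc(l1, l2, ml))

# The mixed column is stored as generic GEOMETRY
mixed_class <- class(st_geometry(mixed))[1]

# Direct cast: each geometry is cast on its own, so the multilinestring
# loses everything but its first part (this is the call that warns)
direct <- suppressWarnings(st_cast(mixed, "LINESTRING"))

# Step 1: promote every row to MULTILINESTRING so the column is uniform
multi <- st_cast(mixed, "MULTILINESTRING")

# Step 2: split each multilinestring into its parts; sf warns that the
# attributes are repeated for the sub-geometries, hence suppressWarnings()
lines <- suppressWarnings(st_cast(multi, "LINESTRING"))

nrow(direct)  # same number of rows as the input
nrow(lines)   # one row per linestring part
```

Comparing nrow(direct) with nrow(lines) shows the difference: the direct cast keeps one row per input feature, the double cast produces one row per part.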

I assume that double cast is what you mean at the end of your question. It is probably pretty close to optimal. library(microbenchmark) reckons the two casts are about 5 times faster than the per-row approach on your little example:

Unit: milliseconds
  expr      min       lq      mean    median       uq       max neval
 apply 9.087103 9.411445 10.056437 10.061594 10.50437 12.969576   100
 casts 1.737474 1.819215  2.000212  1.866471  1.92306  4.406047   100
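
A comparison along these lines could be set up roughly as follows. This is a reconstruction, not the exact call used for the table above: the expression names apply and casts are taken from the output, times = 100 matches the neval column, and the small df is rebuilt here with sf constructors so that the snippet runs on its own. Timings will differ by machine and sf version.

```r
library(sf)
library(microbenchmark)

# Stand-in for the question's df: linestrings plus one multilinestring
l1 <- st_linestring(rbind(c(1, 1), c(2, 2), c(3, 3)))
l2 <- st_linestring(rbind(c(4, 4), c(5, 5), c(6, 6)))
ml <- st_multilinestring(list(l1, l2))
df <- st_sf(var = c(1, 2, 1), geometry = st_sfc(l1, l2, ml))

# Per-row cast, then bind the pieces back together
cast_by_row <- function(x) {
  do.call(rbind, lapply(seq_len(nrow(x)), function(i) st_cast(x[i, ], "LINESTRING")))
}

# Promote to MULTILINESTRING, then split into LINESTRINGs
cast_twice <- function(x) {
  st_cast(st_cast(x, "MULTILINESTRING"), "LINESTRING")
}

# Both variants warn about repeated attributes, so silence that while timing
bench <- suppressWarnings(microbenchmark(
  apply = cast_by_row(df),
  casts = cast_twice(df),
  times = 100
))
print(bench)
```

Both helpers return one row per linestring part, so the benchmark compares like with like.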