R – Create Regular Square Grid and Find Centroid by Factors

r raster sf sp terra

I want to create a regular, square grid of 0.5 x 0.5 degree cells around my sampling points and find the lat/long centroid for each factor level.

I have a data frame with three columns: (1) "scientificName", containing species names, and the corresponding (2) longitude and (3) latitude values.

> head(coords)
            scientificName         x          y
1  Aceratobasis_cornicauda -40.56560 -19.901400
2  Aceratobasis_macilenta  -49.00881 -25.516721
3  Aceratobasis_nathaliae  -53.99830 -26.505600
4  Amazoneura_ephippigera  -73.18583  -4.372778
5  Amazoneura_ephippigera  -64.68917  -3.512500
6  Amazoneura_juruaensis   -72.90000  -7.618056

I started by creating a SpatialPoints object and converting it with the sf package.

coordinates(coords) <- ~x + y
prj <- '+proj=longlat +datum=WGS84'
coords <- SpatialPoints(coords, proj4string = CRS(prj))
data_sf <- st_as_sf(coords,
                    coords = c("x", "y"),
                    crs = st_crs("+proj=longlat +datum=WGS84"))

Then I created my grid using sf::st_make_grid:

grid <- data_sf %>%
  st_bbox() %>%
  st_as_sfc() %>%
  st_make_grid(cellsize = c(0.5, 0.5),
               crs = "+proj=longlat +datum=WGS84",
               square = TRUE) %>%
  st_as_sf()

Here I was expecting my grid to have the same length as my df. Instead, it is a large sfc_POLYGON of 153,900 elements. To work around this, I subset the grid:

grid_subset <- grid[st_intersects(data_sf, grid) |> unlist(), ]

As expected, my subset now has the same length as my df. But now I'm kind of stuck, and I'm assuming my coordinates fall on the cells' centroids (I'm not exactly sure about that; @Spacedman shed some light on this issue, see comments).
Finally, to find the centroid of my coordinates by "scientificName" factors I tried:

centroids <- grid_subset %>%
  group_by(scientificName) %>%
  summarise(centroid = st_centroid(st_union(grid[grid])))

But no success. I keep getting the error:

Error in `st_as_sf()`:
! Must group by variables found in `.data`.
✖ Column `scientificName` is not found.

Converting grid_subset to df and adding scientificName to it also didn't work. I just get a new error:

Error in `summarise()`:
! Problem while computing `centroid = st_centroid(st_union(grid[grid]))`.
ℹ The error occurred in group 1: scientificName = "Aceratobasis_cornicauda".

In sum, I want to (1) create a grid around my sampling points and (2) find the lat/long centroid for each factor level (or do (2) before (1)).

EDIT:

I tried a different approach: first estimating the centroid of each factor level, then creating the grid:

data <- read.table("clipboard", header=T)
centroids <- data %>%
  group_by(scientificName) %>%
  summarize(centroid_x = mean(x),
            centroid_y = mean(y))
sp_centroids <- SpatialPoints(centroids[, c("centroid_x", "centroid_y")], 
                              proj4string = CRS("+proj=longlat +datum=WGS84"))

scientific_names <- centroids$scientificName
attr(sp_centroids, "scientificName") <- scientific_names

centroids_sf <- st_as_sf(centroids,
                    coords = c("centroid_x", "centroid_y"),
                    crs = "+proj=longlat +datum=WGS84")

grid_sf <- centroids_sf %>%
  st_bbox() %>%
  st_as_sfc() %>%
  st_make_grid(cellsize = c(0.5, 0.5), 
               crs = "+proj=longlat +datum=WGS84",
               square = TRUE)

centroids_sf <- st_transform(centroids_sf, st_crs(grid_sf))|> st_as_sf()
head(centroids_sf) 

centroids_sf has the same length as the number of factor levels. However, I am not able to convert centroids_sf into a spatial object that I can use in raster().

Here is part of my session info:

other attached packages: [1] raster_3.5-21 dplyr_1.0.10 sp_1.5-0 sf_1.0-9

Best Answer

Starting with your coords data frame, this seems to work:

aggregate(cbind(x,y) ~ scientificName, data=coords, FUN=mean)
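
As a minimal sketch of how this plays out, here is the same call on the six rows shown in the question, retyped from head(coords). The follow-up conversion to sf points and the 0.5-degree grid are my assumptions about the next step, not something confirmed above; EPSG:4326 is used for WGS84 long/lat, and that part is skipped if sf is not installed.

```r
# Rows retyped from head(coords) in the question
coords <- data.frame(
  scientificName = c("Aceratobasis_cornicauda", "Aceratobasis_macilenta",
                     "Aceratobasis_nathaliae", "Amazoneura_ephippigera",
                     "Amazoneura_ephippigera", "Amazoneura_juruaensis"),
  x = c(-40.56560, -49.00881, -53.99830, -73.18583, -64.68917, -72.90000),
  y = c(-19.901400, -25.516721, -26.505600, -4.372778, -3.512500, -7.618056)
)

# One row per species: mean longitude (x) and mean latitude (y)
centroids <- aggregate(cbind(x, y) ~ scientificName, data = coords, FUN = mean)
print(centroids)

# Assumed follow-up step (not part of the call above): sf points plus a grid
if (requireNamespace("sf", quietly = TRUE)) {
  # Converting the data frame directly keeps scientificName as a column
  centroids_sf <- sf::st_as_sf(centroids, coords = c("x", "y"), crs = 4326)

  # 0.5-degree square grid covering the bounding box of the centroids
  grid <- sf::st_make_grid(centroids_sf, cellsize = c(0.5, 0.5), square = TRUE)
  print(length(grid))
}
```

Because centroids is built straight from the data frame, scientificName stays attached when it becomes centroids_sf. The number of grid cells depends on the extent of the bounding box rather than on the number of points, so the grid will generally be much longer than the data frame.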