GeoServer Cluster – Shared Disk Considerations for GeoServer Clustering

clustering, data storage, geoserver

I am setting up a geoserver cluster with 2+ virtual servers running multiple geoserver instances each.

All servers must have access to a shared disk containing raster and vector data (geotiffs, pyramids, shp files, etc.). The size of the data is around 1TB.

Our implementation until now was to configure an OCFS2 file system on a shared disk and mount it on each server node. Since OCFS2 does not seem to be maintained and since we have had quite a few problems with it in the past, we are looking for alternatives. Storing data with cloud providers, such as AWS, is not an option as it needs to be in-house.

We are also considering clustering with docker swarm but the same issue applies – all nodes in a cluster must be able to access the data.

What are your best practices for sharing data across multiple geoserver instances?

Best Answer

Depending on your architectural needs and the amount and format of the data, there are several options that will work.

Probably the easiest would be if you have NAS storage devices in your datacenter that can present NFS or CIFS shares to the servers to use directly.

If that's not an option in your datacenter, consider using a cluster file system like GlusterFS or Lustre. In my experience GlusterFS is generally easier to set up and run, while Lustre has better performance, but both should work just fine for GeoServer.
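Whichever shared file system you pick, it helps to confirm that every node actually sees the same files before pointing GeoServer at the mount. Here is a minimal sketch, using only the Python standard library, that builds a manifest of relative paths and file sizes under a data directory and compares two manifests. The function names and the idea of comparing manifests between nodes are my own illustration, not anything GeoServer provides, and the mount path in the usage note is hypothetical.

```python
import os


def build_manifest(data_root):
    """Return {relative_path: size_in_bytes} for every file under data_root."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(data_root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # Store paths relative to the root so that manifests from nodes
            # with different mount points can still be compared.
            rel_path = os.path.relpath(full_path, data_root)
            manifest[rel_path] = os.path.getsize(full_path)
    return manifest


def diff_manifests(reference, other):
    """Return the sorted relative paths that are missing on one side
    or whose sizes differ between the two manifests."""
    all_paths = set(reference) | set(other)
    return sorted(p for p in all_paths if reference.get(p) != other.get(p))
```

On each node you would call something like `build_manifest("/mnt/geodata")` (a hypothetical mount point), save the result, and run `diff_manifests` on the saved manifests. An empty list means the nodes agree on file names and sizes; it does not prove the file contents are identical.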

If the data is in (or can be transformed into) a format that works well with an object store (like Cloud Optimized GeoTIFF), you could look into a self-hosted service that supports the S3 API (this is also becoming more common in datacenter storage solutions). I've used MinIO for local testing and demos with GeoServer. I've also used Ceph, though not with GeoServer; it would likely work (Ceph also provides a filesystem interface, but I've never tried that either).
