[GIS] LineString VS Multi-LineString Performance PostGIS

linestringperformancepostgis

Our road dataset (14mil segments, 300,000+ km length) is in Multi-LineString, I am thinking about shifting it all to LineString since all the roads in our data are single geometries and multi part geometries are errors. LineString also provides several different functions that I am really fond of using like ST_StartPoint and ST_EndPoint.

Is there any difference in the performance of LineString and Multi-LineString? and what is the recommended format for storing a large road dataset in PostGIS?

In response to the Comments:
It's not a single Multi-LineString, each one is a different feature. Number of geometries in each feature is 1.
I am not looking for the performance of a specific query.

What I want to know is the difference in speed of full table scans for LineString, and for the same geometry, stored as MultiLineString. Does the geometry type have any impact? The results of explain and the queries I tried, kept on fluctuating and there wasn't any noticeable difference

Best Answer

I wouldn't expect any significant difference in performance unless you have very unusual data. There are a few places in the code where optimizations are put in for edge cases like two-point LineStrings, and you will miss out on some of these if you're storing your LineStrings as MultiLineStrings. There may be some special code paths in GEOS that you miss out on with MultiLineStrings as well.

There's a storage overhead to Multi* geometries, which will be small as long as the number of points in your geometry is small (really, it's only significant for single-point MultiPoints).

That said, if your geometries all have only a single component, there is no advantage to storing them as MultiLineStrings, so you may as well convert to LineString. If nothing else, this will make your data model more self-documenting.

Related Question