[GIS] Using PostGIS for a large trajectory dataset

postgispostgis-2.0spatio-temporal-datatrackingtrajectory

I have a spatial trajectory dataset which basically contains about 10E6 (lat,lon,time)-tuples, grouped together by an identifier so they form trajectories.

Currently, SQLite is used to store the raw data.

Lat/Lon are in WGS84 but span only small areas (i.e. European country size).

I would like to store it in a geodatabase for easy descriptive & exploratory analysis, eg. calculating the overall and average distance for trajectories, and basic data preprocessing, eg. dismissing all tuples that are outside a polygon, for example.

Thus, I thought PostGIS would be optimal since I have worked with it before.

Some questions arise though: Should I just store the tuples as POINTS with GEOMETRY data type, or should I use MULTIPOINTS for whole trajectories?

And what about performance? Performance is not critical since the data is used for a research project and not for a production application, but I don't want to wait 5 hours to calculate distances between a million point-pairs.

What about indexing? What type of indices should I use and on what type of data?

I know this question is kind of general but I would like to see if people have experience with storing trajectory data and maybe they can give me some advice.

Best Answer

About different geometry types: From your description it looks like you should absolutely store your trajectories as linestrings. If you store them as points or multipoints you will have to build linestrings in runtime if you don't only want to do the calculations on the points defining the trajectories but also what is between the points. an Example (in meter-based projection)

select ST_Distance('MULTIPOINT(10 10, 10 20)'::geometry,'POINT(0 15)'::geometry)

Will return: 11.1803398874989 which is the distance from the closest point in the multipoint to the point. while:

select ST_Distance('LINESTRING(10 10, 10 20)'::geometry,'POINT(0 15)'::geometry)

will return 10 which is the distance from the point to the closest point to the closest point on the line.

The same things about intersections test. If you only uses the points, a trajectory passing over a polygon without points in the polygon, will not be found if you only calculate on the vertex.points and don't use the points to define linestrings.

About projections you should absolutely transform (with ST_Transform) your data to a local projection if possible. It is faster and you will get a lot more functions to choose from.

About R you should be aware that postgreSQL have have a procedural language handling R directly in the database: PL/R

About indexing: The spatial index used in postgis is GIST which builds an index of the bounding boxes.

Related Question