Detecting and Fixing Outliers in GPS Trajectory Data

gpslatitude longitudenoiseoutliertrajectory

I need to find an algorithm or method that can detect outlier latitude longitude points in a trajectory during post-processing, which can then be fixed (brought back into the trajectory's path based on its neighbours).

As an example of the kind of outlier points I would like to detect and fix, I've attached an image demonstrating:

Raw data in blue.

I have tried using an unscented Kalman filter to smooth out the data as best as possible, but this does not seem to work effectively enough for more extreme outliers (raw data in blue, smoothed data in red):

Raw data in blue, UKF smoothed data in red.

My UKF may not be calibrated properly (but I'm fairly certain that it is).

The trajectories are those of walkers, runners, cyclists – human-powered movement that can start and stop, but not drastically change in speed or position that quickly or suddenly.

A solution that does not rely on timing data (and only on position data) would be extremely useful (as the data being processed may not always contain timing data). However, I'm aware of how unlikely this kind of solution is to exist, so I'm equally as happy to have any solution!

Ideally, the solution would detect the outlier so that it could be fixed, resulting in a corrected trajectory:

Corrected raw data in green.


Resources I've sifted through:

Best Answer

Algorithm I use.

  1. Calculate Euclidean minimum spanning tree of points:

enter image description here

  1. Find 2 points most far apart from each other on this network

enter image description here

  1. Find shortest route between them:

enter image description here

As one can see it might cut corner on a sharp turn.

I have ArcGIS python implementation of above algorithm, it uses networkx module. Let me know if this is of interest and I'll update my answer with the script

UPDATE:

# Connects points to make polyline. Makes 1 line at a time
# Tool assumes that 1st layer in Table of Conternt is TARGET polyline feature class,
# second layer in TOC is SOURCE point fc.
# If no selection found in SOURCE layer, works on entire dataset

import arcpy, traceback, os, sys
import itertools as itt
from math import sqrt
sys.path.append(r'C:\Users\felix_pertziger\AppData\Roaming\Python\Python27\site-packages')
import networkx as nx
from networkx import dijkstra_path_length

try:
    def showPyMessage():
        arcpy.AddMessage(str(time.ctime()) + " - " + message)
    def CheckLayerLine(infc):
        d=arcpy.Describe(infc)
        theType=d.shapeType
        if theType!="Polyline":
            arcpy.AddWarning("\nTool designed to work with polylines as TARGET!")
            raise NameError, "Wrong input\n"
        return d
    def CheckLayerPoint(infc):
        d=arcpy.Describe(infc)
        theType=d.shapeType
        if theType!="Point":
            arcpy.AddWarning("\nTool designed to work with points as SOURCE!")
            raise NameError, "Wrong input\n"
        return d
    mxd = arcpy.mapping.MapDocument("CURRENT")
    layers = arcpy.mapping.ListLayers(mxd)
    if len(layers)<=1:
        arcpy.AddWarning("\nNot enough layers in the view!")
        raise NameError, "Wrong input\n"
    destLR, sourceLR=layers[0],layers[1]
    a = CheckLayerPoint(sourceLR);d = CheckLayerLine(destLR)

#  copy all points to manageable list
    g=arcpy.Geometry()
    geometryList=arcpy.CopyFeatures_management(sourceLR,g)
    nPoints=len(geometryList)
    arcpy.AddMessage('Computing minimum spanning tree')
    list2connect=[p.firstPoint for p in geometryList]
#  create network    
    p=list(itt.combinations(range(nPoints), 2))
    arcpy.SetProgressor("step", "", 0, len(p),1)
    G=nx.Graph()
    for f,t in p:
        p1=list2connect[f]
        p2=list2connect[t]
        dX=p2.X-p1.X;dY=p2.Y-p1.Y
        lenV=sqrt(dX*dX+dY*dY)
        G.add_edge(f,t,weight=lenV)
        arcpy.SetProgressorPosition()
    arcpy.AddMessage(len(G.edges()))
    mst=nx.minimum_spanning_tree(G)
    del G

#  find remotest pair
    arcpy.AddMessage(len(mst.edges()))
    length0=nx.all_pairs_dijkstra_path_length(mst)
    lMax=0
    for f,t in p:
        lCur=length0[f][t]
        if lCur>lMax:
            lMax=lCur
            best=(f,t)
    gL=nx.dijkstra_path(mst,best[0],best[1])
    del mst
    nPoints=len(gL)
    ordArray=arcpy.Array()
    for i in gL: ordArray.add(list2connect[i])

#  append line to TARGET
    curT = arcpy.da.InsertCursor(destLR,"SHAPE@")
    curT.insertRow((arcpy.Polyline(ordArray),))
    arcpy.RefreshActiveView()
    del curT

except:
    message = "\n*** PYTHON ERRORS *** "; showPyMessage()
    message = "Python Traceback Info: " + traceback.format_tb(sys.exc_info()[2])[0]; showPyMessage()
    message = "Python Error Info: " +  str(sys.exc_type)+ ": " + str(sys.exc_value) + "\n"; showPyMessage()