[GIS] Removing Outliers from point layer

clusteringdistanceqgis

I am a not so newbie user of QGIS but have no knowledge of Python.

I have a large dataset of points. Each point has attributes, one of which is a group it belongs to. It is expected that the points in each group form a compact form. But from initial analysis I've found that some groups have points very far away from the main "point cloud".

I'm trying to find a way to flag these outlier points to later delete or disconsider them from my analysis. I do have a theoretical approach but don't know if there is some plugin or script that i could use.

My reasoning goes like this. If for each group I calculate the sum of distances of each point to every other point in the group the outliers will have a much greater sum than an average point in the "point cloud". I can then use some measure of standard deviation to set my threshold level to flag points as outliers.

Question is. How do I make this sum? Considering I have a dataset of millions of points and thousands of groups?

Best Answer

This is actually a data science question vs. one directly related to GIS. And it's a concept that is way more complicated than it initially seems. If you're interested in ways of categorizing/grouping that data, you'll want to take a look at data clustering (2) and machine learning. The stats section of Stack Exchange would be a good resource to start with.

Related Question