[GIS] Nearest Neighbor Analysis Results

nearest neighbor, qgis, qgis-3

I'm fairly new and entirely self-taught in using QGIS 3.4. I'm looking at turtle nest sites and trying to run a nearest neighbor analysis to analyze their distribution and establish whether or not they are clumped, dispersed or random. I'm doing this for a number of different years to determine whether nest sites are becoming more clumped as a result of environmental changes.

However, I'm not sure I understand the results that are being output. The z-scores are astronomical and the expected values are effectively zero. How is QGIS establishing the expected distances?

Maybe I'm just doing something fundamentally wrong, but I'm selecting Vector analysis -> Nearest Neighbor Analysis and then choosing my vector layer containing the nest site points. These are some of the outputs I'm getting for different years:

Observed mean distance: 5.451281160546463
Expected mean distance: 0.00026372940677887315
Nearest neighbour index: 20669.97847197658
Number of points: 125
Z-Score: 442084.1070632401

Observed mean distance: 5.40372033661974
Expected mean distance: 0.00019992349282925835
Nearest neighbour index: 27028.941222203965
Number of points: 134
Z-Score: 598544.2290450602

Am I doing something fundamentally wrong, or is there a better way of achieving what I'm trying to do? I'm not sure.

Best Answer

I suspect that QGIS is not producing the right answer here, although I was not able to find out exactly how it does the calculation. I ran Nearest Neighbor Analysis on one data set using four different methods: pure Python, ArcGIS Pro with ArcPy, ArcMap, and QGIS. The results of the first three methods were very close to one another, while the QGIS results were way off.

Here are a few points:

  1. The data has to be projected, so that distances and areas are measured in the same linear units (pay attention to the distance units).
  2. The area is only used to calculate the expected average nearest neighbor distance. The idea is to take the same number of points, the same area and a coefficient, and derive the "expected" average nearest neighbor distance the points would have if they were located at random. The ratio between the observed (actual) and the expected average then shows whether the points tend to be clustered (ratio < 1) or dispersed (ratio > 1). Finally, the z-score is calculated to gauge the level of confidence (the higher the absolute value, the more significant the result).
  3. Christophe above has a few good points. However, for your study you should consider this quote from the ArcMap documentation: "The equations used to calculate the average nearest neighbor distance index and z-score are based on the assumption that the points being measured are free to locate anywhere within the study area (for example, there are no barriers, and all cases or features are located independently of one another)." So you have to account for bodies of water, rocky areas, etc. when defining the study area.
  4. If QGIS doesn't let you define the area, it's not going to work for you, because you don't know how it actually calculates the area (bounding box, minimum enclosing rectangle, minimum enclosing polygon, etc.), and therefore the expected average distance will be off. Looking at your numbers, the expected mean distance is way, way off (about 20,000 times smaller than the observed mean distance).
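
To make points 2 and 4 concrete, here is a minimal pure-Python sketch of the calculation. It assumes the standard average nearest neighbor formulas given in the ArcGIS documentation (expected = 0.5 / sqrt(n / A), standard error = 0.26136 / sqrt(n^2 / A)); I can't confirm that QGIS uses exactly these. The function names and the four-point demo layout are made up for illustration. The last lines back-solve the study area implied by the first result block in the question.

```python
import math

def observed_mean_nn(points):
    """Mean distance from each point to its closest other point.

    points: list of (x, y) tuples in a projected CRS (e.g. metres).
    Brute force, which is fine for a few hundred points.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    return total / len(points)

def nn_stats(observed, n, area):
    """Expected distance, index and z-score (assumed ArcGIS-style formulas).

    area must be in the square of the distance units (e.g. square metres).
    """
    expected = 0.5 / math.sqrt(n / area)
    std_err = 0.26136 / math.sqrt(n * n / area)
    return {
        "observed": observed,
        "expected": expected,
        "index": observed / expected,   # < 1 clustered, > 1 dispersed
        "z": (observed - expected) / std_err,
    }

def implied_area(expected, n):
    """Back-solve expected = 0.5 / sqrt(n / area) for the area."""
    return n * (2.0 * expected) ** 2

# Hypothetical demo: corners of a 10 x 10 square inside a 20 x 20 study area.
pts = [(0, 0), (10, 0), (0, 10), (10, 10)]
demo = nn_stats(observed_mean_nn(pts), len(pts), area=400.0)

# First result block from the question (n = 125).
obs_q = 5.451281160546463
exp_q = 0.00026372940677887315
area_q = implied_area(exp_q, 125)
z_q = nn_stats(obs_q, 125, area_q)["z"]
```

Under these assumptions the implied study area for the first year is tiny (a few hundred-thousandths of a square unit), while the observed mean distance is about 5.45 units. One possible explanation, which I have not verified, is that the distances and the area are being measured in different units (for example metres against square degrees). Passing your own projected area to `nn_stats` lets you check what the index and z-score look like when the units match.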

If you send me your data, I can run the analysis in Python and ArcMap to see what results we get.
