[GIS] Estimating resolution of vector data

mathematics, spatial statistics, vector, vectorization

I have an old vector dataset with polygons covering a continent. The data was first published on paper at a scale of 1:5 000 000 and was later digitized. I don't have the original data, any information about the vectorization, or any metadata. I suspect that the spacing between the vertices, rather than their positional accuracy, is what limits the resolution.

Vertices are stored with high precision (e.g. "nnn.nnnnnnnnn","-nn.nnnnnnnnn"). The dataset has few points that can be georeferenced and no nodes defined at exact coordinates (e.g. at whole degrees or round UTM coordinates). When I compare some coastline sections, the error is up to +/- 20 km.

I'd like to find a formula to estimate the maximum error based on the distribution of the vertices. I have access to any GIS application but would prefer a robust statistical reference.

How can I calculate the maximum error of the dataset, assuming that all vertices are correct? Or, phrased differently: what method can I use to find the coarsest effective resolution of the dataset?

I tried rasterizing the dataset at different cell sizes and then oversampling back to a small cell size to detect the coarsest rasterization that loses no resolution, but that is rather time-consuming and not a very mathematical approach.
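For reference, here is a rough sketch of the vertex-spacing idea done directly, without rasterising, in Python with geopandas and pyproj (the file name is made up). It measures the geodesic distance between consecutive vertices of every ring and summarises the distribution; the typical spacing is only a proxy for the effective resolution of the line work, not an accuracy estimate.

    # Summarise the spacing between consecutive vertices along all polygon rings.
    # File name is hypothetical; geopandas and pyproj are assumed to be installed,
    # and coordinates are assumed to be lon/lat in degrees (WGS84).
    import numpy as np
    import geopandas as gpd
    from pyproj import Geod

    geod = Geod(ellps="WGS84")
    gdf = gpd.read_file("continent_polygons.shp")

    def ring_segment_lengths(ring):
        """Geodesic length in metres of each segment between consecutive vertices."""
        xy = np.asarray(ring.coords)
        _, _, dist = geod.inv(xy[:-1, 0], xy[:-1, 1], xy[1:, 0], xy[1:, 1])
        return dist

    lengths = []
    for geom in gdf.geometry:
        polys = geom.geoms if geom.geom_type == "MultiPolygon" else [geom]
        for poly in polys:
            for ring in (poly.exterior, *poly.interiors):
                lengths.append(ring_segment_lengths(ring))
    d = np.concatenate(lengths)

    # The median spacing is a rough proxy for the effective resolution of the
    # digitised line work; the upper percentiles hint at the worst case.
    print(f"median vertex spacing : {np.median(d) / 1000:.1f} km")
    print(f"90th percentile       : {np.percentile(d, 90) / 1000:.1f} km")
    print(f"maximum               : {np.max(d) / 1000:.1f} km")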

Best Answer

Great question - I've seen this type of question pop up numerous times, and unfortunately many people undertaking quantitative GIS analysis ignore the CRITICAL component of quantifying uncertainty in spatial datasets. There are important concepts and terms that need to be clarified before this type of task can be boiled down to quantitative results.

Calculating error in a spatial dataset assumes prior knowledge of the dataset's lineage. Since metadata is not available for any step of the process, that kind of quantification is not possible. The precision of the coordinates in a vector dataset does not warrant the claim that the dataset is accurate to any particular degree, and rasterising the data will introduce its own error and uncertainty.

Without the metadata and an ongoing calculation of error and uncertainty, the dataset can only be treated as a pretty picture. Although it may seem simple to combine the scale of the original map with the precise-looking vector coordinates, fundamental concepts of geography are breached if error and uncertainty are not quantified at every step of the dataset's creation (a toy illustration of how such error components combine follows the list below):

  1. original capture of the dataset (error and uncertainty introduced)
  2. paper map creation (generalizations are made)
  3. digitising paper map to digital vector file (more error, more uncertainty)
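To make that propagation idea concrete: if a positional RMSE could be attached to each of those steps, and the errors were assumed independent, a common first-order approach is to combine them in quadrature (root-sum-of-squares). This is only a sketch under those assumptions; the RMSE values below are placeholder numbers, not figures derived from the dataset in the question.

    # Illustrative only: combining assumed-independent positional error components
    # in quadrature (root-sum-of-squares). The RMSE values are placeholders, not
    # figures derived from the dataset in the question.
    import math

    rmse_capture    = 1000.0   # m, hypothetical error from the original survey/capture
    rmse_generalise = 2500.0   # m, hypothetical generalisation error (0.5 mm at 1:5 000 000)
    rmse_digitise   = 1500.0   # m, hypothetical digitising error

    rmse_total = math.sqrt(rmse_capture**2 + rmse_generalise**2 + rmse_digitise**2)
    print(f"combined positional RMSE = {rmse_total / 1000:.1f} km")   # about 3.1 km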

Although this may not be the answer you are looking for, it is a good place to start for anyone in a similar situation:

  • If you are tasked with calculating a quantitatively accurate representation of the uncertainty of a spatial model, I'd suggest researching the topic of "uncertainty and error propagation in spatial data", as the topic is in-depth and mathematically and statistically dense.

  • If you are using the dataset as a pretty picture, then start mapping.
