[GIS] Transform polygon to equivalent circle, QGis

circledistancepolygonqgis

I have a polygon and I'd like to find the radius of an equivalent circle from a particular point within the polygon. The polygon would have a large (infinite) number of points evenly distributed accross it. I am working in QGis so if there's a solution for this it would be great.

Or, to put it another way: What is the average distance from a point in a polygon to every other point within the boundary of the polygon? A circle, I believe, would give the shortest distance, but a rectangle with one long edge and one short edge would give the longest distance.

Edit 1:

@Whuber – For your edit 2 this is what I picture. I can't see how this is not the area under b between x=1 and x=0.

enter image description here

Best Answer

The question appears to ask for the mean distance between a point and a polygon. The answer is surprisingly complex. We need to decompose the calculation into simpler ones, as follows:

  1. By definition of "average", to find the average distance between a point z and a polygon P, you compute the total distance and divide by the polygon's area. The "total" distance is the (Riemann) integral of the distance |z - x| as x ranges over P. Call this expression I(z, P).

  2. Express the polygon as a signed linear combination of triangles originating at x. This is straightforward:

    • First decompose the polygon into its connected components.

    • Express each of those as a simply connected polygon minus its "holes". This reduces the situation to calculations for connected, simply connected polygons. I'll continue to call one of them P. Let it have n distinct vertices.

    • The boundary of P is therefore a non-self-intersecting closed polyline determined by a sequence of vertices (p(1), p(2), ..., p(n), p(n+1) = p(1)). The polygon P itself is the sum of oriented triangles P(i) = (z, p(i), p(i+1)), 1 <= i <= n.

    • By the linearity of integration, I(z, P) is the sum of I(z, P(i)) (understanding that it's possible some of these expressions, due to the triangle orientations, will be negative). Note that z is a vertex of each of these triangles.

  3. To find I(z, Q) where z is a vertex of Q, we can express Q as the sum of two oriented right triangles Q(1) and Q(2) (by dropping a perpendicular from z to the opposite side). As before, I(z, Q) = I(z, Q(1)) + I(z, Q(2)). Therefore we may assume z is the vertex of a right triangle. Adjacent to it are a leg and the hypotenuse.

    Triangles ZAC and ZBC

    In this figure Q is the oriented triangle ZAB and z is located at point Z. The perpendicular from Z to the opposite side is ZC, forming the oriented triangle Q(1) = ZAC. We take Q(2) to be the oriented triangle ZCB. Notice that in this figure it has the opposite orientation of ZAC, whence ZAB = ZAC + ZCB. (It's even simpler to check that this formula continues to hold when C lies between A and B.)

  4. Choose a unit of distance in which the leg adjacent to z has unit length. Introduce a coordinate system in which z is at the origin, that leg is in the positive x direction, and the other leg is in the positive y direction. Thus the triangle's vertices are at (0,0), (1,0), and (1,q), for some positive number q. We want to integrate the distance to the origin (z) over this triangle. This is an elementary exercise in integral calculus. The value is

    (q + Sqrt(1 + q^2) + Ln(q + Sqrt(1 + q^2)))/6.

    (The presence of a logarithm may surprise you if you have never contemplated average distances in Euclidean spaces before.) A slightly nicer way to put this is to note that Sqrt(1+q^2) is the length of the hypotenuse so that q + Sqrt(1+q^2) +1 is the perimeter of this triangle. Letting L stand for the perimeter, the formula becomes ((L-1) + Ln(L-1))/6.

I won't provide examples or more details because at this juncture you might start questioning why you want to do this calculation. (However, as a simple application you can now easily confirm the statements made on the MathWorld site concerning average distances in equilateral triangles.)

In many applications, the circle whose area is the same as the original polygon will serve well and is easily found: its radius equals Sqrt(A/pi) for an area of A.

BTW, if you fix the area of the polygon, require z to lie within it, and seek the shape that gives the smallest average distance, the circle is it. But there is no shape that gives the largest average distance. For example, you can enclose z in a rectangle that is arbitrarily long and skinny, placing z at one vertex. The average distance clearly is close to half the length, which is unbounded.

Edit 2

Comments suggest a different interpretation of the problem was intended: it is about polylines rather than polygons. Specifically:

What is the mean distance between a point z and a polyline Q?

The answer to this is obtained with an analysis similar to the preceding: the mean distance equals the integral of the distance over the polyline divided by the polyline's length; the integral of the distance is the sum of the integrals over the polyline's segments.

A generic polyline segment runs from a location u to a distinct location v. It turns out that an easy way to do the calculation is to select a coordinate system in which u is the origin, v is at (1,0), and z has coordinates (x,y), say. (If you view z, u, and v as complex numbers, the new coordinates are the real and imaginary parts of (z - u)/(v - u).) Let a stand for the length of z and b stand for the squared distance from z to (1,0): a^2 = x^2 + y^2 and b^2 = (x-1)^2 + y^2. Compute the following expression and multiply it by the square of the distance from u to v:

[a x - b(x-1) - y^2 (Ln(a - x) - Ln(b - (x-1)))]/2.

If this cannot be computed, that is because z lies along the line from u to v (that is, y = 0); in that case, use the following expression (and multiply it by the square of the distance from u to v):

1/2 if 0 <= x <= 1; x - 1/2 x >= 1; 1/2 - x if x <= 0.

(A compact way to compute this is Max(1/2, |x - 1/2|).)

To obtain the mean distance from z to Q, add these up for all the segments comprising Q and then divide by the length of Q. Here is a picture showing the point z, a closed polyline Q (a decagon), and the circle at the mean distance from z computed using these formulas:

Decagon, circle, and point


Edit

Another approach is to estimate the mean distance by means of a Monte-Carlo simulation. This requires placing N points independently and randomly into the polygon P and averaging their distances to z. Usually, such points are developed using rejection sampling: random coordinates are generated within the polygon's bounding box and retained when they lie within P.

The issue with this approach is that the random sampling limits precision: there will be some statistical error in the answer. Let's analyze it. If the maximum distance from z to P is D and the minimum distance is d (equal to zero when z lies inside P), then the variation in distances to z from random points in P (measured as a standard deviation) cannot exceed (D - d)/2 and is likely to be only around half that, depending on the shape of P and the position of z relative to it. The standard error of the mean distance (call it m) therefore is of the order (D-d)/(2Sqrt(N)). To construct the "equivalent circle," we must use a radius of 3m/2 (because the mean distance to the center of a circle of radius r is 2r/3: another exercise in integral calculus). Therefore the standard error of the circle's radius cannot exceed 3(D-d)/(4Sqrt(N)), which is roughly the diameter of the polygon P.

On a computer screen the polygon's diameter will typically be one hundred to a thousand pixels. To be reasonably sure of getting the size of the "equivalent circle" visibly correct, then, we might ask to reduce the standard error of its radius to less than one pixel. Due to the Sqrt(N) in the denominator, that will require N itself (the number of random points in the simulation) to be around 100^2 = 10,000 to 1000^2 = 1,000,000. That's a lot of random number generation, point-in-polygon testing, and distance calculation. Expect the simulation--and therefore the calculation for a single polygon/point pair--to take seconds to minutes.

Thus, the Monte Carlo approach is feasible for one-off calculations but for large numbers of polygons or polygon/point pairs, you will need to use the analytic approach previously described.