[GIS] WKT: What is the reasoning behind the concept of a POINT EMPTY

geometry, point, well-known-text

In data formats like WKT and WKB, we have ways of representing "empty" geometries. That all pretty much makes sense: a "LINESTRING EMPTY" is a LINESTRING type with 0 vertices. Same with a MULTIPOINT, POLYGON, etc.

But what about POINT EMPTY? For starters, there is no WKB representation for a POINT EMPTY, only WKT. GeoJSON also doesn't define this case in its specification.

According to the Wikipedia definition of a geometric point, "points do not have any length, area, volume, or any other dimensional attribute". So how can it be empty, and why does the WKT specification allow this?

Best Answer

Sorry for reviving an old thread; the same question came up in the context of the GeoPackage specification, so I thought I might as well answer it here too. You need all these empty geometry types because the ISO SQL/MM Part 3 specification requires them (not explicitly, but indirectly).

As an example, ST_Intersection needs to be able to return a 'there is no intersection' value. The type of the returned value (what ST_GeometryType will return when called on the value) depends on the argument types. ST_Intersection(ST_GeomFromText('POINT (1 1)'), ST_GeomFromText('POINT (0 0)')), for instance, is specified in section 5.1.22 as returning 'an empty set of type point'. So this needs to return a valid ST_Geometry value: ST_IsEmpty should return true for it and ST_GeometryType should return ST_Point.
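To make the idea of a typed empty set concrete, here is a minimal Python sketch. The `Point` class and the `intersection` function are hypothetical names invented for illustration; they only mimic the behaviour described above for two points and are not any real library's API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Point:
    """Hypothetical point type; coords=None models the empty point set."""
    coords: Optional[Tuple[float, float]] = None

    def is_empty(self) -> bool:
        return self.coords is None

    def geometry_type(self) -> str:
        # The type stays "ST_Point" even when the set is empty.
        return "ST_Point"


def intersection(a: Point, b: Point) -> Point:
    """Point/point intersection: the shared point, or an empty Point."""
    if a.is_empty() or b.is_empty() or a.coords != b.coords:
        return Point()
    return a


result = intersection(Point((1.0, 1.0)), Point((0.0, 0.0)))
print(result.is_empty(), result.geometry_type())
```

The result is still a value of the point type, so callers can ask for its type and emptiness without special-casing a null.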

So what should you get when you execute ST_AsText and ST_AsBinary on that returned value? For WKT the only correct answer, I think, is POINT EMPTY. For WKB this is poorly defined. WKB does not provide an equivalent of WKT's EMPTY, and points don't have a natural empty representation. This is why in GeoPackage there's an "empty" flag bit in the header and it's required to use NaN as the coordinates of the point. The intention is to avoid people accidentally using empty point sets as a regular point value. NaN works as an absorbing element, so it's pretty obvious when you've made a mistake, since the outcome of your calculations is very likely to be NaN as well.
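The NaN convention can be sketched with Python's standard `struct` module. This is an illustration under assumptions: it covers only a plain little-endian or big-endian 2D WKB point body (byte order, geometry type 1, x, y), not the GeoPackage header with its flag bits, and the function names are made up for the example.

```python
import math
import struct

WKB_POINT = 1  # WKB geometry type code for a 2D Point


def encode_point_wkb(x: float, y: float) -> bytes:
    """Little-endian WKB point: byte-order flag, type code, x, y."""
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def decode_point_wkb(data: bytes):
    """Return (x, y), or None when both coordinates are NaN (empty point)."""
    fmt = "<Idd" if data[0] == 1 else ">Idd"
    geom_type, x, y = struct.unpack_from(fmt, data, 1)
    if geom_type != WKB_POINT:
        raise ValueError("not a WKB Point")
    if math.isnan(x) and math.isnan(y):
        return None
    return (x, y)


empty_point = encode_point_wkb(math.nan, math.nan)
print(decode_point_wkb(empty_point))      # the empty point decodes to None
print(math.isnan(math.nan * 2.0 + 1.0))   # NaN propagates through arithmetic
```

A reader that ignores the convention and uses the coordinates directly will see NaN spread through its results, which is the "absorbing element" behaviour mentioned above.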

Note that in many implementations, like PostGIS for instance, the above example will return GEOMETRYCOLLECTION EMPTY instead. That's strictly speaking not compliant with the ISO spec since the type of the return value is not correct.

So, to answer the question 'why do you have all these different empty types in WKT and WKB?': WKT and WKB are, as far as I know, defined in the SQL/MM Part 3 specification. That specification defines distinct typed empty sets for each geometry type, and as a consequence WKT and WKB need to have a representation for these typed empty sets.

Related Solutions

Geometry Center – Finding Center of Geometry of an Object

Every polygon has, at a minimum, four distinct "centers":

The barycenter of its vertices.
The barycenter of its edges.
Its barycenter as a polygon.
A GIS-specific "center" useful for labeling (usually calculated with undocumented proprietary methods).

(They may accidentally coincide in special cases, but for "generic" polygons they are distinct points.)

A "barycenter" in general is a "center of mass." The three types differ on where the mass is presumed located: it either is entirely on the vertices, spread uniformly on the edges, or spread uniformly throughout the polygon itself.

Simple methods exist to compute all three barycenters. One approach relies on the basic fact that the barycenter of the disjoint union of two masses is the total-mass-weighted average of the barycenters. From this we easily obtain the following:

The barycenter of two (equally weighted) vertices is their average. This is obtained by averaging their coordinates separately. Geometrically, it is the midpoint of the line segment joining the two vertices.
Inductively, the barycenter of n (equally weighted) vertices is obtained by averaging their coordinates separately.
The barycenter of a line segment is its midpoint. (This is clear by symmetry.)
The barycenter of a polyline is obtained by finding the midpoints of each line segment and then forming their weighted average using the segment lengths as weights.

For example, consider the "L" shape delineated by the points (0,0), (6,0), (6,12). There are two segments: one of length 6 with midpoint at ( (0+6)/2, (0+0)/2 ) = (3,0) and another of length 12 with midpoint at ( (6+6)/2, (0+12)/2 ) = (6,6). Their length-weighted average coordinates are therefore (x,y) with
```
x = (6*3 + 12*6) / (6+12) = 5,  y = (6*0 + 12*6) / (6+12) = 4.
```
This differs from the barycenter of the three vertices, which is ( (0+6+6)/3, (0+0+12)/3 ) = (4,4).
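The two calculations so far can be sketched in a few lines of Python. The function names are illustrative, and the polyline is assumed to be given as an ordered list of (x, y) tuples so that consecutive points are connected.

```python
import math


def vertex_barycenter(points):
    """Average the coordinates of equally weighted vertices."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def polyline_barycenter(points):
    """Length-weighted average of the segment midpoints."""
    total = sx = sy = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        sx += length * (x0 + x1) / 2
        sy += length * (y0 + y1) / 2
        total += length
    if total == 0:
        raise ValueError("polyline has zero length")
    return (sx / total, sy / total)


L_SHAPE = [(0, 0), (6, 0), (6, 12)]
print(polyline_barycenter(L_SHAPE))  # (5.0, 4.0)
print(vertex_barycenter(L_SHAPE))    # (4.0, 4.0)
```

For a closed boundary, repeat the first point at the end of the list so the closing edge is included.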

(Edit: As another example, consider the figure in the question, which although square in shape, is represented as a pentagon determined by the sequence of points (0,0), (1/2,0), (1,0), (1,1), (0,1). The five sides have lengths 1/2, 1/2, 1, 1, 1 and midpoints (1/4,0), (3/4,0), (1,1/2), (1/2,1), and (0,1/2), respectively. Their weighted average therefore equals
```
[(1/2)*(1/4, 0) + (1/2)*(3/4, 0) + (1)*(1, 1/2) + (1)*(1/2, 1) + (1)*(0, 1/2)] / (1/2+1/2+1+1+1)
= (2/4, 2/4) = (0.5, 0.5)
```
as one would hope, even though the barycenter of the vertices alone (computed by averaging their coordinates) is (0.5, 0.4).)

The barycenter of a polygon can be obtained by triangulation to decompose it into triangles. The barycenter of a triangle-qua-polygon coincides with the barycenter of its vertices. The area-weighted average of these barycenters is the polygon's barycenter. Triangle areas are readily computed in terms of their vertex coordinates (e.g., in terms of the wedge product of two of the sides). For an illustration of such area calculations, including how to exploit signed (positive or negative) areas, see the section on "Area" at my (old) course notes page.

(Edit: Consider the polygon depicted in the question for example. We could triangulate it with triangles ((0,0), (1/2,0), (0,1)) on the left, ((0,1), (1/2,0), (1,1)) in the middle, and ((1,1), (1,0), (1/2,0)) on the right. Their areas are 1/4, 1/2, 1/4 respectively and their barycenters--obtained by averaging their vertices--are (1/6,1/3), (1/2,2/3), and (5/6,1/3), respectively. The area-weighted average of these barycenters equals
```
[(1/4)*(1/6,1/3) + (1/2)*(1/2,2/3) + (1/4)*(5/6,1/3)] / (1/4 + 1/2 + 1/4)
= (12/24, 6/12)
= (0.5, 0.5)
```
as it should, despite the presence of that fifth vertex along the bottom edge.)
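A minimal Python sketch of the area-weighted approach follows. It uses a different triangulation from the one in the example: a fan of triangles from the first vertex, with signed areas so that triangles falling outside a non-convex polygon cancel out. It assumes a simple (non-self-intersecting) polygon given as an ordered ring of (x, y) tuples without a repeated closing point; the function name is illustrative.

```python
def polygon_barycenter(ring):
    """Area-weighted barycenter of a simple polygon via a signed triangle fan."""
    x0, y0 = ring[0]
    area = cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(ring[1:], ring[2:]):
        # Signed triangle area: half the wedge product of two sides.
        a = ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2
        # A triangle's barycenter is the average of its vertices.
        cx += a * (x0 + x1 + x2) / 3
        cy += a * (y0 + y1 + y2) / 3
        area += a
    if area == 0:
        raise ValueError("polygon has zero area")
    return (cx / area, cy / area)


PENTAGON = [(0, 0), (0.5, 0), (1, 0), (1, 1), (0, 1)]
print(polygon_barycenter(PENTAGON))  # approximately (0.5, 0.5)
```

The degenerate first triangle, formed by the three collinear points on the bottom edge, contributes zero area, so the extra vertex has no effect on the result.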

It is evident that each of these methods is efficient: it requires just a single pass over the "spaghetti" representation of the polygon, using (fairly little) constant time at each step. Note that in all cases except the first (of pure vertices), more information than just a list of vertex coordinates is needed: you need to know the topology of the figure as well. In the "L" example, we needed to know that (0,0) was connected to (6,0) and not to (6,12), for instance.

These are all Euclidean concepts. They can be extended to the sphere (or ellipsoid) in several ways. A straightforward one views the features as a simplicial complex in three (Euclidean) dimensions, computes the appropriate barycenter, and then projects it outward from the center of the ellipsoid back to the surface. This requires no new concepts or formulas; you only have to work with a third (z) coordinate in addition to the first two coordinates. (Areas are still found using lengths of wedge products.)
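As a sketch of that idea for the simplest case (the barycenter of vertices), the following Python assumes a unit sphere rather than an ellipsoid and takes points as (longitude, latitude) pairs in degrees. The function names are illustrative.

```python
import math


def to_xyz(lon_deg, lat_deg):
    """Longitude/latitude in degrees to a point on the unit sphere."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def spherical_vertex_barycenter(lonlats):
    """Average the vertices in 3D, then project back onto the sphere."""
    pts = [to_xyz(lon, lat) for lon, lat in lonlats]
    n = len(pts)
    x = sum(p[0] for p in pts) / n
    y = sum(p[1] for p in pts) / n
    z = sum(p[2] for p in pts) / n
    if math.sqrt(x * x + y * y + z * z) < 1e-12:
        raise ValueError("3D barycenter is at the center; projection undefined")
    # atan2 depends only on direction, so it performs the outward projection.
    lon = math.degrees(math.atan2(y, x))
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    return (lon, lat)


print(spherical_vertex_barycenter([(0, 0), (90, 0)]))
```

Working in three dimensions also sidesteps the wrap-around at longitude ±180, where averaging longitudes directly would give a misleading result.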

Another generalization recognizes that the Euclidean metric--the square root of a sum of squares, according to Pythagoras--can be changed to other Lp metrics for p >= 1: you take the pth root of the sum of pth powers. Finding appropriate "barycenters" is no longer so simple, because the beautiful additive properties exploited above (barycenters are weighted averages of barycenters of simpler parts of a figure) no longer hold in general. Often, iterative approximate numerical solutions have to be obtained. They might not even be unique.

Additional centers can be defined for various purposes. Triangles have many different centers that can generalize (somewhat) to polygons: the center of the circumcircle, the center of (some) maximal incircle, the center of a minimum-area bounding ellipse, and others. Any set can be enclosed in various "hulls," such as the convex hull, and the centers of those hulls obtained.

Note that many of these "centers" are not necessarily located within the interior of a polygon. (Any reasonable center of a convex polygon will lie within its interior, though.)

This variety of approaches and solutions indicates one should be wary of a generic term like "center of geometry" or merely "center": it could be just about anything.

[GIS] the difference between ST_GEOMETRY and WKT

WKT is a text interchange format (see Wikipedia and the PostGIS documentation). If you use PostGIS, you might use it as I occasionally do in SQL queries, to get a quick non-visual check on the geometry; see the ST_AsEWKT function.

ESRI seems to use 'ST_Geometry' to mean both the whole class of 'Spatial SQL' concepts and functions laid out in the OGC/ISO standard and a geometry storage type, e.g. in contrast to their older closed SDEBINARY format. PostGIS seems to use it as a storage type, as you use functions like ST_GeomFromText to populate spatial columns from interchange formats. Though I can't say I've got a clue what the actual storage format in PostGIS is.
