[GIS] Obtaining same area values provided by Census for TIGER boundaries using PostGIS functions

areacensuscoordinate systempostgistiger

In PostGIS, how can I compute the area of Census Boundaries and obtain the same results as are provided by the Census?

For example, most Census Boundaries include attributes for Land Area and Water Area (aland and awater, in the 2011 boundaries I'm currently using), which are in square meters.

If I sum these together, should I be able to obtain the same result using the PostGIS ST_Area function, possibly after re-projecting?

The Census TIGER Boundaries are provided in NAD83 / SRID 4269, which uses Degrees as the unit of measurement and as I understand is an unprojected datum. I can compute the area in SRID 4269, but the result will be in Sq Degrees, and I don't believe there is an easy way to convert to Meters as that computation would be different at differing latitudes.

I have tried re-projecting to an equal-area projection – SRID 2163 / US National Atlas Equal, which uses Meters for measurement. However, while the result of the area computation is similar to what is provided by the Census, it is not identical.

SELECT
     aland::BIGINT + awater::BIGINT AS source_area
    ,ROUND(ST_Area(ST_Transform(the_geom,2163))) AS calc_area_2163
FROM raw.tl_2011_county_2011
WHERE statefp = '01'
AND countyfp = '005'    
;

 source_area | calc_area_2163 
-------------+----------------
  2342683631 |     2344419979

I then considered the possibility that the discrepancy was due to the fact that I was testing on very large areas, such as Counties or States. I knew that the Census geographies were hierarchical (at least most of them), so I decided to determine if the provided Census areas values for all Block Groups in a County summed up to the values provided for the County itself. They did. However, even at the Block Group level, the results of my area computation did not match the values from the Census.

ftp://ftp2.census.gov/geo/tiger/TIGER2011/BG/tl_2011_01_bg.zip

SELECT
     aland::BIGINT + awater::BIGINT AS source_area
    ,ROUND(ST_Area(ST_Transform(the_geom,2163))) AS calc_area_2163
FROM raw.tl_2011_bg_2011
WHERE statefp = '01'
AND countyfp = '005'    
AND tractce = '950500'
AND blkgrpce = '1'
;

 source_area | calc_area_2163 
-------------+----------------
    53527310 |       53568242

Should I be using a different projection?

Should I be using the geography data type (which would require an upgrade from the brutally old Postgres/PostGIS versions on this particular box)?

Is the discrepancy due to the fact that the boundaries provided by the Census are more simplified than what they have internally?

CentOS release 5.2 (Final)
PostgreSQL 8.3.6 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20071124 (Red Hat 4.1.2-42)
POSTGIS="1.3.6" GEOS="3.1.0-CAPI-1.5.0" PROJ="Rel. 4.6.0, 21 Dec 2007" USE_STATS (procs from 1.3.2 need upgrade)


In order to rule out my ancient versions of PostGIS and/or PROJ being the issue, I installed current versions of Postgres and PostGIS on a different box, loaded the same TIGER 2011 Block Groups, and ran the same calculations.

New config is:

CentOS release 6.5 (Final)
PostgreSQL 9.3.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
POSTGIS="2.1.1 r12113" GEOS="3.4.2-CAPI-1.8.2 r3921" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.9.2, released 2012/10/08" LIBXML="2.7.6" LIBJSON="UNKNOWN" TOPOLOGY RASTER

As I suspected, the ST_Area function returns the exact same value on this new config as on the old one.

SELECT
     aland::BIGINT + awater::BIGINT AS source_area
    ,ROUND(ST_Area(ST_Transform(geom,2163))) AS calc_area_2163
FROM raw.tl_2011_bg_2011
WHERE statefp = '01'
AND countyfp = '005'    
AND tractce = '950500'
AND blkgrpce = '1'
;

 source_area | calc_area_2163 
-------------+----------------
    53527310 |       53568242

However, with a current version of PostGIS, I could also try converting to the geography data type and calculating its area.

Additionally, I began to consider that the Census's area calculations were performed using the 124 State Plane Geographic Zones / Coordinate Systems for the U.S., as these would provide more accurate results.

http://en.wikipedia.org/wiki/State_Plane_Coordinate_System

The particular Block Group used in this example is in FIPS 01005 – Barbour County, AL. Using the interactive State Plane map at the link below, I was able to determine that Barbour County is in South-east Alabama and that the correct State Plane is 'Alabama East'.

http://avenza.com/sites/default/files/flashmaps/spsc/

Searching spatial_ref_sys for '%Alabama East%' yielded four results.

2759    PROJCS["NAD83(HARN) / Alabama East",GEOGCS["NAD83(HARN)",DATUM["NAD83_High_Accuracy_Regional_Network",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],TOWGS84[0,0,0,0,0,0,0],AUTHORITY["EPSG","6152"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4152"]],UNIT["metre",1,AUTHORITY["EPSG","9001"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",30.5],PARAMETER["central_meridian",-85.83333333333333],PARAMETER["scale_factor",0.99996],PARAMETER["false_easting",200000],PARAMETER["false_northing",0],AUTHORITY["EPSG","2759"],AXIS["X",EAST],AXIS["Y",NORTH]]
3465    PROJCS["NAD83(NSRS2007) / Alabama East",GEOGCS["NAD83(NSRS2007)",DATUM["NAD83_National_Spatial_Reference_System_2007",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],TOWGS84[0,0,0,0,0,0,0],AUTHORITY["EPSG","6759"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4759"]],UNIT["metre",1,AUTHORITY["EPSG","9001"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",30.5],PARAMETER["central_meridian",-85.83333333333333],PARAMETER["scale_factor",0.99996],PARAMETER["false_easting",200000],PARAMETER["false_northing",0],AUTHORITY["EPSG","3465"],AXIS["X",EAST],AXIS["Y",NORTH]]
26729   PROJCS["NAD27 / Alabama East",GEOGCS["NAD27",DATUM["North_American_Datum_1927",SPHEROID["Clarke 1866",6378206.4,294.9786982139006,AUTHORITY["EPSG","7008"]],AUTHORITY["EPSG","6267"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4267"]],UNIT["US survey foot",0.3048006096012192,AUTHORITY["EPSG","9003"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",30.5],PARAMETER["central_meridian",-85.83333333333333],PARAMETER["scale_factor",0.99996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],AUTHORITY["EPSG","26729"],AXIS["X",EAST],AXIS["Y",NORTH]]
26929   PROJCS["NAD83 / Alabama East",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],TOWGS84[0,0,0,0,0,0,0],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4269"]],UNIT["metre",1,AUTHORITY["EPSG","9001"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",30.5],PARAMETER["central_meridian",-85.83333333333333],PARAMETER["scale_factor",0.99996],PARAMETER["false_easting",200000],PARAMETER["false_northing",0],AUTHORITY["EPSG","26929"],AXIS["X",EAST],AXIS["Y",NORTH]]

I noticed that the third of these uses US Survey Feet as units, so after reading this documentation regarding Feet and US Survey Feet to Meter conversions, I added a conversion factor of 1 US Survey Foot^2 = 0.0929034116132 Meter^2 (0.3048006096012^2).

http://www.wsdot.wa.gov/reference/metrics/foottometer.htm

Rather than dig through the details of these projections, I decided to simply try all four to determine if any matched the Census's figures. Unfortunately, none match.

SELECT
     aland::BIGINT + awater::BIGINT AS source_area
    ,ROUND(ST_Area(ST_Transform(geom,2163))) AS calc_area_2163
    ,ROUND(ST_Area(geom::geography)) AS calc_area_geog
    ,ROUND(ST_Area(ST_Transform(geom,2759))) AS calc_area_2759
    ,ROUND(ST_Area(ST_Transform(geom,3465))) AS calc_area_3465
    ,ROUND(ST_Area(ST_Transform(geom,26729)) * 0.0929034116132) AS calc_area_26729
    ,ROUND(ST_Area(ST_Transform(geom,26929))) AS calc_area_26929
FROM raw.tl_2011_bg_2011
WHERE statefp = '01'
AND countyfp = '005'    
AND tractce = '950500'
AND blkgrpce = '1'
;
 source_area | calc_area_2163 | calc_area_geog | calc_area_2759 | calc_area_3465 | calc_area_26729 | calc_area_26929 
-------------+----------------+----------------+----------------+----------------+-----------------+-----------------
    53527310 |       53568242 |       53527329 |       53527861 |       53527861 |        53527014 |        53527861

Still stumped.



The document below, published by the Census in 1994, outlines the history of their area calculation methodology and seems appears to indicate that area calculations for larger geographies (States, Counties, etc) are based on the sum of calculations for their component entities.

http://www.census.gov/geo/reference/pdfs/GARM/Ch15GARM.pdf

While I had descended from County to Block Group in the Census hierarchy, there was obviously more level to go, the Block.

ftp://ftp2.census.gov/geo/tiger/TIGER2011/TABBLOCK/

Thus, to finish this line of research, I downloaded the Blocks for Alabama from the link above and loaded to my new PostGIS instance. First, I decided to verify that the source land area and water area values provided by the Census for all of the Blocks in Block Group 010059505001 actually summed up to the values provided by the Census for the Block Group itself.

SELECT
     SUM(aland::BIGINT + awater::BIGINT) AS source_area
FROM raw.tl_2011_tabblock_2011
WHERE statefp = '01'
AND countyfp = '005'    
AND tractce10 = '950500'
AND blockce10 LIKE '1%'
;    
 source_area 
-------------
    53527310

This sum matches the value on the Census Block Group record itself (see previous query results above). Next, it was time to calculate the area of these Census Blocks using the same variety of projections used for the Block Group calculations – US Equal Area projection, PostGIS geography, and various Alabama East State Plane geographic zones. I assumed that some of these Census Blocks might be extremely tiny, and thus the calculations may match only due to rounding, but potentially not match for the larger Blocks, thus I ordered by the reported Area.

SELECT
     name
    ,aland::BIGINT + awater::BIGINT AS source_area
    ,ROUND(ST_Area(ST_Transform(geom,2163))) AS calc_area_2163
    ,ROUND(ST_Area(geom::geography)) AS calc_area_geog
    ,ROUND(ST_Area(ST_Transform(geom,2759))) AS calc_area_2759
    ,ROUND(ST_Area(ST_Transform(geom,3465))) AS calc_area_3465
    ,ROUND(ST_Area(ST_Transform(geom,26729)) * 0.0929034116132) AS calc_area_26729
    ,ROUND(ST_Area(ST_Transform(geom,26929))) AS calc_area_26929
FROM raw.tl_2011_tabblock_2011
WHERE statefp = '01'
AND countyfp = '005'    
AND tractce10 = '950500'
AND blockce10 LIKE '1%'
ORDER BY aland::BIGINT + awater::BIGINT ASC
;
    name    | source_area | calc_area_2163 | calc_area_geog | calc_area_2759 | calc_area_3465 | calc_area_26729 | calc_area_26929 
------------+-------------+----------------+----------------+----------------+----------------+-----------------+-----------------
 Block 1081 |         439 |            440 |            439 |            439 |            439 |             439 |             439
 Block 1052 |         620 |            620 |            620 |            620 |            620 |             620 |             620
 Block 1008 |         654 |            655 |            654 |            654 |            654 |             654 |             654
 Block 1071 |         910 |            910 |            910 |            910 |            910 |             910 |             910
 Block 1003 |        1342 |           1343 |           1342 |           1342 |           1342 |            1342 |            1342
 Block 1038 |        1352 |           1353 |           1352 |           1352 |           1352 |            1352 |            1352
.
.
.
 Block 1016 |     2010716 |        2012246 |        2010693 |        2010769 |        2010769 |         2010737 |         2010769
 Block 1087 |     2138937 |        2140590 |        2138949 |        2138918 |        2138918 |         2138895 |         2138918
 Block 1083 |     2200039 |        2201741 |        2200039 |        2200029 |        2200029 |         2199998 |         2200029
 Block 1017 |     3016079 |        3018362 |        3016083 |        3016144 |        3016144 |         3016097 |         3016144
 Block 1000 |     4598612 |        4602068 |        4598637 |        4598729 |        4598729 |         4598658 |         4598729
 Block 1076 |    22454205 |       22471479 |       22454208 |       22454290 |       22454290 |        22453910 |        22454290
(109 rows)

As I expected, the ST_Area calculations for extremely small Census Blocks occasionally matched, but not for larger Blocks. The calculations were close, but not exact. I realize that some of the discrepancies are only a few hundred or thousand square meters, but that's a real discrepancy and those values add up when summing to larger and larger geographies.

Still stumped.



Just to flush this out further, I downloaded the 2013 TIGER Blocks instead of 2011 and performed the same calculations, in case anything had changed in the Census's calculation, projection, geodetic implementation, etc. There was no change in the Census land and water area values between 2011 and 2013, at least for this small sample set.

Further, I also tried the appropriate UTM Zone projection – 17N – SRID 3724, for the heck of it.

SELECT
     aland::BIGINT AS source_land_area
    ,awater::BIGINT AS source_water_area
    ,aland::BIGINT + awater::BIGINT AS source_area
    ,ROUND(ST_Area(ST_Transform(geom,2163))) AS calc_area_2163                      -- US Equal Area
    ,ROUND(ST_Area(geom::geography)) AS calc_area_geog                              -- PostGIS Geography
    ,ROUND(ST_Area(ST_Transform(geom,2759))) AS calc_area_2759                      -- US State Plane - Alabama East NAD83(HARN)
    ,ROUND(ST_Area(ST_Transform(geom,3465))) AS calc_area_3465                      -- US State Plane - Alabama East NAD83(NSRS2007)
    ,ROUND(ST_Area(ST_Transform(geom,26729)) * 0.0929034116132) AS calc_area_26729  -- US State Plane - Alabama East NAD27 with US Survey Feet conversion facto
    ,ROUND(ST_Area(ST_Transform(geom,26929))) AS calc_area_26929                    -- US State Plane - Alabama East NAD83
    ,ROUND(ST_Area(ST_Transform(geom,3724))) AS calc_area_3724                      -- UTM Zone 17N
FROM raw.tl_2013_tabblock_2013 b
WHERE statefp = '01'
AND countyfp = '005'    
AND tractce10 = '950500'
AND blockce10 LIKE '1%'
ORDER BY aland::BIGINT + awater::BIGINT ASC
;
 source_land_area | source_water_area | source_area | calc_area_2163 | calc_area_geog | calc_area_2759 | calc_area_3465 | calc_area_26729 | calc_area_26929 | calc_area_3724 
------------------+-------------------+-------------+----------------+----------------+----------------+----------------+-----------------+-----------------+----------------
.
.
          2200039 |                 0 |     2200039 |        2201741 |        2200039 |        2200029 |        2200029 |         2199998 |         2200029 |        2207094
          3016079 |                 0 |     3016079 |        3018362 |        3016083 |        3016144 |        3016144 |         3016097 |         3016144 |        3025195
                0 |           4598612 |     4598612 |        4602068 |        4598637 |        4598729 |        4598729 |         4598658 |         4598729 |        4612402
         22454205 |                 0 |    22454205 |       22471479 |       22454208 |       22454290 |       22454290 |        22453910 |        22454290 |       22524848
(109 rows)

Other Notes

  • this appears to confirm that TIGER boundaries provided by the Census are not generalized in any way.

http://www.census.gov/geo/maps-data/data/tiger.html

  • this specifies that "Area is calculated from the specific boundary recorded for each entity in the Census Bureau's geographic database", which contradicts my previous assumption that area values for larger entities were calculated by summing the areas of their component, lower-level entities.

http://www.census.gov/geo/reference/gtc/gtc_area.html


Due to @Paul's response and my inability to let this go, I created a new CentOS 6.5 VM, downloaded and installed Oracle 11g R2 Express Edition, and then used Oracle's Java Shapefile Converter to load the 2013 Census Blocks for Alabama into the database. This was somewhat painful as I haven't touched an Oracle DB in 5 years, so if anything below looks incorrect, feel free to point it out.

Below is the command I ran to use Oracle's Java Shapefile Converter to load the data into the DB. I wasn't entirely sure what to use for the tolerance parameter.

java -cp $clpath oracle.spatial.util.SampleShapefileToJGeomFeature -h localhost -p 1521 -s xe -u pete -d mypassword -t tl_2013_tabblock_2013 -f tl_2013_01_tabblock -r 4269 -g geom -o 0.000000001

The load worked like a charm. Finally the query to find the area. Included are the Block Name, the Census values for land area, water area, and total area, the calculated area using Oracle's SDO_GEOM.SDO_AREA function, the absolute value of the difference between the Census's reported value and the calculated value, again ordered by total size of the Block.

SELECT
     name
    ,aland
    ,awater
    ,CAST(aland AS NUMBER) + CAST(awater AS NUMBER) AS sourc_total_area
    ,ROUND(SDO_GEOM.SDO_AREA(geom,.000000001, 'unit=SQ_M'),1) AS calc_area
    ,ABS((CAST(aland AS NUMBER) + CAST(awater AS NUMBER)) - ROUND(SDO_GEOM.SDO_AREA(geom,.000000001, 'unit=SQ_M'))) AS area_diff
FROM tl_2013_tabblock_2013
WHERE statefp = '01'
AND countyfp = '005'    
AND tractce10 = '950500'
AND blockce10 LIKE '1%'
ORDER BY (CAST(aland AS NUMBER) + CAST(awater AS NUMBER))
;
  NAME         ALAND     AWATER   SOURCE_TOTAL_AREA CALC_AREA  AREA_DIFF
----------- ---------- ---------- ----------------- ---------- ---------
Block 1081         439          0               439      439.3         0
Block 1052         620          0               620      619.7         0
Block 1008         654          0               654      654           0
Block 1071         910          0               910      909.7         0
Block 1003        1342          0              1342     1341.9         0
Block 1038        1352          0              1352     1352.2         0
Block 1041        1899          0              1899     1898.7         0
.
.
.
Block 1032     1329165          0           1329165  1329165           0
Block 1085     1476381          0           1476381  1476380.2         1
Block 1036     1485921          0           1485921  1485921.2         0
Block 1042           0    1659345           1659345  1659345.4         0
Block 1016           0    2010716           2010716  2010715.9         0
Block 1087     2138937          0           2138937  2138936.7         0
Block 1083     2200039          0           2200039  2200038.6         0
Block 1017     3016079          0           3016079  3016078.5         1
Block 1000           0    4598612           4598612  4598612.1         0
Block 1076    22454205          0          22454205 22454201.3         4

The Oracle calculated values match for nearly all Blocks in this Block Group. The few that don't match differ by a single square meter, except the last Block, which differs by 4 square meters. This final Block is obviously significantly larger than any other Block in the Block Group – by 500%. I am still puzzled as to why these values don't match those supplied by the Census. I suppose that the Census may be calculating the area of smaller components of the Census Block (if such things exist) that are not published externally, and summing those values to the Census Block level. The area calculated by Oracle is 4 square meters smaller than the value reported by the Census, while PostGIS's calculation is 3 square meters larger.



After reading further Census TIGER documentation here:

http://www.geos.ed.ac.uk/~gisteac/proceedingsonline/Oracle10%20Articles/articles/us_census.pdf

(see the Features and Hierarchical Features section), it appears that all TIGER Features are comprised of topological primitives – Faces. As such, it is likely that the Area values published for Features are simply the sum of Area values for their component Faces. Thus, the tests I previously performed for Counties, Block Groups, and Blocks should really have been used on Faces.

Thus, I downloaded the Faces file for Barbour County, Alabama (FIPS 01005) from:

http://www2.census.gov/geo/tiger/TIGER2013/FACES/

and loaded to Oracle 11g R2 Express Edition in a table named tl_2013_faces_2013. Overall, there are 2,620 Faces in this County and we were able to reproduce the Area value published by the Census using Oracle's SDO_GEOM.SDO_AREA function for all Faces except 5.

SELECT
     CASE
         WHEN CAST(atotal AS INTEGER) = ROUND(SDO_GEOM.SDO_AREA(geom,.000000001, 'unit=SQ_M')) THEN 'Equal'
         ELSE 'Not Equal'
     END AS compare
     ,COUNT(*)
FROM tl_2013_faces_2013
GROUP BY
     CASE
         WHEN CAST(atotal AS INTEGER) = ROUND(SDO_GEOM.SDO_AREA(geom,.000000001, 'unit=SQ_M')) THEN 'Equal'
         ELSE 'Not Equal'
     END
;
COMPARE     COUNT(*)
--------- ----------
Equal       2615
Not Equal      5

Below are the details for the 5 Faces that don't match:

SELECT
     tfid
    ,atotal
    ,ROUND(SDO_GEOM.SDO_AREA(geom,.000000001, 'unit=SQ_M')) AS calc_area_rounded
    ,SDO_GEOM.SDO_AREA(geom,.000000001, 'unit=SQ_M') AS calc_area
FROM tl_2013_faces_2013
WHERE CAST(atotal AS INTEGER) != ROUND(SDO_GEOM.SDO_AREA(geom,.000000001, 'unit=SQ_M'))
;
      TFID     ATOTAL CALC_AREA_ROUNDED  CALC_AREA
---------- ---------- ----------------- ----------
 261070759   10451294          10451293 10451293.2
 216313144   10274098          10274099 10274098.8
 216311613   14069814          14069815 14069814.6
 216311321    3436895           3436894  3436894.5
 259316991   12203147          12203148 12203147.5

Anyone still reading at this point might note some seemingly strange rounding that is occurring (such as 3436894.5 rounding to 3436894). I assume this is because the calculated Area is actually something like 3436894.47 but is simply being displayed as 3436894.5 in the query output. I haven't been able to determine how to force the display of additional precision (I've tried casting a NUMERIC(15,5), NUMERIC, adding 0.000, etc).

Anyway, it is still a bit unclear as to why we cannot reproduce the same Area values as published by the Census for these Faces. At this point, my assumption is this is due to the fact that either (i) we are using a different version of Oracle Spatial / Locator and associated libraries than was used by the Census when calculating these Area values, or (ii) the Census has a slightly higher degree of precision for lat/lon values in their internal database than is published in the TIGER shapefiles.

Finally, I wanted to confirm that the area values published for higher level Census Features were simply the sum of the area values for their component Faces. I used Faces and Blocks for Barbour County, Alabama to confirm this. Everything matches.

SELECT
     f.statefp
    ,f.countyfp
    ,f.tractce10
    ,f.blockce10
    ,CAST(b.aland AS INTEGER) + CAST(b.awater AS INTEGER) AS block_area
    ,SUM(CAST(f.atotal AS INTEGER)) AS face_area_sum
FROM tl_2013_faces_2013 f
INNER JOIN tl_2013_tabblock_2013 b
ON (f.statefp = b.statefp
AND f.countyfp = b.countyfp 
AND f.tractce10 = b.tractce10
AND f.blockce10 = b.blockce10)
GROUP BY
     f.statefp
    ,f.countyfp
    ,f.tractce10
    ,f.blockce10
    ,CAST(b.aland AS INTEGER) + CAST(b.awater AS INTEGER)
HAVING CAST(b.aland AS INTEGER) + CAST(b.awater AS INTEGER) != SUM(CAST(f.atotal AS INTEGER))
ORDER BY
     f.statefp
    ,f.countyfp
    ,f.tractce10
    ,f.blockce10
;

no rows selected

In addition, I received a response from the Census's Geography Division regarding this topic:

I spoke with our subject matter expert and here is his reply:

We use Oracle's SDO_GEOM.SDO_AREA method to calculate area for each of
the smallest denominator polygons (faces) in the MAF/TIGER database.
To determine the land and water area values for an entity, we
aggregate the areas of all the faces defining the entity. Instead of
using any projection, SDO_AREA employs Girard's theorem on spherical
excess angle after transforming the geodetic geometries to the
authalic sphere (conversion of geodetic to authalic latitude).

I hope this addresses your question, but please let us know if you
have any additional questions

Best Answer

Very interesting question! I'd note that the geography area calculation is extremely close (in your last example, 3 msq difference over an area of 20000000 msq. (0.000015%)

Since we know that census manages its data inside Oracle Spatial, I'd guess that what you're seeing is a very small difference in the implementation of geodetic area calculation between Oracle and PostGIS. Certainly the very very very similar results would lead me to believe that to be the case.

The way to test would be to take one of your larger polygons and import it into Oracle and compare the geodetic area calculation to the PostGIS one. If you see the same difference, then mystery solved. You can do this if you're motivated (and judging from the work you've put in so far, perhaps you are!) by downloading the free Oracle Express edition and running the calculation in there.

Related Question