@Akhil answered his/her own question but I wanted to post this clearly for posterity in case others find themselves combing through comments to figure this out like I did.
TL/DR answer:
SELECT tabblock_id AS Block,
substring(tabblock_id from 1 for 12) AS Blockgroup,
substring(tabblock_id from 1 for 11) AS Tract,
substring(tabblock_id from 1 for 5) AS County,
substring(tabblock_id from 1 for 2) AS State
FROM tabblock
WHERE ST_Contains(the_geom, ST_SetSRID(ST_Point(-71.101375, 42.31376), 4269))
should return something like:
      block      |  blockgroup  |    tract    | county | state
-----------------+--------------+-------------+--------+-------
 250251203014004 | 250251203014 | 25025120301 | 25025  | 25
(1 row)
Discussion:
First, understand that the census block ID contains all the hierarchical information above it. For example, consider the block in @Akhil's example: '250251203014004'.
Block: 250251203014004 (Block 4004)
Blockgroup: 250251203014 (Blockgroup 4)
Tract: 25025120301 (Census tract 1203.01)
County: 25025 (County FIPS 025)
State: 25 (State FIPS 25)
Not the best example, since the State FIPS ('25') and County FIPS ('025') look so similar, but essentially the whole thing is a concatenation of the individual codes for state > county > tract > blockgroup > block. Breaking the above into components makes it clear:
block ~ concat(25, 025, 120301, 4004)  -- state | county | tract | block (the block's leading '4' is the blockgroup digit)
Anyway, with that aside, we can use substrings to get everything from the block if you want. See the TL/DR answer above for the solution using substring.
For reference, here is a more honest approach that doesn't use substring:
For census block:
SELECT name, tabblock_id
FROM tabblock
WHERE ST_Contains(the_geom, ST_SetSRID(ST_Point(-71.101375, 42.31376), 4269))
gives
name | tabblock_id
------------+-----------------
Block 4004 | 250251203014004
(1 row)
For tract and blockgroup:
SELECT statefp, countyfp, tractce, tract_id
FROM tract
WHERE ST_Contains(the_geom, ST_SetSRID(ST_Point(-71.101375, 42.31376), 4269))
gives:
statefp | countyfp | tractce | tract_id
---------+----------+---------+-------------
25 | 025 | 120301 | 25025120301
(1 row)
See that statefp, countyfp, and tractce concatenate to make the tract_id, which is also the first 11 characters of the census block.
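That relationship can be checked directly in SQL; a small sketch against the same tract table used above:

```sql
-- Rebuild the tract GEOID from its component codes and compare it
-- to the stored tract_id; the two columns should be identical.
SELECT statefp || countyfp || tractce AS rebuilt_id,
       tract_id
FROM tract
WHERE ST_Contains(the_geom, ST_SetSRID(ST_Point(-71.101375, 42.31376), 4269));
```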
Best Answer
Ok Ben, here are my assumptions:
1) You've already got your data (I had some address points in a shapefile, and I downloaded census tract and census block shapefiles for Missouri).
2) You've already geocoded your address points and you're comfortable projecting the data.
3) You're comfortable with an OGR/PostGIS solution (both free).
Here are some install notes if you don't have this software: How to install PostgreSQL with PostGIS support. (By BostonGIS. Please don't take offense at their title; I just think it's the best how-to out there.) Also, here are one, two, and three sites describing how to install GDAL/OGR with Python bindings.
Caveat: Before performing the actual analysis (i.e. the ST_Contains stuff, below) you should ensure all your layers are in the same projection! If you have shapefiles, it's easy to translate from one projection to another using either Quantum GIS (QGIS) or OGR (or ArcGIS if you have it). Alternatively, you could perform the projection transformation in the database using PostGIS functions. Basically pick your poison, or let us know if this is a stumbling block.
With those givens, this is how I appended tract and block attributes to some address points data using PostGIS:
First I used ogr2ogr to import the three shapefiles into PostGIS.
Import addresses using ogr2ogr:
Import census tracts (Missouri) using ogr2ogr: The spMoWest suffix implies I have already translated my data to Missouri State Plane West Feet.
Import blocks data (Missouri): This one took awhile. In fact, my computer kept crashing and I had to put a fan on it! Oh also, ogr2ogr won't give any feedback, so don't get punchy; make sure to wait on it and it'll eventually finish.
Once the data imports are accomplished, launch pgAdmin III (the PostgreSQL GUI), browse into your database, and run some quick maintenance commands so that PostgreSQL will run faster using these new data:
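The actual import commands appeared as screenshots in the original post; below is a sketch of equivalent ogr2ogr invocations, with hypothetical database connection details and shapefile names (adjust to your own setup):

```shell
# Hypothetical connection string and file names -- substitute your own.
# Import the address points:
ogr2ogr -f "PostgreSQL" PG:"host=localhost dbname=census user=postgres" \
    mcdon_addresses_spMoWest.shp -nln mcdon_addresses

# Import the Missouri census tracts:
ogr2ogr -f "PostgreSQL" PG:"host=localhost dbname=census user=postgres" \
    tracts_spMoWest.shp -nln tracts

# Import the Missouri census blocks (this is the slow one):
ogr2ogr -f "PostgreSQL" PG:"host=localhost dbname=census user=postgres" \
    blocks_spMoWest.shp -nln blocks
```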
Next, I was curious how many raw address points I imported, so I did a quick COUNT(*). I usually do a count at the start of a task like this to give me a foothold for "sanity checks" later on.
In the next phase, I created two new tables, gradually adding the tracts attributes, and then the blocks attributes, to my original address points table. As you'll see, the PostGIS ST_Contains function did the heavy lifting, in each case creating a new table of points, each gaining the attributes of the tracts and blocks polygons they fell inside of.
Note! For brevity, I'm only taking a handful of fields from each table. You'll probably want almost everything. I say almost because you'll need to omit the ogr_fid field (maybe even others?) from the tables you're combining, otherwise PostgreSQL will complain about both fields having the same name.
(P.S. I did some snooping around here while figuring this out: http://postgis.net/docs/manual-1.4/ch04.html)
Create a new table of address points with tracts attributes: Note I'm prefixing each output column with a hint disclosing which table it started in (I'll explain why below).
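The query itself was shown as a screenshot in the original; here is a sketch of what it might have looked like, with hypothetical column names (the tr_ prefix marks fields that came from the tracts table, addr_ those from the addresses):

```sql
-- Hypothetical column names; note every output field is renamed with a
-- prefix recording which table it came from (addr_ or tr_).
CREATE TABLE mcdon_addresses_wtract AS
SELECT a.address      AS addr_address,
       a.wkb_geometry AS addr_geom,
       t.statefp10    AS tr_statefp10,
       t.countyfp10   AS tr_countyfp10,
       t.tractce10    AS tr_tractce10
FROM mcdon_addresses a
JOIN tracts t
  ON ST_Contains(t.wkb_geometry, a.wkb_geometry);
```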
Maintain the table so PostgreSQL continues to run smoothly:
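The maintenance commands were screenshots in the original; the usual incantation after a big table creation is VACUUM ANALYZE, which reclaims dead space and refreshes the planner's statistics:

```sql
-- Table name from the step above.
VACUUM ANALYZE mcdon_addresses_wtract;
```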
Now I had two questions:
Did the ST_Contains actually work? And does the number of addresses returned make sense given the data inputs I used?
I was able to answer both using the same query:
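A sketch of that sanity-check query, assuming the table name used above:

```sql
-- If ST_Contains worked, this count should be close to (but possibly
-- less than) the raw address count from the earlier COUNT(*).
SELECT COUNT(*) FROM mcdon_addresses_wtract;
```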
A quick reflection on the losses: First, I checked in ArcGIS (you could also do this in QGIS) and it returned the same count. So, why the difference? For one, some addresses fell outside of Missouri, and I only compared against a Missouri tracts polygon. Second, on closer analysis, it seems there were some examples of bad digitizing in the addresses data. Specifically, many of the points not caught by ST_Contains had empty attribute fields, which is a good sign something went foul during digitizing; it also means they weren't usable data anyway. At this point, I'm comfortable with the differences, as I could reasonably go back and improve the data, allowing for a cleaner analysis.
Moving on, the next step was appending the address/tracts table with attributes from the blocks data. Similarly, I did this by creating a new table, once again prefixing each output field to indicate the table it came from (the prefixing is quite important, as you'll see):
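As with the tracts step, this query was a screenshot; a sketch with hypothetical column names (bl_ prefixing the fields taken from the blocks table):

```sql
CREATE TABLE mcdon_addresses_wtract_wblock AS
SELECT awt.*,                          -- keeps the already-prefixed addr_/tr_ fields
       b.name      AS bl_name,         -- hypothetical block columns
       b.blockce10 AS bl_blockce10
FROM mcdon_addresses_wtract awt
JOIN blocks b
  ON ST_Contains(b.wkb_geometry, awt.addr_geom);
```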
Of course, maintain the table:
The reason I prefixed each output field was that if I didn't, some fields would have the same names, and it would be impossible to distinguish them from one another in the final product (also, PostgreSQL may have complained midway into this, but since I was renaming, I didn't give it the chance). Consider, for instance, the following two fields from both steps above. You can see why I renamed them.
Now that we have an addresses-with-tracts-and-blocks dataset, do we still have the same number of points?
Yes, we do! If you want, you can go ahead and delete the first table we created, mcdon_addresses_wtract. We no longer need it for the analysis.
As a last action, you may want to export your data from PostgreSQL into an ESRI shapefile so that you can view it with other programs, like ArcGIS (of note, QGIS can read the PostGIS data without issue). If you're interested, here's how you could perform the conversion using ogr2ogr:
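The export command was also a screenshot in the original; a sketch of the equivalent ogr2ogr call, again with hypothetical connection details and output file name:

```shell
# Write the final table out to a shapefile for use in ArcGIS etc.
ogr2ogr -f "ESRI Shapefile" mcdon_addresses_final.shp \
    PG:"host=localhost dbname=census user=postgres" \
    mcdon_addresses_wtract_wblock
```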
Finally, when you run this command, you'll likely get some warnings like this:
Warning 6: Normalized/laundered field name: 'tr_statefp10' to 'tr_statefp'
This just means OGR had to shorten that field name; shapefile (DBF) field names are limited to 10 characters.
Of course, this is only one of many ways to accomplish this job.