[GIS] Counting unique occurrences during Spatial Join

arcgis-10.1 arcgis-desktop arcmap modelbuilder spatial-join

I have a point shapefile (points.shp) with fields [owner], [name], etc. Both [owner] and [name] are of type string. [owner] contains a single string identifying the owner. [name] contains a list of names, separated by ";".

[owner]     [name]
   a     ";wood;grass;"
   a     ";grass;house;"
   a     "house"
   b     ";wood;tree;"
   b     ";grass;house;"
   c     ";tree;"

Currently, for each unique name string in [name], I aggregate the matching points by distance into polygons and then spatially join the points back to count how many points containing that string fall within each aggregated polygon. The following steps run in ModelBuilder.

  1. Select all [name] LIKE '%;wood;%'
  2. Aggregate Points by distance z = agg_wood.shp
  3. Spatial Join agg_wood.shp with points.shp (selection still alive)
    = adds a field "Join_Count" to agg_wood.shp
  4. Select [name] LIKE '%;house;%' etc.

The aggregated Polygon Shapes (about 7000) are merged into a final Shapefile. So, my final, merged shapefile will have all counts of [name] occurrences:

shape_wood.shp:
[FID]  [Join_Count]
  1         10        
  2         11        
  3         2         
  4         1         
  5         13        
  6         4         

Finally, I merge all aggregated polygon shapefiles, which works. The point file has about 800,000 points, about 5,000 unique name strings, and about 20,000 unique owners.

My problem is that I would like to add a field to agg_wood.shp during the process that counts the unique values of [owner] within each aggregated polygon. Let's say there are 5,000 points in an area and 500 of those contain "wood" in the list of strings in [name]. These are aggregated into one or more polygons. However, anywhere from 1 to 500 different owners may have generated "wood". I want to add a field [occ_own] with the number of unique owner strings that added "wood" in [name]. So my intermediate output file (agg_wood.shp) would look like this:

shape_wood.shp:
[FID]  [Join_Count]  [Occ_Own]
  1         10        1
  2         11        1
  3         2         2
  4         1         1
  5         13        13
  6         4         1

shape_house.shp:
[FID]  [Join_Count]  [Occ_Own]
  1         23        3
  2         10        5
  3         3         3
  4         1         1
  5         150       1
  6         2         1

Then I would merge all of those to a single shapefile of all [name] polygons. But I can't imagine how to modify my process so it can calculate the number of unique owners for each polygon area and [name].

Does anyone have an idea?

Best Answer

The problem in your current method, and the reason summarizing afterward as @Branco suggests would not work, is that your spatial join operation creates the first attribute you want (total points per poly) while it destroys/eliminates the second variable (owner) you want to summarize. In order to summarize, you need whatever variables you want in the same dataset. Right now your points have owners and names, and your polygons get a count. You'd need your points to have a polygon name and then you could get owners by name by polygon.

Your data format also introduces a problem because name contains multiple values in a single field and summarizing on that will treat each unique field value as what it counts. In other words, woods;house and house;woods are two different things. So is house and ;house; for that matter. To avoid this, you'll have to use a selection as an input to summarize and not include that field as a case.
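A minimal Python sketch of the multi-value problem, using the sample strings from the question. The helper name `split_names` is hypothetical and only illustrates the idea. Raw strings that list the same names in a different order, or with different delimiters, compare as unequal until they are split into individual names.

```python
def split_names(raw):
    """Split a ';'-delimited [name] value into a set of individual names."""
    return {part.strip() for part in raw.split(";") if part.strip()}


# As raw strings these are different values, so a summary would
# count them as separate cases.
print(";wood;house;" == ";house;wood;")   # False
print("house" == ";house;")               # False

# Once split into sets of names they compare as equal.
print(split_names(";wood;house;") == split_names(";house;wood;"))  # True
print(split_names("house") == split_names(";house;"))              # True
```

Note that a selection such as `[name] LIKE '%;house;%'` would also skip a bare `house` value, because that value has no surrounding delimiters.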


Start by modifying and reversing your current spatial join. Instead of the points being the join features, they will be the target features, and the polygons will be the join features. The output of that join will be points with an attribute holding the [polygon ID] they fall in.
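To illustrate what the reversed join produces, here is a minimal pure-Python sketch that tags each point with the ID of the polygon it falls in. It uses a simple ray-casting test as a stand-in for the Spatial Join tool, not the tool itself. The function names, coordinates, and polygon IDs are all hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tag_points(points, polygons):
    """Return (owner, polygon_id) per point; polygon_id is None outside all polygons."""
    tagged = []
    for owner, x, y in points:
        polygon_id = None
        for pid, ring in polygons.items():
            if point_in_ring(x, y, ring):
                polygon_id = pid
                break
        tagged.append((owner, polygon_id))
    return tagged


# Hypothetical sample data: two square polygons and four points.
polygons = {
    1: [(0, 0), (4, 0), (4, 4), (0, 4)],
    2: [(10, 10), (14, 10), (14, 14), (10, 14)],
}
points = [("a", 1, 1), ("b", 2, 3), ("a", 11, 12), ("c", 20, 20)]
tagged = tag_points(points, polygons)
```

The key point is the direction of the join: each output row is still a point that keeps its [owner] and gains a polygon ID, rather than a polygon that keeps only a count.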

Now we add some steps to the process. Your spatial join output will become the input for a Summary Statistics tool. But in order to solve the multi-name issue mentioned above, first you'll need to put in/repeat a selection (possibly make feature layer) step to once again grab all points with the desired name string (note now you're working in a new dataset - the spatial join output, not your original point file).

Now you plug that selection/feature layer into a Summary Statistics tool. In there you will add [polygon ID] and [owner] as case fields (note you must add them in that order). You can add any valid statistic field/type you want - we don't need the results of that. The table that is output should then have a list of every unique [owner] and [polygon id] combination along with the [frequency] (or number of times) it occurs. Note the sum total of that frequency column should be the total number of points - so Polygon A has Owner Q frequency three (one row in table), Owner P frequency one (second row in table), and Owner R frequency six (third row in table), and 3+1+6=10 total points in Polygon A.
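The logic of this first summary can be sketched in plain Python with `collections.Counter`, using the Polygon A example above. This is an illustration of the grouping, not the Summary Statistics tool itself, and the variable names are assumptions.

```python
from collections import Counter

# One (polygon_id, owner) pair per selected point, as produced by the
# reversed spatial join. Values follow the Polygon A example.
selected_points = [("A", "Q")] * 3 + [("A", "P")] * 1 + [("A", "R")] * 6

# Case fields: polygon ID, then owner. Each distinct pair is one table row
# and its count is the frequency column.
frequency = Counter(selected_points)

for (polygon_id, owner), freq in sorted(frequency.items()):
    print(polygon_id, owner, freq)

# The frequencies add back up to the number of points in the polygon.
total_points = sum(frequency.values())
print(total_points)  # 10
```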

But you want to collapse that down to one record per polygon, so that output table will now become the input for a second Summary Statistics tool (no selection needed). This time [polygon ID] will be the case field and you'll have two statistics fields - [owner] with type count and [frequency] with type sum. The resulting table should have [polygon ID], [count owner], [sum frequency] and [frequency] (which should equal [count owner]).
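A plain-Python sketch of this second summary, under the assumption that the first table is available as (polygon ID, owner, frequency) rows. The function name, the dictionary keys, and the extra Polygon B rows are hypothetical.

```python
from collections import defaultdict

# Rows of the first summary table: (polygon_id, owner, frequency).
first_summary = [
    ("A", "Q", 3),
    ("A", "P", 1),
    ("A", "R", 6),
    ("B", "Q", 2),
    ("B", "S", 5),
]


def summarize_by_polygon(rows):
    """Collapse owner/polygon rows to one record per polygon."""
    summary = defaultdict(lambda: {"count_owner": 0, "sum_frequency": 0})
    for polygon_id, _owner, freq in rows:
        # Each row is a distinct owner, so counting rows counts unique owners.
        summary[polygon_id]["count_owner"] += 1
        # Summing frequencies recovers the total number of points.
        summary[polygon_id]["sum_frequency"] += freq
    return dict(summary)


per_polygon = summarize_by_polygon(first_summary)
print(per_polygon["A"])  # {'count_owner': 3, 'sum_frequency': 10}
```

Here `count_owner` corresponds to the [Occ_Own] field asked for in the question and `sum_frequency` to [Join_Count].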

That table now gives you the statistics you want for a single name. If you want them as attributes of the polygons, you can join that second Summary Statistics table to the polygons based on [polygon ID] and export the result or use a Join Field tool to append those attributes directly to the original polygon file.

You'll then repeat the entire process for the next [name] string selection, just as in your current step 4. At the end, you'll merge all your polygon shapefiles into a single file.

You could build that all into the model with an iterator and submodel, collect values, and perhaps a dictionary because of that multi-value single-attribute condition of [name]. Otherwise you may want to consider cleaning up that point data so that each point only has a single name value (and those with more than one become stacked points). This could allow direct use of Summary Statistics without any selections, but a selection would still be needed for your aggregate to polygons tool.
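As a rough sketch of that cleanup, the following Python splits each multi-value record into one record per name, using the sample rows from the question. The function name `explode_names` is hypothetical, and the unique-owner count here ignores polygons entirely. It only shows that single-value names can be grouped directly.

```python
from collections import defaultdict

# (owner, name) rows copied from the sample table in the question.
records = [
    ("a", ";wood;grass;"),
    ("a", ";grass;house;"),
    ("a", "house"),
    ("b", ";wood;tree;"),
    ("b", ";grass;house;"),
    ("c", ";tree;"),
]


def explode_names(rows):
    """Yield one (owner, single_name) pair per name in each record."""
    for owner, raw in rows:
        for name in raw.split(";"):
            name = name.strip()
            if name:
                yield owner, name


exploded = list(explode_names(records))

# With one name per row, unique owners per name need no LIKE selection.
owners_by_name = defaultdict(set)
for owner, name in exploded:
    owners_by_name[name].add(owner)

unique_owner_counts = {name: len(owners) for name, owners in owners_by_name.items()}
```

In the point data, each exploded row would keep the geometry of its source point, which is what produces the stacked points mentioned above.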
