[GIS] Find max and min values from a table based on two fields

arcpyattribute-joinscursor

I have a geodatabase table with the following fields: County_Name, State_Name, Date_Time, Customers

Example

Alameda, CA, 8/1/2014 12AM, 6  
Alameda, CA, 8/1/2014 1AM, 1  
Alameda, CA, 8/1/2014 2AM, 3  
....

Data for all US counties is in the table. I just want to create a new table that contains the

  • county, state, max customers for the day (24 hour period), time of the max customers
  • county, state, min customers for the day (24 hour period), time of the min customers

Many states have the same county names as other states.

Do I need to create a cursor to obtain a list of the unique county and state names first?

This will give me a list of the county names but I also need the state names.

with arcpy.da.SearchCursor(tablesMerged, ["county_nam"]) as cursor:
        print sorted({row[0] for rows in cursor})

Best Answer

Since you said you just need a hint with python I came up with this:

My test table:

testtable

import arcpy

arcpy.env.workspace = "..." #on windows use \\


#SEARCHCURSOR
#SearchCursor (in_table, field_names, {where_clause}, {spatial_reference}, {explode_to_points}, {sql_clause})"""
stable = "test2.shp"
sfield = ["county", "state", "date", "Customers"]


namesdb = set(row[0] for row in arcpy.da.SearchCursor(stable, sfield)) #a set is a collection type which does not allow duplicates
#since there are a lot of duplicates (countynames) we only want each countyname once
#more information on sets can be found here http://www.python-course.eu/sets_frozensets.php

print "Countynames without duplicates: ", namesdb



"""
I only left that in for better understanding what is basically happening in the generator above
The namesdb = set(row.....) replaces all of this:


names = []
namesdb = []

with arcpy.da.SearchCursor(stable, sfield) as scursor: #search for the county names
    for srow in scursor:
        names.append(srow[0]) #append all found countynames to a list


print "duplicates: ", names


for i in names: #get rid of dublicates in statenames
    if i not in namesdb:
        namesdb.append(i) #create new list without duplicates


print "without duplicates: ", namesdb

"""


countylist = {} #create a dict where elements from namesdb are the KEYS
for name in namesdb:
    countylist[name] = []

print "dict: ", countylist


with arcpy.da.SearchCursor(stable, sfield) as scursor:
    for srow in scursor:
        for county in namesdb: #iterate through the namesdb list
            if srow [0] == county: #get the value of each defined column in sfield for each row of countyname
                    countylist[county].append(srow) #append to the matching KEY in dictionary

        else:
            continue


print "dict with values: ", countylist


from operator import itemgetter
#now the trickiest part (for me, cause I am a python beginner, and that really hurt my head). I went to SO to ask a question. See here for better underatanding whats happening: http://stackoverflow.com/questions/25288927/finding-the-max-value-of-list-in-dictionary?noredirect=1#comment39411443_25288927

maxcostumers = {k: max(v, key=itemgetter(-1)) for k, v in countylist.iteritems()} #returns max number of customers, if customers is the last column in your table. If not, adjust the -1
mincostumers = {k: min(v, key=itemgetter(-1)) for k, v in countylist.iteritems()} #returns the min number of customers, as see above


print "row from table with max costumers: ", maxcostumers
print "row from table with min costumers: ", mincostumers

Thats the Output generated:

ountynames without duplicates:  set([u'New_York', u'Jersy', u'Alameda'])

dict:  {u'New_York': [], u'Jersy': [], u'Alameda': []}

dict with values:  {
u'New_York': [(u'New_York', u'NY', datetime.datetime(2014, 8, 13, 0, 0), 10), (u'New_York', u'NY', datetime.datetime(2014, 8, 13, 0, 0), 4), (u'New_York', u'NY', datetime.datetime(2014, 8, 13, 0, 0), 3)], 
u'Jersy': [(u'Jersy', u'JY', datetime.datetime(2014, 8, 13, 0, 0), 6), (u'Jersy', u'JY', datetime.datetime(2014, 8, 13, 0, 0), 7)], 
u'Alameda': [(u'Alameda', u'CA', datetime.datetime(2014, 8, 13, 0, 0), 1), (u'Alameda', u'CA', datetime.datetime(2014, 8, 13, 0, 0), 2), (u'Alameda', u'CA', datetime.datetime(2014, 8, 13, 0, 0), 3), (u'Alameda', u'CA', datetime.datetime(2014, 8, 13, 0, 0), 1)]
}

row from table with max costumers:  {
u'New_York': (u'New_York', u'NY', datetime.datetime(2014, 8, 13, 0, 0), 10),
u'Jersy': (u'Jersy', u'JY', datetime.datetime(2014, 8, 13, 0, 0), 7), 
u'Alameda': (u'Alameda', u'CA', datetime.datetime(2014, 8, 13, 0, 0), 3)
}

row from table with min costumers:  {
u'New_York': (u'New_York', u'NY', datetime.datetime(2014, 8, 13, 0, 0), 3), 
u'Jersy': (u'Jersy', u'JY', datetime.datetime(2014, 8, 13, 0, 0), 6), 
u'Alameda': (u'Alameda', u'CA', datetime.datetime(2014, 8, 13, 0, 0), 1)
}

Then you only need an UpdateCursor and the formatting for the time. Information regarding this you will find here: http://resources.arcgis.com/en/help/main/10.1/index.html#//018w00000014000000 and http://resources.arcgis.com/en/help/main/10.1/index.html#//001700000046000000

Related Question