GeoPandas Pandas – Calculating Time Fields: A Detailed Guide

geopandaspandas

I have a DataFrame that consists of 2 datetime fields I am trying to calculate meters a min so I thought I could use a definition and the df['newfield'] = something method. So I created a definition to do the time math.

def tracktime(starttimestr, endtimestr):
    starttime = datetime.strptime(starttimestr[:19], '%Y-%m-%dT%H:%M:%S')
    endtime = datetime.strptime(endtimestr[:19], '%Y-%m-%dT%H:%M:%S')
    return round((endtime - starttime).total_seconds() / 60, 0)

when I use…

trackgdf['MetersMin'] = trackgdf.Shape_Length/tracktime(trackgdf.TrackStartTime,trackgdf.TrackEndTime)

I got the error expecting string not series so I converted passed them as string with…

trackgdf['MetersMin'] = trackgdf.Shape_Length/tracktime(str(trackgdf.TrackStartTime),str(trackgdf.TrackEndTime))

This returned the error ValueError: time data '0 2019-04-08' does not match format '%Y-%m-%dT%H:%M:%S'
Next I tried writing it out in the field equation as…

trackgdf['MetersMin'] = trackgdf.Shape_Length/round((datetime.strptime(str(trackgdf.TrackEndTime)[:19], '%Y-%m-%dT%H:%M:%S') - (datetime.strptime(str(trackgdf.TrackStartTime)[:19], '%Y-%m-%dT%H:%M:%S')).total_seconds()/60),0)

The error returned is ValueError: time data '0 2019-04-08' does not match format '%Y-%m-%dT%H:%M:%S' but when I print(trackgdf.TrackStartTime) my dates are formatted as "2019-04-16T10:17:56+00:00". So I tried treating them like datetime objects as is with…

trackgdf['MetersMin'] = trackgdf.Shape_Length/round((trackgdf.TrackStartTime - trackgdf.TrackEndTime).total_seconds()/60, 0)

Which returned the error cannot do math on strings; so I tried entering the string from the print statement to see if I got the formatting wrong…

starttime = datetime.strptime(starttimestr[:19], '%Y-%m-%dT%H:%M:%S')

Which returns a datetime object, the question is how do you calculate a column in pandas or geopandas based on time math from 2 other columns?

Best Answer

Try using to_datetime, it is clever and can handle strings with different formats:

import geopandas as gpd
import pandas as pd
from datetime import datetime as dt

df = gpd.read_file(r'/home/bera/Desktop/GIStest/lines_with_2_dates.shp')

#df.columns
#Out[18]: Index(['id', 'date1', 'date2', 'geometry'], dtype='object')

start = dt.now()

# List comrehension variant
# for col in ['date1','date2']:
#     df[col] = [pd.to_datetime(x) for x in df[col]]
   
# df['meters_per_second'] = [x.length / (y-z).total_seconds() for x,y,z in zip(df['geometry'], df['date2'], df['date1'])]

#Apply variant  
df[['date1','date2']] = df[['date1','date2']].apply(pd.to_datetime) #Convert the strings to datetimes
df['meters_per_second'] = df.apply(lambda row: row['geometry'].length / (row['date2']-row['date1']).total_seconds(), axis=1)


stop = dt.now()
elapsed = stop-start
print(elapsed.microseconds)

#15947 with apply
#62673 with list comprehension