[GIS] Speeding up Python code to convert multiple csv to shapefile

Tags: arcpy, csv, loop, performance, shapefile

I am trying to work with some crime data from https://data.police.uk/data/
The data is organised into several .csv files, one for each month, and each crime is geocoded with Lat and Long.

As the file structure might differ from month to month, I cannot merge the csv files together by running
copy *.csv combined.csv at the command prompt, as explained here: https://www.itsupportguides.com/office-2010/how-to-merge-multiple-csv-files-into-one-csv-file-using-cmd/
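(For reference, files with differing columns can still be merged in Python rather than with copy; this is only a minimal sketch, and merge_csvs is a hypothetical helper that takes csv contents as strings and combines rows on the union of all column names.)

```python
import csv
import io

def merge_csvs(csv_texts):
    """Merge CSV contents whose headers may differ from file to file.

    Rows are combined on the union of all column names; columns missing
    from a given file are written as empty strings.
    """
    readers = [list(csv.DictReader(io.StringIO(text))) for text in csv_texts]
    # Build the union of headers, preserving first-seen order.
    fieldnames = []
    for rows in readers:
        for row in rows:
            for name in row:
                if name not in fieldnames:
                    fieldnames.append(name)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for rows in readers:
        writer.writerows(rows)
    return out.getvalue()

# Hypothetical two months of data, where the second file adds a column.
may = "Crime ID,Longitude,Latitude\n1,-0.1,51.5\n"
june = "Crime ID,Longitude,Latitude,Context\n2,-0.2,51.4,note\n"
merged = merge_csvs([may, june])
```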

So I decided to use Python to loop through all the csv files in the folder and create a shapefile for each one, which I will then merge together at a later stage.

This is the code I came up with after looking at this post: Convertion of multiple csv automatically to shp. It works, but it is really slow: in a couple of hours it converted only a handful of tables. Do you have any suggestions to improve my code?

I had to use csvfile.replace('-', '_') because the file names look like 2012-05-metropolitan-street.csv and I cannot use "-" in the output shapefile name.
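That renaming step can be isolated into a small helper, sketched below with only the standard library (shapefile_name is a hypothetical name, not part of the script above):

```python
import os

def shapefile_name(csv_filename):
    # Shapefile names cannot contain hyphens, so swap them for
    # underscores, then drop the .csv extension.
    return os.path.splitext(csv_filename.replace('-', '_'))[0]

shapefile_name("2012-05-metropolitan-street.csv")  # "2012_05_metropolitan_street"
```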

import os

import arcpy

shpworkspace = r"G:\GIS DATA\Crime Data\CSV"
arcpy.env.workspace = shpworkspace
arcpy.env.overwriteOutput = True

csvlist = arcpy.ListFiles("*.csv")

# WGS 1984 geographic coordinate system (the crime data is in Lat/Long).
# Constant, so defined once outside the loop.
spatialreference = "GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]];-400 -400 1000000000;-100000 10000;-100000 10000;8.98315284119522E-09;0.001;0.001;IsHighPrecision"

try:
    for csvfile in csvlist:
        # Build an in-memory XY event layer from the csv's coordinate columns.
        outlayer = "CSVEventLayer"
        arcpy.MakeXYEventLayer_management(csvfile, "Longitude", "Latitude", outlayer, spatialreference, "#")

        # Shapefile names cannot contain "-", so swap it for "_".
        shpfile = os.path.splitext(csvfile.replace('-', '_'))[0]
        arcpy.CopyFeatures_management(outlayer, shpfile)
    del outlayer

except arcpy.ExecuteError:
    # If a geoprocessing error occurred, print the messages to the screen
    print(arcpy.GetMessages())
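Before restructuring anything, it helps to measure where the time actually goes. A minimal sketch of a timing wrapper (timed is a hypothetical helper; the arcpy calls themselves are not included here) that could bracket each geoprocessing call in the loop:

```python
import time

def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

# Usage inside the loop would look like, e.g.:
#   _, secs = timed(arcpy.CopyFeatures_management, outlayer, shpfile)
#   print(shpfile, secs)
# A stand-in call just to demonstrate the wrapper:
result, elapsed = timed(sum, [1, 2, 3])
```

Per-file timings make it obvious whether the slow step is reading the csv, building the event layer, or writing the shapefile, which in turn points at I/O (e.g. a network drive) versus processing.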

Best Answer

As indicated in comments, and suggested by most commenters, moving the data from a shared drive to local disk appears to have eliminated the performance concern:

I finally came back to the office today, moved the files onto the local machine, and re-ran the script: it worked! What took hours with the data on the network drive now took only a couple of minutes.