[GIS] Python – Convert csv with unknown number of columns to shapefile

csvlooppyshppythonshapefile

I have this script that converts a csv file to a shapefile. In this case I am specifying manually the name of columns header. Since now I have files with a big and always different number of columns, I'd like to automatize the script, i.e. making it reading the number of columns and their respective header. Any suggestions?

These are the first 3 line of the csv file:

id  longitude       latitude    D_20150403  D_20150521
1   -16.01746368    17.95987892 11.89926434 15.17788696
2   -16.01752663    17.95999527 16.99330902 15.84250259

    # initialize data lists
ID_n,lon,lat,D_20150403,D_20150521=[],[],[],[],[]

# read data from csv file and store in lists

with open(in_file, 'rt') as csvfile:
    r = csv.reader(csvfile, delimiter=',')
    for i,row in enumerate(r):
        if i > 0:
            ID_n.append(int(row[0]))
            lon.append(float(row[1]))
            lat.append(float(row[2]))
            D_20150403.append(float(row[3]))
            D_20150521.append(float(row[4]))


# set up shapefile writer and create empty fields
w = shp.Writer(shp.POINT)
w.autoBalance = 1
w.field('ID','C')
w.field('Lon','F',10,8)
w.field('Lat','F',10,8)
w.field('D_20150403','F',10,8)          
w.field('D_20150521','F',10,8)      


# loop through the data and write shapefile geometry and attributes
for i,j in enumerate(lon):
    w.point(j,lat[i])
    w.record(ID_n[i],j,lat[i],D_20150403[i],D_20150521[i])

# save output shapefile
w.save(out_shp)

# create output projection file
prj = open(out_prj, "w")
epsg = spatialreference = "GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]];-400 -400 1000000000;-100000 10000;-100000 10000;8.98315284119522E-09;0.001;0.001;IsHighPrecision"
    #getWKT_PRJ("4326")
prj.write(epsg)
prj.close()

I tried with this, which seems to replicate the data structure but something is wrong in the creation of the shapefile. Any idea?

# read data from csv file and work on the header

with open(in_file, 'rt') as csvfile:
    rh = csv.reader(csvfile, delimiter=',')
    field_names_list = next(rh)
dim_header = [[]]*len(field_names_list)
dic = dict(zip(field_names_list,dim_header))
print(field_names_list)

n_col = len(field_names_list)
print(n_col)

with open(in_file, 'rt') as csvfile:
    r = csv.reader(csvfile, delimiter=',')
    for i,row in enumerate(r):
        if i > 0:
            for j in range(0, n_col-1):
                dic[field_names_list[j]].append(row[j])

# set up shapefile writer and create empty fields
w = shp.Writer(shp.POINT)
w.autoBalance = 1
#write ID
w.field(dic[field_names_list[0]],'C')
#write all the other fields
for j in range(1, n_col-1):
    w.field(dic[field_names_list[j]],'F',10,8)

# loop through the data and write shapefile geometry and attributes
for i,j in enumerate(dic[field_names_list[0]]):
    w.point(j,dic[field_names_list[0]][i])
    for k in range(0, n_col-1):
        w.record(dic[field_names_list[k]][i])

# save output shapefile
w.save(out_shp)

# create output projection file
prj = open(out_prj, "w")
epsg = spatialreference = "GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]];-400 -400 1000000000;-100000 10000;-100000 10000;8.98315284119522E-09;0.001;0.001;IsHighPrecision"
    #getWKT_PRJ("4326")
prj.write(epsg)
prj.close()

Best Answer

For that, the easiest solution is to use GeoPandas (many examples in GIS SE)

1 - read the csv file with Pandas as a DataFrame
2 - compute the geometry with Shapely (many examples in GIS SE)
3- convert the DataFrame to a GeoDataFrame
4 - save the GeoDataframe as a shapefile

import geopandas as gpd
import pandas as pd
# read the csv file
test = pd.read_csv("test.csv")
test.head()
id  longitude   latitude  D_20150403  D_20150521
0   1 -16.017464  17.959879   11.899264   15.177887
1   2 -16.017527  17.959995   16.993309   15.842503
# compute the geometry
from shapely.geometry import Point
test['geometry'] =test.apply(lambda row: Point(row['longitude'], row['latitude']), axis=1)
# convert to a GeoDataFrame
geo_test = gpd.GeoDataFrame(test,geometry=test.geometry)
# save the resulting shapefile
geo_test.to_file("result.shp")