I have 13000+ CSV files, all containing around 13000 rows. These CSV files are arranged in 146 different folders. All CSV files have the same schema with ;
as separator. These CSV files contain about 175 million rows of data combined. The folders are named as seen in the picture below (for example …\CSV_folders\5785xxx)
CSV files are named like these:
I already tried to copy all CSV files to same directory and run them to one GDB with FME. This went well… but resulted too heavy ESRI GDB that takes forever to search through. (175 million rows, around 20GB)
Now I would need to make things a bit differently. I would like to work with FME to create one ESRI Geodatabase. It should contain tables named like these folders containing CSV files. A table in the GDB should have all the CSV files loaded in the table, that are inside the original folder. This would split my 175 million rows to 146 tables, making it a bit faster to search through with Arcmap.
So my problem is, how can I make FME read CSV files from all of the folders and write a GDB with tables named like the folder, each table containing the same information the CSV files have in the original folder combined? I'm not familiar with PostGIS etc. (and it would not work with ArcGIS), so this needs to be done with some kind of ESRI GDB workaround..
Best Answer
This is a pretty straightforward exercise in FME. You'll need to do the following:
Load your CSVs using a dynamic reader (Single Merged Feature Type) and point to the whole folder where your CSVs are stored. Use the advanced browser to select the whole folder:
Ensure you select search subfolders:
In your workbench, add a FilenamePartExtractor transformer and point it to the fme_dataset attribute. The field you need is
_dirname
._dirname
from the FilenamePartExtractor. The reason I did this was because your folder names start with a number and feature classes in a file geodatabase cannot begin with a number. Also note that if you're using the ArcObjects File Geodatabase writer, you will set your geometry togeodb_table
. If you're using the API writer then you will set your geometry togeodb_no_geom
:You can see that I had folders within the CSV folder called 1, 2 and 3. They wrote to the GeoDatabase as tables called Table_1, Table_2, and Table_3: