[GIS] FME How to read multiple CSV files from many folders and write to geodatabase tables by folder

Tags: csv, esri-geodatabase, file-geodatabase-api, fme

I have 13000+ CSV files, each containing around 13000 rows. The files are arranged in 146 different folders. All of them have the same schema, with a semicolon (;) as the separator, and together they hold about 175 million rows of data. The folders are named as seen in the picture below (for example …\CSV_folders\5785xxx)

CSV files are named like these:

I already tried copying all the CSV files into the same directory and loading them into one GDB with FME. This went well… but resulted in a very heavy ESRI GDB that takes forever to search through (175 million rows, around 20GB).

Now I need to do things a bit differently. I would like to use FME to create one ESRI geodatabase containing one table per folder, with each table named after its folder. Each table should hold the rows from all the CSV files inside that folder. This would split my 175 million rows into 146 tables, making them a bit faster to search through with ArcMap.

So my problem is: how can I make FME read the CSV files from all of the folders and write a GDB with tables named after the folders, each table containing the combined contents of the CSV files in that folder? I'm not familiar with PostGIS etc. (and it would not work with ArcGIS), so this needs to be done with some kind of ESRI GDB workaround.

Best Answer

This is a pretty straightforward exercise in FME. You'll need to do the following:

1. Load your CSVs using a dynamic reader (Single Merged Feature Type) and point it at the whole folder where your CSVs are stored. Use the advanced browser to select the whole folder, and make sure you tick the option to search subfolders.
2. In the reader, expose the fme_dataset attribute by right-clicking on the reader and clicking on Properties, then going to the Format Attributes tab.
3. In your workbench, add a FilenamePartExtractor transformer and point it to the fme_dataset attribute. The field you need is _dirname.
4. Finally, in your writer, set the table name to an expression that combines a text prefix (I used Table_) with the _dirname from the FilenamePartExtractor. I added the prefix because your folder names start with a number, and feature classes in a file geodatabase cannot begin with a number. Also note that if you're using the ArcObjects File Geodatabase writer, you set the geometry to geodb_table. If you're using the API writer, you set the geometry to geodb_no_geom.

In my test I had folders within the CSV folder called 1, 2 and 3. They were written to the geodatabase as tables called Table_1, Table_2 and Table_3.
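If it helps to see the naming logic outside of Workbench, here is a minimal Python sketch of what the FilenamePartExtractor plus the table name expression are doing. It is not FME code: the function names, the TABLE_PREFIX constant and the sample paths are all made up for illustration, and it assumes _dirname is the name of the folder that directly contains each CSV, which is how it is used in the steps above.

```python
from collections import defaultdict
from pathlib import PureWindowsPath

# Assumed prefix, matching the Table_1 / Table_2 / Table_3 result above.
# Any prefix that starts with a letter would serve the same purpose.
TABLE_PREFIX = "Table_"


def table_name_for(csv_path):
    """Return the target table name for one CSV: prefix + parent folder name."""
    # .parent.name is the last folder in the path (the role _dirname plays above)
    return TABLE_PREFIX + PureWindowsPath(csv_path).parent.name


def group_by_table(csv_paths):
    """Map each table name to the CSV files that would be written into it."""
    groups = defaultdict(list)
    for path in csv_paths:
        groups[table_name_for(path)].append(path)
    return dict(groups)
```

Calling group_by_table on a list of paths such as D:\CSV_folders\1\a.csv and D:\CSV_folders\2\c.csv (hypothetical names) gives one entry per folder, keyed Table_1, Table_2 and so on, which is the same grouping the dynamic writer ends up with.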

Related Solutions

[GIS] Convert CSV files into OSM with FME

To join the tables, use a FeatureMerger with the ID as the key to match on. To create the IDs, use a Counter.

Hope it helps. Regards, Jorge Vidinha
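For anyone unsure what that Counter plus FeatureMerger combination amounts to, here is a rough plain-Python analogy, not FME code. The "id" key, the function names and the "requestor wins on conflict" choice are assumptions made for the sketch; check the transformer parameters in your own workspace for the actual behaviour.

```python
from itertools import count


def add_ids(rows, start=0):
    """Give each row a sequential 'id', roughly what a Counter does for features."""
    counter = count(start)
    return [{**row, "id": next(counter)} for row in rows]


def merge_by_id(requestors, suppliers):
    """Copy attributes from the supplier with the same 'id' onto each requestor."""
    lookup = {s["id"]: s for s in suppliers}
    merged = []
    for r in requestors:
        supplier = lookup.get(r["id"])
        if supplier is not None:
            # Assumption for this sketch: requestor values win on conflict
            merged.append({**supplier, **r})
    return merged
```

Rows without a matching supplier are simply left out of the result here, which is one of several ways unmatched records could be handled.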

FME – How to Read Files One by One

I would second @Mapperz's comment about upgrading - FME 2012 SP2 contains some significant improvements. If you have a current active maintenance contract then you can download the latest version direct from Safe Software.

As is always the case there are several different ways you can achieve the right result using FME. Here are two approaches you could consider:

Approach 1: MS-DOS Batch Script

When you run a workspace from within FME Workbench, what you are actually doing is executing the command-line fme executable. Armed with this knowledge, it is straightforward to execute a workspace from the command line. In fact, FME Workbench makes it even easier for you: the top of the log in the log viewer panel displays the command line to use if you want to execute the workspace from a DOS prompt.

Using that information you can create a batch file to scan a directory of files and then send them one-by-one to FME for processing. For example:

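REM Path to the fme executable; adjust to your installation
SET "FME=C:\Program Files\FME\fme.exe"
REM Folder holding the source files, and the workspace to run on each one
SET "SOURCE_DATA_DIR=D:\Temp"
SET "FME_WORKBENCH=YourWorkspace.fmw"
REM Set any other options here

REM Run the workspace once per XML file; the quotes keep paths with spaces intact
FOR %%A IN ("%SOURCE_DATA_DIR%\*.xml") DO "%FME%" "%FME_WORKBENCH%" --SOURCE_FILE "%%A"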

Obviously you will need to adapt this to your needs, but use the output from running your workspace in FME Workbench as a guide.
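If you would rather not write batch, the same loop could be sketched in Python along these lines. This is only an illustration under the same assumptions as the batch example: the executable path, workspace name and SOURCE_FILE parameter are copied from it, and the function names are made up. Check the command line shown at the top of your own Workbench log before relying on it.

```python
import subprocess
from pathlib import Path

# Assumed values, copied from the batch example; adjust to your installation
FME_EXE = r"C:\Program Files\FME\fme.exe"
WORKSPACE = "YourWorkspace.fmw"


def build_command(source_file):
    """Return the fme command line for one source file, as an argument list."""
    return [FME_EXE, WORKSPACE, "--SOURCE_FILE", str(source_file)]


def run_all(source_dir, pattern="*.xml"):
    """Run the workspace once per matching file, stopping at the first failure."""
    for source_file in sorted(Path(source_dir).glob(pattern)):
        # check=True raises CalledProcessError if fme exits with a non-zero code
        subprocess.run(build_command(source_file), check=True)
```

Passing the arguments as a list means paths containing spaces do not need extra quoting. A call like run_all(r"D:\Temp") would then process every XML file in that folder in turn.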

Approach 2: Controller Workspace

An alternative approach is to use a controller workspace to execute another workspace repeatedly based on some input. In this case the input will be the path to an XML file. This is straightforward to set up: all you need is a Files and Directories reader and a WorkspaceRunner transformer. An example of how this would look in FME Workbench is shown below:

Example of a controller workspace

An extremely useful resource for all things FME is the FMEpedia site. It has lots of examples and help for doing all manner of things with FME.
