[GIS] Removing and deleting duplicates in FME

deletefields-attributesfmeremovespreadsheet

I'm analysing a large folder containing huge amounts of data, using FME.

I'm filtering it down as much as possible, and I'm a little stuck at this part.

Right now it is just about creating lists. My current spreadsheet has a list filtered by many things, one of which I set for "filetype" to multiple spatial and non-spatial types. One of the filters was "shp" and "dbf".

Now, I know .dbf can come under one of the files making a full shapefile, but it can also come on it's own as a file itself.

What I'd like to do is to take my list and remove all of the dbf's relating to the shapefile.

I have set a workspace that does the following:

Input Spreadsheet —-> Tester for .dbf and .shp —-> Matcher (on filename) —->Output from "Matched" into Tester for .shp —–> Output merged with "NotMatched" from Matcher.

This creates a list with dbf's still involved that are not part of a shapefile, and also gives me all of the shapefiles.

I think this works. My issue is now merging this back in with the original spreadsheet, or on the other hand deleting all of the dbf's from the original.

This could be done manually of course but I need it to run in FME.

I need the final output not to include DBF's that relate to the SHP's.

enter image description here

Best Answer

For various reasons, including performance, I would use a FeatureMerger instead of a Matcher. So I'd set it up as follows:

  • TestFilter instead of Tester, to have DBF and SHP files separated
  • FeatureMerger to carry out a match
    • DBF features are the Requestor, SHP are the Supplier
  • UnmergedRequestor features get written to the output
  • Also written to the output are the TestFilter SHP and Unfiltered features

enter image description here

If you want to write to the same CSV file you read from, then simply add a FeatureHolder before the CSV writer. That will hold up the data so that the reader closes the CSV before the writer tries to open it again.

As @Mapperz mentioned in the comments, you could potentially replace the CSV spreadsheet by using the File/Directory Path reader, which is a format in FME that automatically scans a directory for files. But I suppose that depends on how much the CSV file is a required part of your process. It would mean everything could be done inside FME, which I think is a bonus!