[GIS] Copying attributes from one polygon layer to another

qgis

I have a problem that I can't seem to get my head around. I have two polygon layers:

  • Polygon A – a subset of Polygon B, with the same fields and with polygons identical to those in Polygon B
  • Polygon B – has the attribute data I want to be in Polygon A

How can this be done?

I tried the QGIS tool "Join Attributes by Location", but because some polygons lie within others, it tends to link to the first intersecting feature it finds (the outer polygon).

Best Answer

@Dano rightly raises some issues that are best addressed in a full reply.

One difficulty, already noted by @Celenius, is that a join between B and A (in either direction) duplicates all the fields; it can be onerous to correct this. I have suggested in comments that the obvious easy way (export to a spreadsheet) raises questions of data integrity. Another difficulty, already addressed by Celenius' proposal, concerns solving this problem when no combination of attributes can serve as a key for both A and B, because that precludes a database join. The spatial join gets around that problem.

What, then, is a good solution? One approach uses A to identify the corresponding records of B containing the desired data. Depending on assumptions about the configurations of the polygons (whether they overlap, whether some can contain others, and so on), this can be carried out in various ways: using one layer to select objects in the other, or via joins. The point is that all we need at this stage is to select the subset of B corresponding to A.

Having achieved that selection, export the selection and let it replace A. Done.
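The selection step can be sketched in plain Python. This is a minimal illustration, not PyQGIS: it assumes each feature is a (geometry, attributes) pair and that corresponding polygons in A and B compare exactly equal (here, as WKT strings). All layer and field names are illustrative.

```python
# Sketch: use A's geometries to pick out the corresponding records of B.
# Assumes exact geometric identity between corresponding polygons.

def select_matching(a_features, b_features):
    """Return the subset of B whose geometry exactly matches some feature of A."""
    a_geometries = {geom for geom, _attrs in a_features}
    return [(geom, attrs) for geom, attrs in b_features if geom in a_geometries]

# B carries the attribute data; A is a geometric subset of B.
layer_b = [
    ("POLYGON((0 0,1 0,1 1,0 1,0 0))", {"name": "inner", "value": 1}),
    ("POLYGON((0 0,3 0,3 3,0 3,0 0))", {"name": "outer", "value": 9}),
]
layer_a = [
    ("POLYGON((0 0,1 0,1 1,0 1,0 0))", {"name": None, "value": None}),
]

new_a = select_matching(layer_a, layer_b)
# new_a holds exactly the records of B that correspond to A, attributes intact;
# exporting that selection and letting it replace A completes the step.
```

Note that matching by exact geometry sidesteps the contain/intersect ambiguity that trips up "Join Attributes by Location" when polygons nest.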

This solution assumes that all fields in B are intended to replace their counterparts in A. If not, then it really is necessary to perform a 1-1 join of B (source) to A (destination). The join based on identifiers is best, but making a join on polygon identity (Celenius) works fine if ids are not available and there's no chance corresponding polygon shapes in A and B might differ, however slightly. (This is a subtle point, and the potential cause of insidious errors, because previous edits in B to polygons that don't correspond to A could still invisibly modify the other polygons in B if the GIS is "snapping" or "maintaining topology" or otherwise automatically making global changes during local edits.)
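A 1-1 join keyed on an identifier can be sketched the same way, again in plain Python with illustrative names. The joined record keeps both copies of every field, prefixed "A." and "B.", mirroring how a GIS presents a joined table.

```python
# Sketch: 1-1 join of B (source) to A (destination) on a shared identifier.
# Field and key names are illustrative assumptions, not from the question.

def join_one_to_one(a_records, b_records, key="id"):
    """Join each A record to the B record sharing its identifier."""
    b_by_key = {rec[key]: rec for rec in b_records}
    return [
        {**{f"A.{name}": val for name, val in a.items()},
         **{f"B.{name}": val for name, val in b_by_key[a[key]].items()}}
        for a in a_records
    ]

a_records = [{"id": 7, "Foo": "stale"}]
b_records = [{"id": 7, "Foo": "fresh"}, {"id": 8, "Foo": "unrelated"}]

joined = join_one_to_one(a_records, b_records)
# joined == [{"A.id": 7, "A.Foo": "stale", "B.id": 7, "B.Foo": "fresh"}]
```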

At this juncture, there are two copies of every field: if [Foo] is a common field to A and B, then the join contains A.[Foo] and B.[Foo]. Using a field calculation, copy B.[Foo] into A.[Foo]. Repeat for all needed fields. After this is done, remove the join.
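The per-field copy and the final removal of the join can be sketched over such joined records. This assumes the "A."/"B." prefix convention from the paragraph above; names are illustrative.

```python
# Sketch: copy B.[field] into A.[field] for every record, one field at a
# time (the "field calculation"), then remove the join by dropping B's side.

def copy_fields(joined, fields):
    """Overwrite A's copy of each listed field with B's copy."""
    for record in joined:
        for field in fields:
            record[f"A.{field}"] = record[f"B.{field}"]

def remove_join(joined):
    """Keep only A's side of the joined table, stripping the prefix."""
    return [
        {name[len("A."):]: val for name, val in rec.items() if name.startswith("A.")}
        for rec in joined
    ]

joined = [{"A.id": 7, "A.Foo": "stale", "B.id": 7, "B.Foo": "fresh"}]
copy_fields(joined, ["Foo"])
result = remove_join(joined)
# result == [{"id": 7, "Foo": "fresh"}]
```

Wrapping these few lines in a script is exactly what produces the audit trail discussed below.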

Although this procedure can be a little onerous when many fields are involved, its merits include

  • It is straightforward and fast to script.
  • Scripting it leaves an audit trail documenting the processing being done on the data. This is crucial for protecting data integrity.
  • It defends against some kinds of wholesale errors, such as retaining the wrong field after the join (thereby keeping the old data instead of the new data for that field) or deleting a crucial field.
  • It capitalizes on built-in defenses offered by the database management system, such as data type enforcement and business rule enforcement, that operate to prevent and identify errors and to maintain consistency among all the tables and layers in the database.

Some of the guiding principles involved in this suggestion are

  1. Use your database management system to process data rather than software that is not designed for, or is unsuitable for, the task.
  2. Avoid changing database structures (such as deleting or adding fields) when operations don't absolutely require it.
  3. Use the software's capabilities for automation to simplify the work, document it, and make the operations reproducible.

One might object that in many cases there are faster and easier ways to reach the same result. Yes, there often are, and they usually work when performed with care. But solutions that put the data at risk are hard to recommend and defend as general-purpose answers. They are best reserved for one-off situations with small datasets, where corruption would quickly become obvious and the consequences of any mistake are immaterial.
