PyQGIS – Adding New Field with Unique ID Without Iterating Over All Features

Tags: pyqgis, qgis-3

System: Windows 10

QGIS version 3.28 'Firenze'

I have a line shapefile with 188,722 features.
For each feature I would like to store a unique id in the attribute table with a field named 'main_id'.

However, at the moment I am iterating over all features, which causes QGIS to freeze.

My code so far is:

# import relevant libraries
import os
from qgis.core import *
from qgis.PyQt.QtCore import QVariant

# get the path to the shapefile e.g. /home/project/data/ports.shp
UBA_network_proj = "full/path/to/shapefile.shp"

# create Qgis vector layer object
vlayer = QgsVectorLayer(UBA_network_proj, "main_roads_layer", "ogr")

# Create and add new empty field to layer attribute table
layer_provider=vlayer.dataProvider()
layer_provider.addAttributes([QgsField("main_id",QVariant.Int)])
vlayer.updateFields()

# Start editing mode, iterate features, get their ids and 
# store it in the newly created field 'main_id', and update the layers attribute table 

vlayer.startEditing()
features=vlayer.getFeatures()
for f in features:
    id=f.id()
    value = id
    attr_value={12:value}
    layer_provider.changeAttributeValues({id:attr_value})
vlayer.commitChanges()

However, when I iterate over the features, QGIS freezes for a long time. I have not let it run to the end, because iterating over almost 190,000 features one by one cannot be the most performant way to do this.

Is there a more performant way to do this with PyQGIS, for example with the processing module?

Best Answer

There are some redundancies in your code, and you are mixing layer editing methods with provider methods, which is not recommended. At the end of the day, I don't think you can avoid feature iteration at some point, but you definitely don't need to make a call to changeAttributeValues() on every iteration.

Try the simplified snippet below. An attribute map can contain as many feature id keys as there are features. Each value is a second dictionary with a field index as key and the new attribute as value. We can take advantage of this structure by using a dictionary comprehension to build a single attribute map, then making a single changeAttributeValues() call at the end of the script, passing in that map. On a test layer (shapefile) with 180,500 features, this did the job in around 2-3 seconds.

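from qgis.core import QgsVectorLayer, QgsField
from qgis.PyQt.QtCore import QVariant

lyr_path = '/home/ben/test/SHP/test_layer.shp'
vlayer = QgsVectorLayer(lyr_path, 'Main_roads_layer', 'ogr')
# Add the new field through the provider, then refresh the layer's field list
vlayer.dataProvider().addAttributes([QgsField('main_id', QVariant.Int)])
vlayer.updateFields()
# Look up the new field's index instead of hard-coding it
fld_idx = vlayer.fields().lookupField('main_id')
# Build {feature id: {field index: value}} for every feature in one pass
atts_map = {ft.id(): {fld_idx: ft.id()} for ft in vlayer.getFeatures()}
# Write all values with a single provider call
vlayer.dataProvider().changeAttributeValues(atts_map)
print('Done')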
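
To make the nested structure of the attribute map concrete, here is a minimal plain-Python sketch that runs outside QGIS. The feature ids and the field index 12 are made-up stand-ins for what ft.id() and lookupField() would return on a real layer, so treat it as an illustration of the shape only.

```python
# Hypothetical stand-ins: on a real layer these values would come from
# ft.id() for each feature and vlayer.fields().lookupField('main_id').
feature_ids = [0, 1, 2, 3]
fld_idx = 12

# Outer key: feature id. Inner dict: {field index: new attribute value}.
atts_map = {fid: {fld_idx: fid} for fid in feature_ids}

print(atts_map)
# prints {0: {12: 0}, 1: {12: 1}, 2: {12: 2}, 3: {12: 3}}
```

Because every feature gets its own entry in one dictionary, the provider only has to be called once, however many features there are.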

A few details on the testing, because there may be other variables to consider: QGIS 3.28 on Ubuntu 22.04, on a desktop machine with 32 GB RAM and a Ryzen 5 5600G with integrated graphics (no dedicated GPU). The shapefile was stored locally. I have noticed at work that saving layer edits is much slower for layers stored on a network drive than for layers stored on the local machine.
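
If you want to see where the time goes on your own machine or network drive, a rough sketch like the following times the two steps separately. The function name timed_id_update and the returned keys are my own invention, not part of the QGIS API. It assumes the layer behaves like a QgsVectorLayer and that the field has already been added.

```python
import time


def timed_id_update(layer, field_name="main_id"):
    """Fill field_name with each feature's id and time both steps.

    Hypothetical helper: `layer` is assumed to behave like a QgsVectorLayer
    on which `field_name` already exists.
    """
    idx = layer.fields().lookupField(field_name)
    if idx < 0:  # lookupField returns -1 when the field is missing
        raise ValueError(f"field {field_name!r} not found")

    t0 = time.perf_counter()
    # Step 1: iterate once and build {feature id: {field index: value}}
    atts_map = {ft.id(): {idx: ft.id()} for ft in layer.getFeatures()}
    t1 = time.perf_counter()
    # Step 2: a single provider call writes everything
    ok = layer.dataProvider().changeAttributeValues(atts_map)
    t2 = time.perf_counter()

    return {
        "features": len(atts_map),
        "ok": bool(ok),
        "build_s": t1 - t0,
        "write_s": t2 - t1,
    }
```

In the QGIS Python console you could call it as print(timed_id_update(vlayer)). If write_s dominates on a network share but not on a local copy, the storage location is the likely bottleneck rather than the feature iteration.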
