QGIS – Using Field Calculator Expressions to Retrieve Index of Specific Value

arrayexpressionfield-calculatorqgis

I have two tables:
[see image beloW]

  1. HAK where the Column "Volumenstrom_h" has the input value ( x ) needed for the linear interpolation formula displayed which can be seen surrounded by the black border
  2. RohrdimensionierungsTabelle ( table with no geometry ) hosting the rest of needed variables for the formula.

HowTo

Example of formula at work: For the first value "3573" from the table "HAK" we need the next smallest and next biggest value from the RohrdimensionierungsTabelle.
For this step I managed to write an expression and get the X1 (3240) and X2 (3600) values.

/* X1 aka Xmin = 3240 */

    with_variable(
     'Vstrom',
     "Volumenstrom_h",
    array_max(array_filter(aggregate(
                            layer:='RohrdimensionierungsTabelle', 
                            aggregate:='array_agg', 
                            expression:="Volumenstrom [h]" ),
    @element < @Vstrom))
       )

and

/* X2 aka. Xmax = 3600 */

    with_variable(
     'Vstrom',
     "Volumenstrom_h",
    array_max(array_filter(aggregate(
                            layer:='RohrdimensionierungsTabelle', 
                            aggregate:='array_agg', 
                            expression:="Volumenstrom [h]" ),
    @element < @Vstrom))
       )

Next for finding the Y2 and Y2 I need to look in the RohrdimensionierungsTabelle at the index (@row_number) of X1 and X2 and then shift to the next column untill the values are ≤ 250. In the example above: Y1 (111,4) and Y2 (134,9).
Here is the part where I need help.

Breaking it down:

1.First I need to get the index for the X2 and X1 from the RohrdimensionierungsTabelle.

2.Then get the values Y2 and Y1 at the index positon from the step 1 for the first Column ( 25 x 2,3 [Pa/m] ) and wrap it in a if statement checking if the values are smaller/equal to 250. If not check for the next Column and so on.

3.I think it would be more elegant if I would have another index for calling the columns by their position in the array an not by their name.

In Summary: I am looking for a expression in the HAK table where i can get all the values ( X1,X2,Y1,Y2) from the RohrdimensionierungsTabelle for each value x in the column "Volumenstrom_h"

Best Answer

  • Disclaimer #1: the Field Calculator solution I've given below is particularly convoluted because I've aimed to use one expression that works for all four values (x1, x2, y1, y2).

    It would be simpler with separate expressions for x and y, but I chose to combine them, because I feel the likelihood and consequences of pasting the wrong expression in the wrong column is high enough (at least in my experience!) to warrant writing a universal expression at the expense of readability.

  • Disclaimer #2 : because of the numerous variables and operations involved in parsing/sorting attribute names (see below), storing features etc, a custom Python function would be far more appropriate and elegant as a long-term solution, especially in handling errors.


Expression

To use the expression below with the data you've provided, simply change the value of the variable var_eval at the top of the expression (case-insensitive)

with_variable('var_eval',    --specify variable from equation: x1, x2, y1, y2
'x1',                        --CHANGE VARIABLE VALUE HERE

with_variable('table_name',  --other table name
'RohrdimensionierungsTabelle',

with_variable('v_fld',       --volume flow field in other table
'Volumenstrom [h]',

with_variable('pam_array',   --all pa/m field names from other table, sorted ascending
array_foreach(
    array_sort(
        array_foreach(
            array_filter(map_akeys(attributes(get_feature_by_id(@table_name,1))),
                         @element ilike '%[Pa/m]'),
                    regexp_replace('0'||@element,'^(0)([0-9]{3}|[^0-9])','\\2')
                )
            ),
            regexp_replace(@element,'^0','') --added leading 0 earlier to sort names, now remove 0
        ),

with_variable('idx_ops',     --assign operators for next expr based on equation var (1=prev, 2=next)
map_get(map('1',array('<',-1),'2',array('>',0)),right(@var_eval,1)),

with_variable('idx_ftr',     --look up index (prev/next) feature id in other table, get feature
get_feature_by_id(@table_name,
    eval(format('aggregate(''%1'',
                           ''array_agg'', 
                           $id,
                           "%2" %3 attribute(@parent,''Volumenstrom_h''))[%4]',
                @table_name,@v_fld,@idx_ops[0],@idx_ops[1] --configure lookup expr based on operators
            )
        )
    ),

case                         --this section calculates output value using above variables
when lower(left(@var_eval,1)) = 'x' 
then 
attribute(@idx_ftr,          --if x, get volume flow field value for index feature
          @v_fld)
when lower(left(@var_eval,1)) = 'y'
then 
array_filter(
    array_foreach(@pam_array,
                  attribute(@idx_ftr,@element)
                  ),         --if y, get all pa/m field values for index feature
            @element < 250 and @element != '', 1 
            )[0]                --then filter out empty vals and vals < 250, and keep first value left
else
-1                           --error output
end
))))))

Result

Click to enlarge:

Click to enlarge


A note on data structures

Your post mentions referencing columns (aka fields) in another table by their "position in the array" rather than by their name. I felt the need to clarify that although this is the right concept, in GIS you have to always reference a column by its name.

This means it is not trivial to specify "columns 2 to 12 in that order" ($A:$L in Excel). Here you have to look at the actual names and specify the logic that gets you the result. In this instance it's:

  • all column names containing [Pa/m]
  • ordered by the numeric first portion

The second point bears mentioning; even though map_akeys(attributes()) returns column names in alphabetical order by default, that's not based on numeric portions, so 110 x 10 [Pa/m] comes before 25 x 2,3 [Pa/m]. This required padding an 0 on the required columns to sort the columns appropriately, then taking the 0 out again so the columns return to their original names for further use.

As you can see, although the concept is in the right direction, the actual execution is very different, so keep column names and not positions in mind when structuring your data and finding ways to problem solve. (It was good that your pressure columns had a common suffix! But consider zero-padding the column names or using letters to sort next time. Also, column names starting with numbers are best avoided for technical reasons)