Yes, the weights need to be computed once, but they are not applied to the loss as a whole. Instead, each pixel's term in the loss (before summing over both image dimensions) gets its own weight, so overall 'weights' is a tensor just like the others. Let's say there are only two classes, and $ C_1 $ is twice as frequent as $ C_2 $. One of the pixels is correctly predicted as $ C_2 $ with confidence [0.3 0.7], so its row in 'true_dist' is [0, 1]. The loss for that pixel is $ sum([0, 1].*log[0.3, 0.7]) $. When the weight is included the loss is $ sum([0, 1].*log[0.3, 0.7] * 2) $, because $ C_2 $ should count twice to balance the classes. So for each pixel the weight is either 1 or 2, depending on which class it belongs to. This constructs a weight matrix. However, it can be convenient to think of it as a tensor, because the weight value corresponding to the other class is multiplied by 0 in 'true_dist'. In this case the loss for a single pixel can be written as $ sum([0, 1].*log[0.3, 0.7].*[1, 2]) $, so the extra entry doesn't affect the result. In this way you can use a point-wise multiplication.
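As a rough check of the two-class example, here is a minimal NumPy sketch. It assumes the pixel truly belongs to $ C_2 $ (one-hot row [0, 1]) and that the class weights are [1, 2]; the variable names are only illustrative, and the conventional minus sign of cross-entropy is included here.

```python
import numpy as np

# Assumed values for illustration: the pixel belongs to C_2, and C_2 gets weight 2.
true_dist = np.array([0.0, 1.0])   # one-hot target for this pixel
pred = np.array([0.3, 0.7])        # predicted confidences for C_1 and C_2
w = np.array([1.0, 2.0])           # per-class weights [w_1, w_2]

# Unweighted cross-entropy term for the pixel.
unweighted = -np.sum(true_dist * np.log(pred))

# Weighting with the scalar that belongs to the true class ...
scalar_weighted = unweighted * 2.0

# ... gives the same value as a point-wise product with the full weight vector,
# because the entry for the other class is multiplied by 0 in true_dist.
vector_weighted = -np.sum(true_dist * np.log(pred) * w)
```

Both weighted values agree, which is why the weights can be stored with the same shape as 'true_dist'.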
PS: This didn't fit in the comment section.
EDIT: I can't edit your code because the weight calculation section is not included. If you calculated weights for N classes, then it's a $ 1 \times N $ vector. You will construct a 3D array, $ W_{ijk} $, with these weights. The first and second dimensions of this array correspond to 'img_col' and 'img_row' respectively. The third dimension will be a function of 'true_dist', $ T_{ijk} $, at the corresponding pixel. I guess this is where you are confused, so I will try to be more explicit at this point. Let's say N is 4 and the weight vector you calculated is denoted as $ w = [w_1,w_2,w_3,w_4] $. The weight values are inversely related to the frequency of each class. If a pixel $ (a,b) $ belongs to class $ C_3 $ then $ T_{ab.} = [0, 0, 1, 0] $ and $ W_{ab.} = T_{ab.}.*([w_1,w_2,w_3,w_4]) = [0,0,w_3,0] $, where $ .* $ is point-wise multiplication. So only the 3rd class's weight value has an effect for that individual pixel $ (a,b) $. As you can see, $ W $ is a function of $ T $. What you need to do is evaluate $ W $ before passing it into the loss. You can write a function which takes $ T $ as input.
The 'weights' in the code, denoted as $ W $ (capital), is a 3D array. The vector $ w $ holds the inverse-frequency values for each class.
EDIT2: Sorry for the mess I created here. You don't need the point-wise multiplication to create $ W_{ijk} $ because it is already done in the loss function. So just replicate $ w $ at each pixel of $ W $.
$ \forall (a,b), W_{ab.} = [w_1,w_2,w_3,w_4]$
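To make the replication concrete, here is a small NumPy sketch under some assumptions: the helper names `replicate_weights` and `weighted_cross_entropy` are hypothetical, the weight values and random test data are made up, and the spatial shape is simply taken from 'true_dist' so the axis order matches whatever your code uses. It also checks that the replicated $ W $ and the masked $ W = T .* w $ give the same loss for one-hot targets.

```python
import numpy as np

def replicate_weights(w, spatial_shape):
    """Copy the per-class vector w to every pixel: result has shape spatial_shape + (N,)."""
    return np.broadcast_to(w, tuple(spatial_shape) + (w.shape[0],)).copy()

def weighted_cross_entropy(true_dist, pred, weights, eps=1e-12):
    """Sum of -T * log(p) * W over all pixels and classes (eps guards against log(0))."""
    return -np.sum(true_dist * np.log(pred + eps) * weights)

# Made-up data for illustration: a 2 x 3 image with N = 4 classes.
rng = np.random.default_rng(0)
n_classes = 4
labels = rng.integers(0, n_classes, size=(2, 3))
true_dist = np.eye(n_classes)[labels]                  # one-hot T, shape (2, 3, 4)
logits = rng.normal(size=(2, 3, n_classes))
pred = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # softmax

w = np.array([1.0, 2.0, 4.0, 8.0])                     # assumed per-class weights
W = replicate_weights(w, true_dist.shape[:2])          # same w at every pixel
W_masked = true_dist * w                               # the T .* w construction

loss_replicated = weighted_cross_entropy(true_dist, pred, W)
loss_masked = weighted_cross_entropy(true_dist, pred, W_masked)
```

The two losses match because multiplying by a one-hot $ T $ a second time changes nothing, so the simpler replicated $ W $ is enough.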
Best Answer
An easy way to do this would be to simply assign weights so that the upweighted classes all have weight equal to the unweighted largest class. So in your case, with A, B, C and D standing for the class frequencies and A the largest, you would assign a weight of A/B to B, A/C to C, A/D to D and not weight A at all.
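A minimal sketch of that rule in NumPy, assuming you already have per-class pixel counts; the function name `weights_relative_to_largest` and the counts are hypothetical, and every class is assumed to occur at least once (a zero count would divide by zero).

```python
import numpy as np

def weights_relative_to_largest(class_counts):
    """Weight each class by (largest count / its own count); the largest class gets 1."""
    counts = np.asarray(class_counts, dtype=float)
    return counts.max() / counts

# Hypothetical pixel counts for classes A, B, C, D.
counts = np.array([800, 400, 200, 100])
w = weights_relative_to_largest(counts)   # A is left at weight 1

# After weighting, every class contributes the same total mass as the largest one.
effective = w * counts
```

With these counts the weights come out as 1, 2, 4 and 8, and each class's weighted total equals the count of A.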