Subtract the maximum logarithm from all logs. Throw away all results that are so negative they will underflow the exponential. (Their likelihoods are, for all practical purposes, zero.)
Indeed, if you want a relative precision of $\epsilon$ (such as $\epsilon = 10^{-d}$ for $d$ digits of precision) and you have $n$ likelihoods, throw away any result less than the logarithm of $\epsilon/n$. Then proceed as usual to exponentiate the resulting values and divide each one by the sum of all the exponentials.
For those who like formulas, let the logarithms be $\lambda_1, \lambda_2, \ldots, \lambda_n$ with $\lambda_n = \max(\lambda_i)$. For logarithms to the base $b\gt 1$, define
$$\alpha_i = \begin{cases}
b^{\lambda_i - \lambda_n}, & \lambda_i - \lambda_n \ge \log(\epsilon)-\log(n) \\
0, & \text{otherwise}.
\end{cases}$$
The normalized likelihoods equal $\alpha_i / \sum_{j=1}^n \alpha_j$, $i = 1, 2, \ldots, n.$ This works because each discarded $\alpha_i$ is smaller than $\epsilon/n$, so replacing all of the otherwise underflowing $\alpha_i$ by zero introduces a total error of at most $(n-1)\epsilon/n\lt \epsilon$. Moreover, because $\alpha_n=b^{\lambda_n-\lambda_n}=b^0=1$ and all $\alpha_i$ are non-negative, the denominator $A = \sum_j \alpha_j \ge 1$, whence the total relative error due to the zero-replacement rule is strictly smaller than $\left((n-1)\epsilon/n \right) / A \lt \epsilon$, as desired.
To avoid too much rounding error, compute the sum starting with the smallest values of the $\alpha_i$. This will be done automatically when the $\lambda_i$ are first sorted in increasing order. This is a consideration only for very large $n$.
BTW, this prescription assumed the base of the logs is greater than $1$. For bases $b$ less than $1$, first negate all the logs and proceed as if the base were equal to $1/b$.
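Put together, the whole prescription is only a few lines of code. Here is a minimal Python sketch, assuming natural logarithms (so $b = e$); the function name and the default $\epsilon$ are just for illustration.

```python
import math

def normalize_log_likelihoods(lams, eps=1e-16):
    """Normalize likelihoods given their natural logarithms lams."""
    n = len(lams)
    lam_max = max(lams)
    cutoff = math.log(eps) - math.log(n)   # log(eps / n)
    alphas = []
    for lam in lams:
        d = lam - lam_max
        # Terms below the cutoff are guaranteed to contribute less than eps/n:
        # replace them by zero instead of letting exp() underflow.
        alphas.append(math.exp(d) if d >= cutoff else 0.0)
    # Summing the smallest terms first limits rounding error (matters only for very large n).
    total = sum(sorted(alphas))
    return [a / total for a in alphas]
```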
Example
Let there be three values with logarithms (natural logs, say) equal to $-269647.432,$ $-231444.981,$ and $-231444.699.$ The last is the largest; subtracting it from each value gives $-38202.733,$ $-0.282,$ and $0.$
Suppose you would like precision comparable to IEEE doubles (about 16 decimal digits), so that $\epsilon=10^{-16}$ and $n=3$. (You can't actually achieve this precision, because $-0.282$ is given only to three significant figures, but that's ok: we're only throwing away values that are guaranteed not to affect the better of the precision you want and the precision you actually have.) Compute $\log(\epsilon/n)$ = $\log(10^{-16}) - \log(3)$ = $-37.93997.$ The first of the three differences, $-38202.733,$ is less than this, so throw it away, leaving just $-0.282$ and $0.$ Exponentiating them gives $\exp(-0.282) = 0.754$ and $\exp(0)=1$ (of course). The normalized values are--in order--$0$ for the one you threw away, $0.754 / (1 + 0.754) = 0.430$, and $1/(1+0.754)=0.570$.
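As a check, running the earlier sketch on these logarithms reproduces the hand computation (to the three significant figures given):

```python
probs = normalize_log_likelihoods([-269647.432, -231444.981, -231444.699])
print([round(p, 3) for p in probs])   # [0.0, 0.43, 0.57]
```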
In theory, you don't need to normalise your inputs, since the activation function rescales them anyway. In practice, however, it's very useful to normalise both input and output tensors for training and testing into the range [0,1] or [-1,1] (for regression). After normalisation, you need to back-transform the network's output in order to make "predictions" on unseen data in the original units.
If you want to monitor the error in metres during the training phase, then you must apply such an untransform function to the outputs (as you did already).
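To make that concrete, here is a minimal sketch of a min-max transform and its inverse (the helper names are illustrative; in practice something like scikit-learn's `MinMaxScaler` does the same job):

```python
import numpy as np

def fit_minmax(x):
    """Record the per-column range of the training data."""
    return x.min(axis=0), x.max(axis=0)

def transform(x, params):
    """Scale into [0, 1] using the training-set range."""
    x_min, x_max = params
    return (x - x_min) / (x_max - x_min)

def inverse_transform(y, params):
    """Back-transform scaled outputs to the original units (e.g. metres)."""
    x_min, x_max = params
    return x_min + y * (x_max - x_min)

# Fit on the training targets only, then reuse the same range for predictions.
y_train = np.array([[12.0], [47.5], [88.2]])
params = fit_minmax(y_train)
y_scaled = transform(y_train, params)
y_metres = inverse_transform(y_scaled, params)   # recovers the original values
```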
Best Answer
tl;dr: $RGB$ is the wrong space to do the normalizing in, since it is device-dependent. You should work with device-independent color coordinates like $xyL$. Only in the final step transform to $RGB$. Any per-scene intensity reduction has to be made by scaling the $L$ coordinate from $xyL$ space with a common factor.
For a transparency model based on physics see this paper, especially section 2.2. Follow-up paper 1, follow-up paper 2.
Long answer: I think it's necessary to avoid some conceptual confusion regarding the subject matter. Most importantly, color is not a physical property.
Coming to the question of normalizing: With your goal of physically correct rendering, you need to do all your computations per wavelength. To the final result, you then need to apply the cone sensitivity functions to get to $xyL$ space. Then, you need to apply the monitor specs to know which $RGB$ values will produce these $xyL$ triples. For this step, you need to have the specs for a typical LCD (this is what Wolfram Alpha must be doing; otherwise the answer would be nonsensical, since there is no unique relationship between a device-dependent format like $RGB$ and device-independent light).
If some $xyL$ triples cannot be displayed due to monitor restrictions, you need to reduce the intensity of the rendered scene until all $xyL$ triples are inside the monitor gamut. This can be done with a single scalar factor that applies to all $L$ coordinates, which makes the whole scene uniformly darker. Since the 3D shape of the monitor gamut is irregular, this is not a simple normalizing step.
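As a rough illustration of that last step, here is a sketch assuming the target is a linear-sRGB monitor; the conversion matrix is the standard XYZ-to-linear-sRGB one, the helper names are my own, and gamma encoding is omitted:

```python
import numpy as np

# Standard XYZ -> linear sRGB matrix (D65 white point).
XYZ_TO_RGB = np.array([[ 3.2406, -1.5372, -0.4986],
                       [-0.9689,  1.8758,  0.0415],
                       [ 0.0557, -0.2040,  1.0570]])

def xyL_to_linear_rgb(x, y, L):
    """Convert a device-independent xyL triple (L = luminance Y) to linear sRGB."""
    X = x * L / y
    Z = (1.0 - x - y) * L / y
    return XYZ_TO_RGB @ np.array([X, L, Z])

def common_intensity_scale(triples):
    """One factor for the whole scene so that no linear RGB channel exceeds 1.

    Negative channels (chromaticities outside the gamut) cannot be fixed by
    intensity scaling alone.
    """
    peak = max(xyL_to_linear_rgb(*t).max() for t in triples)
    return 1.0 if peak <= 1.0 else 1.0 / peak

# Darken the whole scene uniformly so the brightest triple just fits the gamut.
scene = [(0.30, 0.32, 1.4), (0.45, 0.41, 0.9)]
s = common_intensity_scale(scene)
scene_scaled = [(x, y, s * L) for (x, y, L) in scene]
```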