Solved – Predicting proportions with Machine Learning

compositional-datamachine learningneural networkssoftmax

I am working on a machine learning problem where I have to predict a set of $N$ numbers (proportions) for each data point, all of them summing to one. One toy example to illustrate my problem would be predicting at a daily level the percentage of volume of water rained in each of the states of the US over the total rain in the country – in this example $N=50$ (the number of states) and $\sum_{n=1}^{50}{\hat{y}_n}=1$

I was thinking on designing a neural net with $N$ outputs and apply a Softmax in the output, then backpropagate the MSE or the RMSE… I am a bit unsure about the convergence guarantees (potential vanishing gradient). I would also like to know if you would approach the problem in another way.

Best Answer

You have what is called . There is quite some literature on how to model this. Take a look through the tag, or search for the term.

Typically, one would choose a reference category and work with log ratios, or similar. One paper I personally know about predicting compositional data is Snyder at al. (2017, IJF). They use a state space approach, not an NN, but their transformation may still be useful to you.

Related Question