Solved – Defining a reward for regression problems in reinforcement learning

neural-networks, regression, reinforcement-learning

I've started experimenting a bit with reinforcement learning (in the context of neural networks) and I'm having some difficulty with reward functions.

I've seen that for classification problems, people simply give a reward of 1 when the classification is correct and a reward of 0 when it's wrong. Say I'm trying to solve a regression problem in which RL is involved (think of an MNIST autoencoder, for example): how would you define the reward in that case?

The most intuitive thing I can come up with is 1/(norm of the error), so the closer the two vectors are, the larger the reward. But this blows up as the error approaches zero, and I'm sure there are better-behaved solutions without that problematic boundary.
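For concreteness, here is a minimal sketch (my own NumPy code, with hypothetical names) of the reciprocal reward next to two common singularity-free alternatives:

```python
import numpy as np

def reward_inverse(y_pred, y_true, eps=1e-8):
    """Reciprocal-error reward: grows without bound as the error shrinks."""
    return 1.0 / (np.linalg.norm(y_pred - y_true) + eps)

def reward_bounded(y_pred, y_true):
    """Bounded alternative: maps error in [0, inf) to reward in (0, 1]."""
    return 1.0 / (1.0 + np.linalg.norm(y_pred - y_true))

def reward_negative_error(y_pred, y_true):
    """Negative error norm: no singularity, monotone in the error."""
    return -np.linalg.norm(y_pred - y_true)
```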

I'd be happy to hear a suggestion or get a reference to related work.

Best Answer

From what I understand of your problem, I think cosine similarity between the two vectors would be a better fit for the task. In the (possibly) high-dimensional space where the vectors live, this measure intuitively weighs variation along all of the dimensions more evenly than the norm does.
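A minimal sketch of that reward (my own implementation, not from the answer):

```python
import numpy as np

def cosine_similarity_reward(y_pred, y_true, eps=1e-8):
    """Reward in [-1, 1]: 1 when the vectors point the same way, -1 when opposite."""
    num = np.dot(y_pred, y_true)
    den = np.linalg.norm(y_pred) * np.linalg.norm(y_true) + eps
    return num / den
```

Note that cosine similarity is scale-invariant: it rewards direction but ignores magnitude. If the target's scale matters (as in pixel reconstruction), you may want to combine it with a norm-based term.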
