I want to know whether my understanding of the expectation of a loss function is correct.

(x, y) denotes a data-label pair sampled from the true distribution. The function f, parameterized by the weights w, maps data x to the label space. And l is the loss function, which takes the label and the predicted label.

So, by applying the loss function to all the data points we have, we get a distribution of loss values. Using this distribution, we can calculate the expected value over all data points (say, the averaged(?) loss value over all data points). Is my understanding of the expectation correct?

## Best Answer

Yes, you got it correct.

The subscript notation is pretty common in the machine learning literature. You can find some examples of it being discussed in the following threads: Expected value notation in GAN loss, Notation: What does the tilde below of the expectation mean?, Notation for expected value, What is the meaning of superscript in $p_{\theta}(x)$ and ${\mathbb E}_{\theta}\left[S(\theta)\right]$?.

Notice that it talks about the expected value of the random variable. This is not the same as the average over all the data points. The average is just an estimate of the expected value: we cannot observe the true expected value, just as we cannot observe the random variables themselves, only their realizations.
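
To make the distinction concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration and not taken from the question: the "true" distribution (x uniform on [0, 1], y = 2x plus Gaussian noise), the linear model `f`, the squared-error `loss`, and the helper names. With these choices the true expected loss at w = 2 is known in closed form (it equals the noise variance), so the sample averages can be compared against it.

```python
import random

SIGMA = 0.5  # noise standard deviation (assumed for this toy example)


def f(w, x):
    """Hypothetical model: a linear map from data x to a predicted label."""
    return w * x


def loss(y_pred, y):
    """Squared-error loss between the predicted label and the label."""
    return (y_pred - y) ** 2


def sample_xy(rng):
    """Draw one (x, y) pair from the assumed 'true' distribution."""
    x = rng.uniform(0.0, 1.0)
    y = 2.0 * x + rng.gauss(0.0, SIGMA)
    return x, y


def empirical_risk(w, n, rng):
    """Average loss over n sampled pairs: an estimate of the expected loss."""
    total = 0.0
    for _ in range(n):
        x, y = sample_xy(rng)
        total += loss(f(w, x), y)
    return total / n


# For w = 2 the loss is just the squared noise, so E[loss] = SIGMA ** 2.
true_expected_loss = SIGMA ** 2

rng = random.Random(0)
print("true expected loss:", true_expected_loss)
for n in (10, 1_000, 100_000):
    print(n, "samples -> average loss:", empirical_risk(2.0, n, rng))
```

The expected loss is a fixed number determined by the distribution, while each average depends on which samples happened to be drawn. The averages fluctuate for small n and settle near the expected value as n grows, which is what "the average is an estimate of the expected value" means in practice.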