There are several different measures of "location" or "central tendency". Expected value is the most popular one, but there are others -- median, mode, geometric mean, etc.
While all measures of central tendency are, in some ways, similar, it is important to remember that they actually measure different things. Here are the interpretations.
- Expected value (or mean). Expected value is useful for calculating the total when you have a large number of observations. Say you own a company with a large number ($N$) of employees. The expected salary is $E[X]$. The total salary that you have to pay out is $N E[X]$.
- Median. Median tells you about the typical observation. If you want to know what the typical person earns, look at the median salary, not the expected salary.
- Mode. Mode tells you the most likely outcome. If you are applying for jobs in a particular field, the modal salary is what you will most probably earn, not the expected salary.
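As a quick illustration of how the three measures can disagree, here is a sketch using Python's standard `statistics` module on a made-up salary list (the numbers are purely illustrative):

```python
import statistics

# Hypothetical salaries: most people earn 40k, one person 60k, one executive 500k.
salaries = [40_000] * 8 + [60_000, 500_000]

print(statistics.mean(salaries))    # 88000.0 -- pulled up by the outlier
print(statistics.median(salaries))  # 40000.0 -- the "typical" salary
print(statistics.mode(salaries))    # 40000   -- the most common salary
```

The mean times the headcount still gives the exact total payroll, even though almost nobody earns the mean.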
Confusion arises because, for symmetric unimodal distributions (which are common), the mean, median, and mode are numerically equal. So people think only of the mean and not the median or mode. But, depending on the problem, what you are actually interested in could be the mode, even if it happens to be numerically equal to the mean.
Now, let's look at your specific questions.
- Suppose each time you throw a die, you earn the amount of money that comes up. After $N$ throws, you will earn approximately $N E[X]$ (by the law of large numbers). The "fair price" that someone could charge you for making these throws is $N E[X]$.
- If you assign heads and tails to numbers (think: amounts that you will earn or pay), then there certainly is an expected value. For $H = 0$, $T = 1$, $E[X] = 0.5$.
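A small simulation sketch of the die example above (the seed and number of throws are arbitrary choices of mine):

```python
import random

random.seed(0)
N = 100_000

# Each throw of a fair die earns the face value; E[X] = 3.5.
total = sum(random.randint(1, 6) for _ in range(N))

# The total earned should be close to N * E[X] = 3.5 * N.
print(total / N)  # approximately 3.5
```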
In general, the expectation of $g(X)$ can often be approximated using a Taylor expansion around the mean; let $a=E(X)$
$$g(X) = g(a) + g'(a) (X-a) + \frac{1}{2!}g''(a) (X-a)^2 +\cdots$$
$$E[g(X)] = g(a) + \frac{1}{2!}g^{(2)}(a) \, m_2 + \frac{1}{3!} g^{(3)}(a) \, m_3 + \cdots $$
where $g^{(n)}(a)$ is the $n-$th derivative of $g(X)$ evaluated at the mean, and $m_k$ is the $k-$th centered moment of $X$.
In our case, $g(X)=X \log(X)$ and $g^{(n)}(a) = (-1)^n (n-2)! \; a^{-(n-1)}$ for $n>1$
So the expansion takes the form
$$ E[X \log X] \approx a \log a + \frac{1} {2 \times 1} \frac{m_2}{a} - \frac{1}{3 \times 2}\frac{m_3}{a^2} + \frac{1} {4 \times 3}\frac{m_4}{a^3} -\cdots$$
For the Binomial $(N,p)$, we get
$$ E[X \log X] \approx Np \log( Np) + \frac{1-p}{2} - \frac{(1-p)(1-2p)}{6 Np } + \cdots$$
And for the Hypergeometric $(N, n, m)$
$$ E[X \log X] \approx a \log(a) + \frac{m}{2 (n+m)} - \cdots$$
where $a=E(X)=\frac{n N}{m+n}$ and I was too lazy to compute the next term.
These are best regarded as asymptotic expansions, valid as $N \to \infty$. For finite $N$, they should not be used when $a \lesssim 1$.
Here are a few values for the Binomial approximation, truncated at the third moment:
|        | p=0.2 exact | p=0.2 approx | p=0.5 exact | p=0.5 approx | p=0.8 exact | p=0.8 approx |
|--------|-------------|--------------|-------------|--------------|-------------|--------------|
| N = 5  | 0.4907      | 0.3200       | 2.5811      | 2.5407       | 5.6542      | 5.6502       |
| N = 10 | 1.8545      | 1.7463       | 8.3123      | 8.2972       | 16.740      | 16.738       |
| N = 20 | 5.9740      | 5.9252       | 23.283      | 23.276       | 44.463      | 44.463       |
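A sketch of how the Binomial numbers above can be reproduced in Python, computing $E[X \log X]$ exactly by direct summation over the pmf and via the truncated expansion (the function names are my own):

```python
import math

def exact_xlogx_binomial(N, p):
    # E[X log X] by direct summation over the Binomial pmf
    # (the k = 0 term vanishes, using the convention 0 log 0 = 0).
    return sum(
        k * math.log(k) * math.comb(N, k) * p**k * (1 - p)**(N - k)
        for k in range(1, N + 1)
    )

def approx_xlogx_binomial(N, p):
    # Taylor expansion around the mean a = Np, truncated at the third moment:
    # a log a + (1-p)/2 - (1-p)(1-2p)/(6Np)
    a = N * p
    return a * math.log(a) + (1 - p) / 2 - (1 - p) * (1 - 2 * p) / (6 * a)

for N in (5, 10, 20):
    for p in (0.2, 0.5, 0.8):
        print(N, p, exact_xlogx_binomial(N, p), approx_xlogx_binomial(N, p))
```

As claimed below, the truncated series always comes out below the exact value.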
As cardinal points out in the comments, using the mean-value form of the remainder of the truncated Taylor expansion, if we truncate at an odd-moment term (as I did for the Binomial above), the error must be positive, and hence the truncation is a lower bound on the exact value. That $E[X \log X] \ge a \log a$ can also be proven directly using Jensen's inequality, because $g(x)=x \log x$ is a convex function.
Best Answer
For a discrete random variable, $$\text{E}[X] = \sum_{\text{all possible } x} x\,P(X=x).$$
If you consider the roll of a fair 6-sided die, then this is just a weighted arithmetic average. If $N$ is the number of dots face up after a roll, then $$\text{E}[N] = 1\cdot\frac{1}{6}+2\cdot\frac{1}{6}+3\cdot\frac{1}{6}+4\cdot\frac{1}{6}+5\cdot\frac{1}{6}+6\cdot\frac{1}{6} = 3.5.$$
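This weighted average is easy to verify directly:

```python
# Weighted arithmetic average for a fair six-sided die: each face has weight 1/6.
expected = sum(n * (1 / 6) for n in range(1, 7))
print(expected)  # approximately 3.5 (up to floating-point rounding)
```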
For a continuous random variable, we use the probability density $f_X(x)$, which measures the intensity of probability (it is a derivative of a probability), but the idea is similar.
$$E[X] = \int_{-\infty}^{\infty} x\,f_X(x)dx$$
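As a sanity check, this integral can be approximated numerically. Here is a sketch using a midpoint Riemann sum for an Exponential density with rate $\lambda = 2$, whose exact mean is $1/\lambda = 0.5$ (the distribution, integration range, and grid size are arbitrary choices of mine):

```python
import math

lam = 2.0

def f(x):
    # Exponential(rate=lam) density on [0, infinity)
    return lam * math.exp(-lam * x)

# Midpoint Riemann sum of x * f(x) on [0, 50]; the tail beyond 50 is negligible.
dx = 1e-4
mean = sum((i * dx + dx / 2) * f(i * dx + dx / 2) * dx for i in range(int(50 / dx)))
print(mean)  # close to 0.5
```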
The intuition is that both of these range over all possible values of the random variable $X$, weighting each value by the chance it occurs. So the expected value is a weighted arithmetic mean.
You can compare this mathematically with the geometric mean to see the difference.
If this still isn't clear, feel free to comment.