$P_i = \mid\psi_i\rangle\langle\psi_i\mid$ is the one-dimensional "projection" operator. Here "one-dimensional" means that the operator projects $\psi$ onto a single direction in Hilbert space.
Firstly, any wavefunction $\psi$ can be written as a linear combination of orthonormal basis states. That is, $\mid\psi\rangle = \sum_i a_i\mid\psi_i\rangle$ where the $a_i$ are complex coefficients. If there are $n$ such non-zero coefficients, $\psi$ can be thought of as a vector in $n$ dimensions, with component $a_i$ along the $i$-th direction of this $n$-dimensional Hilbert space. $a_i$ is also the amplitude for obtaining the result $\psi_i$ when the wavefunction is measured in this basis. The probability is the squared modulus of the amplitude, $\mid a_i\mid^2$.
The projection measurement essentially "projects" the state $\psi$ onto one of these components.
It is easiest to see why $P_i = \mid\psi_i\rangle\langle\psi_i\mid$ acts as a projector by applying it to the state $\psi$.
$P_i\mid\psi\rangle = P_i\sum_k a_k\mid\psi_k\rangle = \mid\psi_i\rangle\langle\psi_i\mid\sum_k a_k\mid\psi_k\rangle = \mid\psi_i\rangle\sum_k a_k\langle\psi_i\mid\psi_k\rangle = a_i\mid\psi_i\rangle$, where the last step uses orthonormality, $\langle\psi_i\mid\psi_k\rangle = \delta_{ik}$.
Therefore, the operator $P_i$ acting on some arbitrary state $\psi$, projects $\psi$ onto its i-th component vector. (This is analogous to projecting a 2D vector onto say its x-component in Euclidean geometry. For instance if a vector $V = ax + by$, then the projection onto x-axis would yield $V_x = ax$)
Since $P_i\mid\psi\rangle=a_i\mid\psi_i\rangle$, we can easily show that $\langle\psi\mid P_i\mid\psi\rangle=\sum_k \langle\psi_k\mid a_k^*a_i\mid\psi_i\rangle = \sum_k a_k^*a_i\langle\psi_k\mid\psi_i\rangle = a_ia_i^* = \mid a_i\mid^2$
Therefore we have shown that, for a normalized state, $\langle\psi\mid P_i\mid\psi\rangle$ gives the probability of finding the system in the eigenstate $\psi_i$.
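The calculation above can be checked numerically. The following is only a minimal sketch in plain Python, assuming a three-dimensional space with the standard basis playing the role of the $\psi_i$; the helper names `inner` and `project` and the sample amplitudes are made up for illustration.

```python
# Minimal sketch, standard library only.
# Assumption: the |psi_i> are the standard basis vectors of a 3-dimensional space.

def inner(u, v):
    """Inner product <u|v>; the first argument is complex-conjugated."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def project(basis_vec, state):
    """Apply P = |b><b| to a state, i.e. return <b|state> * |b>."""
    overlap = inner(basis_vec, state)
    return [overlap * b for b in basis_vec]

basis = [
    [1 + 0j, 0j, 0j],
    [0j, 1 + 0j, 0j],
    [0j, 0j, 1 + 0j],
]

# Hypothetical normalized state with amplitudes a = (0.6, 0.8i, 0).
psi = [0.6 + 0j, 0.8j, 0j]

for i, b in enumerate(basis):
    p_psi = project(b, psi)            # P_i|psi> = a_i|psi_i>
    prob = inner(psi, p_psi).real      # <psi|P_i|psi>
    print(i, prob, abs(psi[i]) ** 2)   # the last two columns should agree
```

Each row prints $\langle\psi\mid P_i\mid\psi\rangle$ next to $\mid a_i\mid^2$, so the two can be compared directly.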
The next step is to show that the expression in your book is also the probability. Note that your book uses $P_x$ to represent a probability, not the projection operator. First consider the denominator $\langle\psi\mid\psi\rangle = \sum_j\sum_k\langle\psi_j\mid a_j^* a_k\mid\psi_k\rangle$. The only terms that survive are those with $j=k$. Therefore we arrive at:
$\langle\psi\mid\psi\rangle = \sum_j\langle\psi_j\mid a_j^* a_j\mid\psi_j\rangle = \sum_j a_j^*a_j = \sum_j \mid a_j\mid ^2 $ This is the total probability of all outcomes, which should be 1 if the state is normalized. Therefore $\langle\psi\mid\psi\rangle = 1$ for normalized states; otherwise it is the sum of the squared moduli of all the amplitudes.
Next we consider the numerator. This is the inner product of the $x$-component of the state with itself, which yields $a_x^* a_x = \mid a_x \mid ^2$. Therefore, numerator over denominator gives $\frac{\mid a_x \mid ^2}{\sum_j \mid a_j\mid ^2}$. This is the weight of the particular outcome $x$ divided by the total weight of all possible outcomes (which is 1 for normalized states).
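To see why the denominator matters, here is a small sketch with a deliberately unnormalized state. It assumes a two-dimensional space with the standard basis; `born_probability` is a hypothetical helper name, not something from your book.

```python
# Minimal sketch, standard library only.
# Assumption: a 2-dimensional space whose eigenstates are the standard basis vectors.

def inner(u, v):
    """Inner product <u|v>; the first argument is complex-conjugated."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def born_probability(state, eigvec):
    """|<eigvec|state>|^2 / <state|state>, valid for unnormalized states too."""
    amplitude = inner(eigvec, state)
    return abs(amplitude) ** 2 / inner(state, state).real

# Hypothetical unnormalized state: a_0 = 3, a_1 = 4i, so <psi|psi> = 25.
psi = [3 + 0j, 4j]
e0 = [1 + 0j, 0j]
e1 = [0j, 1 + 0j]

probs = [born_probability(psi, e) for e in (e0, e1)]
print(probs, sum(probs))
```

Without the division by $\langle\psi\mid\psi\rangle$ the two numbers would be 9 and 16, which cannot be probabilities; with it they become 9/25 and 16/25 and add up to 1.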
Born's rule is correct. Your measurement result will be $E_1$ or $E_2$,
but not $E_\text{average}$.
It is like rolling a die.
You get a $1$, $2$, $3$, $4$, $5$ or $6$. But you never get a $3\frac 12$.
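A quick simulation makes the same point. This is only an illustrative sketch: the eigenvalues `E1`, `E2` and the probability `p1` are made-up numbers, and the sampling simply draws each outcome with its Born-rule probability.

```python
import random

# Hypothetical values chosen for illustration only.
E1, E2 = 1.0, 3.0      # the two possible measurement results
p1 = 0.25              # assumed probability |a_1|^2 of obtaining E1

rng = random.Random(0)  # fixed seed so the run is reproducible
outcomes = [E1 if rng.random() < p1 else E2 for _ in range(10_000)]

mean = sum(outcomes) / len(outcomes)
expected = p1 * E1 + (1 - p1) * E2   # the expectation value, 2.5 here

print(sorted(set(outcomes)))  # only E1 and E2 ever occur
print(mean, expected)         # the sample mean approaches the expectation value
```

Every individual result is $E_1$ or $E_2$; the average only shows up as a property of many repeated measurements.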
Best Answer
The correct statement is that the probability that a measurement of an observable represented by a Hermitian operator $A$ (with non-degenerate spectrum) over a state $\vert\psi\rangle$ would yield an eigenvalue $\lambda_i$ is given by
\begin{align}p_i=\frac{\langle\psi\vert\psi_i\rangle\langle\psi_i\vert\psi\rangle}{\langle\psi\vert\psi\rangle}\end{align} where $\vert\psi_i\rangle$ is the normalized eigenstate of the operator $A$ corresponding to the eigenvalue $\lambda_i$. However, this does not require that $\vert\psi\rangle=\sum_i\vert\psi_i\rangle$. The state vector $\vert\psi\rangle$ can be the most generic normalizable state and thus, would be represented, in general, as a generic linear combination $\vert\psi\rangle=\sum_ic_i\vert\psi_i\rangle$ where $c_i\in\mathbb{C}$.
This statement is called the Born rule.
It needs to be supplemented with a closely related axiom, known as the collapse postulate or the wavepacket reduction postulate, to give a "complete" picture of what happens when you perform a measurement. It says that the aforementioned measurement evolves the state $\vert\psi\rangle$ to the eigenstate $\vert\psi_i\rangle$ corresponding to the outcome $\lambda_i$.
All of this can be made a bit more general to take care of measurements of operators with degenerate spectra using the projection operators, but the basic idea is already captured here. In the case of the measurement of an operator $A$ with distinct eigenvalues $\lambda_i$ such that $A=\sum_i\lambda_i\mathbb{P}_i$ where the $\mathbb{P}_i$s are the projection operators corresponding to the $i^\mathrm{th}$ eigensubspace, the probability of the outcome of the measurement yielding $\lambda_i$ is given by
\begin{align}p_i=\frac{\langle\psi\vert\mathbb{P}_i\vert\psi\rangle}{\langle\psi\vert\psi\rangle}\end{align}
The wavepacket reduction postulate now says that the aforementioned measurement evolves the state $\vert\psi\rangle$ to the state $\frac{\mathbb{P}_i\vert\psi\rangle}{\sqrt{\langle\psi\vert\mathbb{P}_i\vert\psi\rangle}}$ when the measurement outcome is $\lambda_i$. Notice that the denominator here is needed to ensure that the resultant state is normalized.
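As a concrete check of the degenerate case, here is a minimal sketch assuming a three-dimensional space in which one eigenvalue is doubly degenerate, with its eigensubspace spanned by the first two basis vectors. The matrix `P0` and the state `psi` are made up for illustration. The projected vector $\mathbb{P}_i\vert\psi\rangle$ has norm $\sqrt{\langle\psi\vert\mathbb{P}_i\vert\psi\rangle}$, so dividing by that square root is what gives a unit-norm post-measurement state.

```python
import math

def inner(u, v):
    """Inner product <u|v>; the first argument is complex-conjugated."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def apply(matrix, vec):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

# Assumption: the degenerate eigensubspace is spanned by the first two basis vectors.
P0 = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

# Hypothetical state, deliberately not normalized: <psi|psi> = 3.
psi = [1 + 0j, 1j, 1 + 0j]

projected = apply(P0, psi)              # P0|psi>
weight = inner(psi, projected).real     # <psi|P0|psi> = 2
p0 = weight / inner(psi, psi).real      # probability of this outcome, 2/3

# Post-measurement state: P0|psi> divided by its norm sqrt(<psi|P0|psi>).
post = [x / math.sqrt(weight) for x in projected]
post_norm = inner(post, post).real

print(p0, post, post_norm)
```

The collapsed state keeps the relative amplitudes inside the eigensubspace and loses the component outside it.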
In standard textbook quantum mechanics, both of these are always, as far as I know, taken to be basic axioms. One can formulate quantum mechanics using a different mathematical formalism, but one still has to provide some translation of these axioms as axioms in that framework as well, as long as it really is just another formulation of standard textbook quantum mechanics in its physical content.
Having said that, there have been attempts, starting in 1957 and continuing to this day, to derive the Born rule. There have been mainly three approaches to attempt the derivation:
- Measure-Theoretic/Frequentist Approaches
- Symmetry-Based Approaches
- Decision-Theoretic Approaches
Now, none of these attempts has been accepted, at least so far, by the community as a true derivation of the Born rule. Basically, in standard quantum mechanics, there is no plausible way to do away with the wavepacket reduction axiom (which ought to accompany the Born rule for probabilities to make sense; otherwise there would simply be deterministic evolution according to the Schrödinger equation). So, even if one shows that the Born rule is the only consistent probability measure for the Hilbert spaces of quantum mechanics, it does not come in contact with the physical claims made by the standard axioms. Other approaches, in particular the papers by Carroll and by Deutsch (the latter of whom has worked on decision-theoretic approaches), work in the framework of the many-worlds formulation. There, you can make sense of wavepacket reduction as the reduction of the relative state of a system with respect to an observer without violating the underlying unitarity. However, it is conceptually difficult to derive the Born rule there. One reason is that naive branch-counting leads to a contradiction with the Born rule. And the more sophisticated epistemic approaches have been criticized for being either circular or sloppy.
You can see the critiques of the derivations of the Born rule in papers by Adrian Kent, 1997 and 2014. I would also recommend having a look at this answer to my recent question by @ChiralAnomaly for some general comments on the derivations of the Born rule.