We have some multivariate data $x$, drawn from a distribution $\mathcal{D}$ with an unknown parameter $\theta$; note that $x$ is a sample outcome.
We want to test a hypothesis about the unknown parameter $\theta$; the values of $\theta$ under the null hypothesis form the set $\theta_0$.
In the space of $X$ we can define a rejection region $R$, and the power of this region $R$ is then defined as $\mathcal{P}_{\bar\theta}^R=P_{\bar\theta}(x \in R)$. So the power is computed for a particular value $\bar\theta$ of $\theta$ as the probability that the sample outcome $x$ falls in the rejection region $R$ when the true value of $\theta$ is $\bar\theta$. Obviously the power depends on the region $R$ and on the chosen $\bar\theta$.
Definition 1 defines the size of the region $R$ as the supremum of $\mathcal{P}_{\bar\theta}^R$ over all $\bar\theta$ in $\theta_0$, i.e. only over values of $\bar\theta$ under $H_0$. This too depends on the region: $\alpha^R=\sup_{\bar\theta \in \theta_0} \mathcal{P}_{\bar\theta}^R$.
As $\alpha^R$ depends on $R$, we get another value when the region changes, and this is the basis for defining the p-value: vary the region, but in such a way that the observed sample value still belongs to it; for each such region compute $\alpha^R$ as defined above, and take the infimum: $pv(x)=\inf_{R : x \in R} \alpha^R$. So the p-value is the smallest size among all regions that contain $x$.
The theorem is then just a 'translation' of this reasoning to the case where the regions $R$ are defined using a statistic $T$: for a value $c$ you define a region as $R=\{ x \mid T(x) \ge c \}$. If you use this type of region in the reasoning above, the theorem follows.
EDIT because of comments:
@user8: regarding the theorem: if you define rejection regions as in the theorem, then a rejection region of size $\alpha$ is a set of the form $R^\alpha= \{X \mid T(X) \ge c_\alpha \}$ for some $c_\alpha$.
To find the p-value of an observed value $x$, i.e. $pv(x)$, you have to find the smallest region $R$, i.e. the largest value of $c$, such that $\{X \mid T(X) \ge c \}$ still contains $x$. The latter condition (the region contains $x$) is equivalent, because of the way the regions are defined, to saying that $T(x) \ge c$; so you have to find the largest $c$ such that $c \le T(x)$.
Obviously, the largest $c$ such that $c \le T(x)$ is $c = T(x)$, and the region then becomes $\{ X \mid T(X) \ge c = T(x)\}=\{ X \mid T(X) \ge T(x)\}$.
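To make this concrete, here is a minimal numeric sketch for a hypothetical one-sided z-test ($T$ standard normal under $H_0$, regions of the form $\{T \ge c\}$); the grid of $c$ values is purely for illustration:

```python
import math

def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def size_of_region(c):
    """Size of the rejection region {T >= c}: here simply 1 - Phi(c)."""
    return 1 - phi(c)

def p_value(t_obs):
    """Infimum of sizes over regions {T >= c} that contain t_obs.
    The region contains t_obs iff c <= t_obs, and the size decreases
    in c, so the infimum is attained at c = t_obs."""
    grid = [t_obs - k * 0.25 for k in range(20)]  # a few c values <= t_obs
    return min(size_of_region(c) for c in grid)

# The smallest size among regions containing t_obs equals P(T >= t_obs)
assert abs(p_value(1.96) - (1 - phi(1.96))) < 1e-12
```

The assertion checks exactly the statement above: the smallest size over all regions containing the observation is the size of the region with $c = T(x)$.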
I will start from the last question and work backwards. I think there might be a typo in the book or in your transcription:
\begin{align}
P_\theta\left(\frac{\bar X-\theta_0}{\sigma /\sqrt n}>c\right)
& = P_\theta\left(\bar X > \theta_0 + c \, \sigma /\sqrt n\right) \\
& = P_\theta\left(\bar X - \theta > \theta_0 - \theta + c \, \sigma /\sqrt n\right) \\
& = P_\theta\left(\frac{\bar X-\theta}{\sigma /\sqrt n} > c +\frac{\theta_0-\theta}{\sigma /\sqrt n}\right) \\
& = P_\theta\left(Z > c +\frac{\theta_0-\theta}{\sigma /\sqrt n}\right) \\
& = 1-\Phi\left(c +\frac{\theta_0-\theta}{\sigma /\sqrt n}\right)
\end{align}
The point is that you are dealing with a general expression for the probability of rejecting the null hypothesis at any $\theta$ in the parameter space, and from the final expression you can see that, as a function of $\theta$, it is increasing. At $\theta=\theta_0$ we therefore reach the supremum of this function over the null parameter space (here $H_0: \theta \le \theta_0$): $\sup = 1-\Phi(c)$. For a specified significance level $\alpha$, we take $c = \Phi^{-1}(1-\alpha)$, guaranteeing that the worst-case Type I error is no more than $\alpha$.
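As a quick numeric check of this step (the values of $\theta_0$, $\sigma$ and $n$ below are hypothetical), using Python's standard-library `NormalDist`:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal
theta0, sigma, n, alpha = 0.0, 1.0, 25, 0.05  # hypothetical values
c = Z.inv_cdf(1 - alpha)  # c = Phi^{-1}(1 - alpha)

def power(theta):
    """P_theta(reject) = 1 - Phi(c + (theta0 - theta)/(sigma/sqrt(n)))."""
    return 1 - Z.cdf(c + (theta0 - theta) / (sigma / n ** 0.5))

thetas = [-0.5, -0.2, 0.0, 0.2, 0.5]
vals = [power(t) for t in thetas]
assert all(a < b for a, b in zip(vals, vals[1:]))  # increasing in theta
assert abs(power(theta0) - alpha) < 1e-9           # size alpha at theta = theta0
```

The two assertions verify the two claims above: the power function is increasing in $\theta$, and its supremum over $\theta \le \theta_0$ is exactly $\alpha$ at $\theta = \theta_0$.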
Moving on to the first example, note that you are dealing with $\sum{X_i}$ there versus $\bar X$ in the second example. Also, a Poisson distribution has mean equal to variance.
You may change the 1st example to be similar to the 2nd by saying: "Reject null if:"
\begin{align}
T & > c \\
\sum{X_i} & > c \\
\bar X & > \frac{c}{n} \\
\frac{\bar X - \lambda_0}{\sqrt{\lambda_0}/ \sqrt n} & > \frac{\frac{c}{n} - \lambda_0}{\sqrt{\lambda_0}/ \sqrt n}
\end{align}
At this point, we should realize that writing an expression like the one in example 2 is not easy, because $\lambda_0$ appears in both the numerator and the denominator.
So it is easier to work with $\sum{X_i} > c$. We get
\begin{align}
P_\lambda\left(\sum{X_i} > c \right)
& =
P_\lambda\left( \frac{\sum{X_i} - n \lambda }{\sqrt{ n \lambda}}
> \frac{c - n \lambda }{\sqrt{ n \lambda}}\right) \\
& \overset{\mathrm{CLT}}{\approx} P_\lambda\left(Z
> \frac{c - n \lambda }{\sqrt{ n \lambda}}\right)
\end{align}
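The CLT step can be sanity-checked against the exact Poisson tail, since $\sum X_i \sim \mathrm{Poisson}(n\lambda)$ (the values of $\lambda$, $n$ and $c$ below are hypothetical):

```python
from statistics import NormalDist
from math import exp, factorial

Z = NormalDist()
lam, n, c = 1.0, 30, 38  # hypothetical values; reject H0 when sum(X_i) > c

mu = n * lam  # the sum of n iid Poisson(lam) variables is Poisson(n*lam)
exact = 1 - sum(exp(-mu) * mu ** k / factorial(k) for k in range(c + 1))
approx = 1 - Z.cdf((c - mu) / mu ** 0.5)  # the CLT approximation above

print(exact, approx)  # the two tail probabilities are close for large n
```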
In addition, problem 1 gives the values of the power function at two specific parameter values and asks you to solve for the two unknowns $c$ and $n$. One could ask a similar question in example 2, but there the problem would also need to provide $\sigma$.
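For instance (with made-up numbers, since the book's values are not given here): require, under the CLT approximation above, power $\alpha$ at $\lambda_0$ and power $\beta$ at some $\lambda_1 > \lambda_0$; subtracting the two standardized equations lets you solve for $\sqrt n$ and then $c$:

```python
from statistics import NormalDist

Z = NormalDist()
lam0, lam1, alpha, beta = 1.0, 1.5, 0.05, 0.90  # hypothetical values
za = Z.inv_cdf(1 - alpha)  # (c - n*lam0)/sqrt(n*lam0) = za (power alpha at lam0)
zb = Z.inv_cdf(1 - beta)   # (c - n*lam1)/sqrt(n*lam1) = zb (power beta at lam1)

# Subtract the two equations and solve for sqrt(n):
sqrt_n = (za * lam0 ** 0.5 - zb * lam1 ** 0.5) / (lam1 - lam0)
n = sqrt_n ** 2            # in practice one would round n up to an integer
c = n * lam0 + za * (n * lam0) ** 0.5

def power(lam):
    return 1 - Z.cdf((c - n * lam) / (n * lam) ** 0.5)

assert abs(power(lam0) - alpha) < 1e-9  # both constraints are satisfied
assert abs(power(lam1) - beta) < 1e-9
```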
The issue is that under the null the density is $2$ for $0<x<0.5$.
The cdf is $2x$ on that interval, so $P(X_i\leq c) = 2c$, and with $Y=\max_i X_i$ we get $P(Y\leq c) = (2c)^n$ and $P(Y>c)=1-(2c)^n$.
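A quick Monte Carlo check of $P(Y \le c) = (2c)^n$, assuming $Y$ is the maximum of $n$ i.i.d. observations with density $2$ on $(0, 0.5)$, i.e. Uniform$(0, 0.5)$; the values of $n$ and $c$ are arbitrary:

```python
import random

random.seed(0)
n, c, trials = 5, 0.4, 100_000
hits = sum(max(random.uniform(0, 0.5) for _ in range(n)) <= c
           for _ in range(trials))
empirical = hits / trials
theoretical = (2 * c) ** n  # P(Y <= c) = (2c)^n, so P(Y > c) = 1 - (2c)^n
print(empirical, theoretical)
```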