You may be thinking of the cumulative distribution function, which takes on all values in the interval $(0,1)$. Or else you may be thinking of the (probability) density function
$$\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}},$$
the familiar "bell-shaped" curve.
This density function is positive, but not necessarily between $0$ and $1$. It reaches a maximum when $x=\mu$. The maximum value (what your post would call the maximum $y$-value) is $\dfrac{1}{\sqrt{2\pi}\sigma}$. The range of the density function is the interval $\left(0,\frac{1}{\sqrt{2\pi}\sigma}\right]$.
In particular, when $\sigma$ is small, the maximum value can be quite large: the density function reaches a sharp high peak. If $\sigma$ is large, the density function, though still characteristically bell-shaped, is flat and low. The area under the density curve, and above the $x$-axis, is always $1$. So if the density function drops to near $0$ within a short distance of $\mu$ (small variance), it is intuitively clear that the curve must reach quite high.
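As a quick numerical check of the two claims above (the peak can exceed $1$, while the area stays $1$), here is a minimal Python sketch. The function names `normal_pdf`, `peak_height` and `area_under_pdf` are made up for illustration, and the area is only approximated by a midpoint rule over $\mu \pm 10\sigma$.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def peak_height(sigma):
    """Maximum of the density, attained at x = mu."""
    return 1.0 / (math.sqrt(2 * math.pi) * sigma)

def area_under_pdf(mu, sigma, half_width=10.0, steps=20000):
    """Midpoint-rule approximation of the area over [mu - 10*sigma, mu + 10*sigma]."""
    a = mu - half_width * sigma
    h = 2 * half_width * sigma / steps
    return sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(steps)) * h

# Small sigma gives a tall peak, large sigma a low one; the area is about 1 each time.
for sigma in (0.1, 1.0, 5.0):
    print(sigma, peak_height(sigma), area_under_pdf(0.0, sigma))
```

For $\sigma = 0.1$ the peak is well above $1$, which is fine for a density: only the area is constrained.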
Remark: Let $f(x)$ be our probability density function. Then for small $h$, the probability that our random variable lies between $x$ and $x+h$ is approximately $hf(x)$. In that sense, you can pick up a pretty good picture of $f(x)$ if you have a largish number of data points.
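To see how good the approximation $hf(x)$ in the remark is, one can compare it with the exact probability obtained from the cumulative distribution function. This is only a sketch for the standard normal case; `normal_cdf` is written via `math.erf`, and the particular values of `x` and `h` are arbitrary choices for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal cumulative distribution function, expressed through erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_probability(x, h, mu=0.0, sigma=1.0):
    """Exact probability that the variable lies between x and x + h."""
    return normal_cdf(x + h, mu, sigma) - normal_cdf(x, mu, sigma)

x, h = 0.5, 0.01                      # arbitrary illustrative values
exact = interval_probability(x, h)
approx = h * normal_pdf(x)            # the h * f(x) approximation
print(exact, approx, abs(exact - approx) / exact)
```

The relative error shrinks as `h` gets smaller, which is what "for small $h$" means here.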
This is only an answer to your first question.
How can they replace only one entry with q and say that this is entropy of q?
In the paper, $h(q)$ is not computed this way. The inequality of Lemma 4.2 is used to prove that $h(p) \le \log(n)$, and that
$h(p) \lt \log(n)$ if $p$ is not the uniform distribution with $p_1=p_2=\ldots=p_n=\frac{1}{n}$.
Lemma 4.2:
$$-\sum_{i=1}^{n}p_i \log{p_i} \le -\sum_{i=1}^{n}p_i \log{q_i} \tag{1} $$
Equality holds iff $$p_i=q_i, i=1,\ldots , n \tag{2}$$
$\square$
We know that the entropy is defined by
$$h(p)=-\sum_{i=1}^{n}p_i \log{p_i} \tag{3} $$
This can be used to reformulate the inequality of the Lemma as
$$ h(p)\le -\sum_{i=1}^{n}p_i \log{q_i} \tag{4} $$
This is valid for all discrete distributions $q$, so in particular for the uniform distribution with
$$q_i=\frac{1}{n} ,i=1,\ldots,n \tag{4a} $$
Substituting $\frac{1}{n}$ for $q_i$ gives
$$ h(p)\le \sum_{i=1}^{n}p_i \log{n} = (\log{n}) \cdot \sum_{i=1}^{n}p_i = \log{n} \tag{5} $$
But $\log(n)$ is also $h(q)$, if $q$ is the uniform distribution. This can be checked simply by using the definition of the entropy:
$$h(q)=-\sum_{i=1}^{n}q_i \log{q_i}=-\sum_{i=1}^{n}\frac{1}{n} \log{\frac{1}{n}} = \log{n} \sum_{i=1}^{n}\frac{1}{n} = \log{n} \tag{6} $$
So it follows that for the uniform distribution $q$
$$h(p) \le \log{n} = h(q) \tag{7} $$
Because of $(6)$ and $(2)$, equality holds exactly when $p$ is the uniform distribution too.
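The chain $(4)$ to $(7)$ can be checked numerically on a small example. The sketch below uses natural logarithms and an arbitrarily chosen distribution `p`; the helper names `entropy` and `cross_entropy` are illustrative and not taken from the paper.

```python
import math

def entropy(p):
    """h(p) = -sum p_i log p_i, with the convention 0 * log 0 = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Right-hand side of (1): -sum p_i log q_i (assumes q_i > 0 wherever p_i > 0)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.125, 0.125]   # arbitrary non-uniform example
n = len(p)
q = [1.0 / n] * n               # uniform distribution, as in (4a)

print(entropy(p))               # h(p)
print(cross_entropy(p, q))      # right-hand side of (5), equals log n
print(entropy(q), math.log(n))  # (6): h(q) = log n
```

Since `p` is not uniform, `entropy(p)` comes out strictly below `math.log(n)`, in line with $(7)$.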
Edit:
Theorem 5.1 states that the continuous probability density on $[a,b]$ with $\mu = \frac{a+b}{2}$ that maximizes entropy is the uniform distribution $q(x)=\frac{1}{b-a}, x \in [a,b]$. This agrees with the principle of indifference for continuous variables.
On the whole real line there is no uniform probability density. On the whole real line there is also no continuous probability density with highest entropy, because there are continuous probability densities with arbitrarily high entropy, e.g. the Gaussian distribution has entropy $\frac{1}{2}(1+\log(2 \pi \sigma^2))$: if we increase $\sigma$, the entropy increases.
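The entropy formula for the Gaussian can be compared against a direct numerical integration of $-\int f \log f$. This is a rough sketch using a midpoint rule over $\mu \pm 12\sigma$; the names `gaussian_entropy`, `normal_log_density` and `numeric_entropy` are made up for this illustration.

```python
import math

def gaussian_entropy(sigma):
    """Closed form: (1/2) * (1 + log(2 * pi * sigma^2))."""
    return 0.5 * (1.0 + math.log(2 * math.pi * sigma ** 2))

def normal_log_density(x, mu, sigma):
    """Logarithm of the normal density, computed directly to avoid log(0) in the tails."""
    return -((x - mu) ** 2) / (2 * sigma ** 2) - math.log(math.sqrt(2 * math.pi) * sigma)

def numeric_entropy(mu, sigma, half_width=12.0, steps=40000):
    """Midpoint-rule approximation of -integral f log f over mu +/- 12 sigma."""
    a = mu - half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        lp = normal_log_density(a + (i + 0.5) * h, mu, sigma)
        total -= math.exp(lp) * lp
    return total * h

# The entropy grows without bound as sigma increases, and does not depend on mu.
for sigma in (0.5, 1.0, 10.0, 1000.0):
    print(sigma, gaussian_entropy(sigma), numeric_entropy(3.0, sigma))
```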
Because there is no maximal entropy for continuous densities over $R$, we must impose additional constraints, e.g. that $\sigma$ is fixed and that $\mu$ is fixed. For me, the fact that there is a given finite $\sigma^2$ and $\mu$ makes it intuitively clear that values nearer to $\mu$ must have higher probability. If you don't fix $\mu$, then you will get no unique solution. The Gaussian distribution for each real $\mu$ is a solution: this is some kind of "uniformness", in that every $\mu$ can be used for a solution.
Notice that it is crucial to fix $\sigma$ and $\mu$, and to demand $p(x)>0$ for all $x \in R$. If you fix other values or change the domain of the density function from $R$ to another set, e.g. $R^+$, you will get other solutions: the exponential distribution, the truncated exponential distribution, the Laplace distribution, the lognormal distribution (Theorems 3.3, 5.1, 5.2, 5.3).
Best Answer
It looks to me like typically (always?) you will need three points to distinguish a specific normal curve. Let's look at the case of one curve being standard normal, and the other being general:
To find intersection points, we need to solve (after cancelling the common factor $\frac{1}{\sqrt{2\pi}}$):
$$e^{-\frac{x^2}{2}}=\frac{1}{\sigma}\cdot e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
This can be rewritten as $\frac{1}{\sigma}=e^{-\frac{x^2}{2}+\frac{(x-\mu)^2}{2\sigma^2}}$.
Upon taking logarithms, you will generally get a quadratic in $x$, so two solutions (usually).
This means that most normal curves will intersect twice, thus requiring $3$ points for unique determination.
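To make the quadratic explicit: taking logarithms and multiplying through by $2\sigma^2$ gives $(1-\sigma^2)x^2 - 2\mu x + \mu^2 + 2\sigma^2\log\sigma = 0$. Its discriminant works out to $4\sigma^2\left[\mu^2 - 2(1-\sigma^2)\log\sigma\right]$, which is positive whenever $\sigma \ne 1$, since $(1-\sigma^2)\log\sigma < 0$ in that case; for $\sigma = 1$ and $\mu \ne 0$ the equation is linear and there is a single crossing at $x = \mu/2$. Here is a small Python sketch that solves it and verifies the crossings; the function name `intersections_with_standard_normal` and the tolerance `1e-12` are choices made for this illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def intersections_with_standard_normal(mu, sigma):
    """x-values where the N(mu, sigma^2) density meets the standard normal density."""
    # Coefficients of (1 - sigma^2) x^2 - 2 mu x + mu^2 + 2 sigma^2 log(sigma) = 0.
    a = 1.0 - sigma ** 2
    b = -2.0 * mu
    c = mu ** 2 + 2.0 * sigma ** 2 * math.log(sigma)
    if abs(a) < 1e-12:                # sigma = 1: the equation is linear
        if abs(b) < 1e-12:
            raise ValueError("identical curves: they agree at every x")
        return [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0:                      # not expected for sigma != 1, kept as a guard
        return []
    r = math.sqrt(disc)
    return sorted([(-b - r) / (2.0 * a), (-b + r) / (2.0 * a)])

for x in intersections_with_standard_normal(1.0, 2.0):
    print(x, normal_pdf(x), normal_pdf(x, 1.0, 2.0))  # the two densities agree at x
```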