Yes. What you wrote is correct. Instead of viewing $S^2\subset\Bbb{R}^3$ as you did, let
$$S^3=\{(\alpha,\beta)\in\Bbb{C}^2: |\alpha|^2+|\beta|^2=1\}$$
$$S^2=\{(x,z)\in\Bbb{R}\times\Bbb{C}: x^2+|z|^2=1\}$$
$$S^1=\{z\in\Bbb{C}: |z|^2=1\}$$
Then $h:S^3\rightarrow S^2$ can be written slightly more compactly as
$$h(\alpha,\beta)=(|\alpha|^2-|\beta|^2,2\alpha\bar{\beta})$$
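As a quick numerical sanity check (an illustrative script, not part of the original answer), one can verify that $h$ really sends points of $S^3$ to points of $S^2$:

```python
import math
import random

def h(alpha, beta):
    """Hopf map S^3 -> S^2 in the complex coordinates above."""
    return (abs(alpha) ** 2 - abs(beta) ** 2, 2 * alpha * beta.conjugate())

def random_s3_point():
    """A random (alpha, beta) in C^2 with |alpha|^2 + |beta|^2 = 1."""
    a = complex(random.gauss(0, 1), random.gauss(0, 1))
    b = complex(random.gauss(0, 1), random.gauss(0, 1))
    n = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    return a / n, b / n

for _ in range(1000):
    x, z = h(*random_s3_point())
    # (x, z) lies on S^2: x^2 + |z|^2 = 1
    assert abs(x ** 2 + abs(z) ** 2 - 1) < 1e-12
```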
Now let's come to the fibers $h^{-1}(p)$.
Claim: $\, h(\alpha_1,\beta_1)=h(\alpha_2,\beta_2)\iff (\alpha_1,\beta_1)=(z\alpha_2,z\beta_2) \ \mbox{ for some } z\in S^1$.
Consequence: each fiber $h^{-1}(p)\simeq S^1$ is a circle!
Proof of the Claim: If $z\in S^1$ then $|z|^2=z\bar{z}=1$, so
$$h(z\alpha,z\beta)=(|z|^2|\alpha|^2-|z|^2|\beta|^2,2z\bar{z}\alpha\bar{\beta})=h(\alpha,\beta)$$
Vice versa, suppose $h(\alpha_1,\beta_1)=h(\alpha_2,\beta_2)$. In polar form we can write
$$\alpha_k=r_ke^{i\theta_k}, \ \beta_k=s_ke^{i\phi_k} \quad (k=1,2)$$
Imposing the two conditions $(\alpha_k,\beta_k)\in S^3$ and $h(\alpha_1,\beta_1)=h(\alpha_2,\beta_2)$ we get
$$\begin{cases} r_1^2+s_1^2=r_2^2+s_2^2=1 \\ r_1^2-r_2^2=s_1^2-s_2^2 \end{cases} \Rightarrow \begin{cases} r_1=r_2 \\ s_1=s_2 \end{cases}$$
Moreover, equality of the second components forces $\theta_1-\phi_1=\theta_2-\phi_2 \pmod{2\pi}$ (when $r_ks_k\neq 0$; the degenerate cases $r_k=0$ or $s_k=0$ are immediate), which implies $\theta_1-\theta_2=\phi_1-\phi_2=\vartheta$ for some fixed $\vartheta$. Conclusion: $(\alpha_1,\beta_1)=(z\alpha_2,z\beta_2)$, where $z=e^{i\vartheta}$.
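The forward direction of the claim (invariance of $h$ under the $S^1$-action) is easy to spot-check numerically; a small illustrative script, with `h` as defined above:

```python
import cmath
import math
import random

def h(alpha, beta):
    """Hopf map S^3 -> S^2 in the complex coordinates above."""
    return (abs(alpha) ** 2 - abs(beta) ** 2, 2 * alpha * beta.conjugate())

random.seed(0)
for _ in range(100):
    # random point (a, b) of S^3
    a = complex(random.gauss(0, 1), random.gauss(0, 1))
    b = complex(random.gauss(0, 1), random.gauss(0, 1))
    n = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    a, b = a / n, b / n
    # random z on S^1
    z = cmath.exp(1j * random.uniform(0, 2 * math.pi))
    x1, w1 = h(a, b)
    x2, w2 = h(z * a, z * b)
    # (z*a, z*b) lies on the same fiber, so the images agree
    assert abs(x1 - x2) < 1e-12 and abs(w1 - w2) < 1e-12
```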
Edit: [figure: some fibers stereographically projected to $\Bbb{R}^3$]
Best Answer
Any differentiable manifold is locally homeomorphic to Euclidean space. In other words, if we select a point on the manifold, then over very small distances the manifold can be approximated by Euclidean space. It is then possible to parameterise the manifold with local polar coordinates $(\rho,\theta)$ which behave like polar coordinates in an infinitesimal region around the selected point.
The models GCNN, ACNN and MoNet each use a differentiable manifold parameterised by local polar coordinates. They have a weighting function, called the patch operator weighting function $w_i(\rho,\theta)$. Table $1$ in the paper gives $w_i(\rho,\theta)$ for ACNN and GCNN.
The red curves are $0.5$ level sets. That is to say, $w_i(\rho,\theta)=0.5$ along the red curves.
Edit: The OP asked about the definition of MoNet
In section 4, the paper mentions using a weighting function of the form $w_j(\mathbf{u})=\exp\left(-\frac{1}{2}(\mathbf{u}-\boldsymbol{\mu}_j)^T\boldsymbol{\Sigma}_j^{-1}(\mathbf{u}-\boldsymbol{\mu}_j)\right)$ with $\boldsymbol{\Sigma}_j$ and $\boldsymbol{\mu}_j$ learnable (formula 11 in the paper). $\boldsymbol{\Sigma}_j$ is restricted to being a diagonal matrix.
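A minimal sketch of that Gaussian weighting function (the function name and the explicit diagonal parameterisation are my choices for illustration; in MoNet $\boldsymbol{\mu}_j$ and the diagonal of $\boldsymbol{\Sigma}_j$ are learned, not fixed as here):

```python
import math

def monet_weight(u, mu_j, sigma_j_diag):
    """Gaussian patch-operator weight
    w_j(u) = exp(-1/2 (u - mu_j)^T Sigma_j^{-1} (u - mu_j)),
    with Sigma_j diagonal, as in formula 11 of the paper.
    u, mu_j: pseudo-coordinates, e.g. (rho, theta);
    sigma_j_diag: the diagonal entries of Sigma_j.
    """
    q = sum((ui - mi) ** 2 / si for ui, mi, si in zip(u, mu_j, sigma_j_diag))
    return math.exp(-0.5 * q)

# The weight is 1 at u = mu_j and decays away from it
# (mu_j and sigma_j_diag values below are arbitrary examples):
w_center = monet_weight((0.3, 1.0), (0.3, 1.0), (0.01, 0.1))
w_offset = monet_weight((0.4, 1.0), (0.3, 1.0), (0.01, 0.1))
```

Setting $w_j(\mathbf{u})=0.5$ gives the level-set condition $(\mathbf{u}-\boldsymbol{\mu}_j)^T\boldsymbol{\Sigma}_j^{-1}(\mathbf{u}-\boldsymbol{\mu}_j)=2\ln 2$, an ellipse in the pseudo-coordinates, which is what the red curves trace out.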
The paper then describes the neural network used to learn $\boldsymbol{\Sigma}_j$ and $\boldsymbol{\mu}_j$ and the procedure used to train it. The Adam method is explained in the following paper: https://arxiv.org/abs/1412.6980