In the notation
$$\left\|x\right\|_{\color{blue}{2}}^{\color{red}{2}}$$
- the top (red) $\color{red}{2}$ simply means squaring, as in $x^2$;
- the bottom (blue) $\color{blue}{2}$ refers to the fact that it's the "2-norm", the standard Euclidean norm.
This is a special case of the more general $p$-norm:
$$\left\| x \right\| _p = \left( |x_1|^p + |x_2|^p + \dotsb + |x_n|^p \right) ^{1/p}$$
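As a quick sanity check, here is a minimal sketch (using NumPy; the vector is an arbitrary example) of the squared 2-norm and the general $p$-norm:

```python
import numpy as np

x = np.array([3.0, 4.0])

# Squared Euclidean norm ||x||_2^2: sum of squared entries.
sq_two_norm = np.sum(x**2)  # 3^2 + 4^2 = 25
assert np.isclose(sq_two_norm, np.linalg.norm(x, ord=2)**2)

# General p-norm: (|x_1|^p + ... + |x_n|^p)^(1/p).
def p_norm(x, p):
    return np.sum(np.abs(x)**p)**(1.0 / p)

print(p_norm(x, 2))  # 5.0 (Euclidean norm)
print(p_norm(x, 1))  # 7.0 (sum of absolute values)
```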
Confidence intervals are great for illustrating the difference between epistemic and aleatory uncertainty.
Before you collect your sample, the probability statement is an aleatory statement -- that is, it pertains to the actual, repeated-sampling (frequentist) probability that the (still undetermined and random) interval will contain the true value of the parameter.
After you collect the sample and form your interval, you no longer have any aleatory uncertainty (the sample and the interval are in hand -- all random quantities are now known). The resulting interval either contains the true parameter or it does not; we just don't know which. So in what sense should we care about this particular interval?
This is where epistemic uncertainty comes in. We know the aleatory/objective probability of the interval containing the true parameter is either 0 or 1. But we don't know which! The uncertainty is therefore no longer in the values themselves, but in our knowledge of them. Given this, the post-sampling "confidence" is an epistemic statement (whereas pre-sample it was a genuine probability statement).
So, for a 95% CI, we know that 95% of intervals formed this way will contain the true parameter; therefore, we should lean toward believing that this particular interval contains it, while accepting that 5% of such intervals will not (i.e., will be misleading).
Bottom line: pre-sampling, confidence is a true/aleatory probability. Post-sampling it cannot be interpreted as a frequentist probability, but it is valid to use confidence as a measure of how strongly you should believe the interval is accurate.
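The pre-sampling frequentist claim can be checked by simulation. This is a sketch under assumed conditions (a normal population with known standard deviation, and arbitrary choices of mean, sample size, and seed), using the standard known-sigma z-interval:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 2.0, 25, 10_000
z = 1.959964                         # 97.5th percentile of the standard normal
half_width = z * sigma / np.sqrt(n)  # known-sigma 95% z-interval half-width

covered = 0
for _ in range(reps):
    xbar = rng.normal(mu, sigma, size=n).mean()
    # Aleatory question, asked before seeing the data: will the random
    # interval [xbar - hw, xbar + hw] contain the fixed true mean mu?
    covered += (xbar - half_width <= mu <= xbar + half_width)

print(covered / reps)  # close to 0.95
```

Each individual interval, once computed, either covers `mu` or it does not; only the long-run fraction is 95%.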
Best Answer
I do not know the details of adversarial networks; however, I can offer a general answer from probability theory which might be close to the answer.
In a measure-theoretic setting, $P(A||\mathscr{G})$ is sometimes written to denote the conditional probability of the event $A$ with respect to the $\sigma$-field $\mathscr{G}$, where $P$ is a probability measure on the measurable space $(\Omega,\mathscr{F})$ and $\mathscr{F}$ is a larger $\sigma$-field satisfying $\mathscr{G}\subseteq\mathscr{F}$. Random variables $Y$ and $X$ can generate such $\sigma$-fields; say $X$ generates $\mathscr{G}$ and $Y$ generates $\mathscr{F}$, then $P(A||\mathscr{G})=P(Y\in A||X)$. The specific relationship satisfied is
$$\int_{G}P(Y\in A||X)\,dP=P(\{Y\in A\}\cap\{X\in G\})\quad\text{for all }G\in\mathscr{G}\tag{1}$$
The $\hat{p}(y)$ in your equation probably (I am guessing here) denotes an estimate based on a sample of random data $Y$, evaluated at the observation $Y=y$. This estimate $\hat{p}(y)$ will be a random variable, so perhaps all of the above applies, and the $||$ notation simply hints at the measure-theoretic machinery I allude to.
In the special case where $P(Y\in A||X)$ is (almost surely) constant on $G$, so that
$$\int_{G}P(Y\in A||X)\,dP=P(Y\in A||X)\int_{G}dP\\=P(Y\in A||X)\,P(X\in G),$$
the equation above reduces to
$$P(Y\in A||X)=P(\{Y\in A\}\cap\{X\in G\})/P(X\in G)\\=P([\{Y\in A\}\cap\{X\in G\}]\,|\,X\in G)$$
using the traditional $|$ notation signifying the $P(A,B)/P(B)=P(A|B)$ definition. In general the two definitions do not coincide; I believe $\mathscr{G}$ being generated by a countable class $\mathcal{A}$ might be a sufficient condition, that is, $\mathscr{G}=\sigma(\mathcal{A})$ where $|\mathcal{A}|=\aleph_{0}$.
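For intuition, the defining property (1) can be illustrated numerically in a discrete toy model (a sketch with made-up distributions; here $\mathscr{G}=\sigma(X)$, so $P(Y\in A\,||\,X)$ is a function of $X$, estimated below by empirical frequencies):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
X = rng.integers(0, 3, size=n)      # X takes values in {0, 1, 2}
Y = X + rng.integers(0, 2, size=n)  # Y depends on X

in_A = np.isin(Y, [1, 2])           # indicator of {Y in A}, with A = {1, 2}

# A version of P(Y in A || X): since sigma(X) is generated by X, it is the
# function x -> P(Y in A | X = x), here estimated by empirical frequencies.
cond = np.array([in_A[X == x].mean() for x in (0, 1, 2)])
cond_vals = cond[X]                 # the random variable P(Y in A || X)

# Check property (1) for G = {X in {0, 2}}, an element of sigma(X):
G = np.isin(X, [0, 2])
lhs = np.mean(cond_vals * G)        # integral of P(Y in A || X) over G
rhs = np.mean(in_A & G)             # P({Y in A} and {X in G})
print(lhs, rhs)                     # the two sides agree
```

With empirical frequencies the two sides match essentially exactly, which is just property (1) written out for a finite sample.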