It’s much easier to simultaneously construct $X_i$ and $Y_i$ having the desired properties,
by first letting $Y_i$ be i.i.d. Uniform$[0,1]$ and then taking $X_i = F^{-1}(Y_i)$. This is the basic method for generating random variables with arbitrary distributions.
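To make this concrete, here is a small simulation sketch (in Python, for illustration; the exponential distribution and all names are my choices, picked because its quantile function has a closed form):

```python
import math
import random

def exp_quantile(p, rate=1.0):
    """Quantile (inverse CDF) of Exponential(rate):
    F(x) = 1 - exp(-rate*x)  =>  F^{-1}(p) = -log(1-p)/rate."""
    return -math.log(1.0 - p) / rate

random.seed(0)
n = 100_000
ys = [random.random() for _ in range(n)]   # Y_i i.i.d. Uniform[0,1]
xs = [exp_quantile(y) for y in ys]         # X_i = F^{-1}(Y_i)

# The sample mean should be close to the Exponential(1) mean, which is 1.
print(sum(xs) / n)
```

Pushing uniforms through any quantile function works the same way; only `exp_quantile` changes.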
The other direction, where you are first given $X_i$ and then asked to construct $Y_i$, is more difficult, but is still possible for all distributions. You just have to be careful with how you define $Y_i$.
Attempting to define $Y_i$ as $Y_i = F(X_i)$ fails to produce uniformly distributed $Y_i$ when $F$ has jump discontinuities. You have to spread the point masses in the distribution of $X_i$ across the gaps created by the jumps.
Let $$D = \{x : F(x) \neq \lim_{z \to x^-} F(z)\}$$ denote the set of jump discontinuities of $F$. ($\lim_{z\to x^-}$ denotes the limit from the left. All distribution functions are right-continuous, so the only discontinuities that can occur are from the left.)
Let $U_i$ be i.i.d. Uniform$[0,1]$ random variables, and define
$$Y_i =
\begin{cases}
F(X_i), & \text{if }X_i \notin D \\
U_i F(X_i) + (1-U_i) \lim_{z \to X_i^-} F(z), & \text{otherwise.}
\end{cases}
$$
The second part of the definition fills in the gaps uniformly.
The quantile function $F^{-1}(u) = \inf\{x : F(x) \geq u\}$ is not a genuine inverse when $F$ is not one-to-one. Note that if $X_i \in D$, then $F^{-1}(Y_i) = X_i$, because the pre-image of the gap is the corresponding point of discontinuity. On the parts where $X_i \notin D$, the flat sections of $F$ correspond to intervals where $X_i$ has zero probability, so they do not affect $F^{-1}(Y_i)$ with probability 1.
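A quick simulation illustrates that this randomized definition produces uniform $Y_i$ even when $F$ jumps. This is a Python sketch under my own choice of example, a Bernoulli$(0.3)$ distribution, whose CDF jumps at $0$ and $1$:

```python
import random

random.seed(1)

def F(x):
    """CDF of Bernoulli(0.3): jumps to 0.7 at x = 0 and to 1.0 at x = 1."""
    if x < 0:
        return 0.0
    return 0.7 if x < 1 else 1.0

def F_left(x):
    """Left limit of F at x."""
    if x <= 0:
        return 0.0
    return 0.7 if x <= 1 else 1.0

n = 100_000
ys = []
for _ in range(n):
    x = 1 if random.random() < 0.3 else 0       # X_i ~ Bernoulli(0.3)
    u = random.random()                          # independent U_i
    ys.append(u * F(x) + (1 - u) * F_left(x))    # spread the point mass over the gap

# If Y_i is Uniform[0,1], roughly a fraction t of the sample falls below t.
for t in (0.25, 0.5, 0.7, 0.9):
    print(t, sum(y <= t for y in ys) / n)
```

Dropping the `F_left` term (i.e., using plain $F(X_i)$) makes the check fail, since the whole point mass at $0$ piles up at $0.7$.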
The second part of your question follows by similar reasoning, once we know from the first part that $X_i = F^{-1}(Y_i)$ with probability 1. The empirical CDFs are defined as
$$G_n(y) = \frac{1}{n} \sum_{i=1}^n 1_{\{Y_i \leq y\}}$$
$$F_n(x) = \frac{1}{n} \sum_{i=1}^n 1_{\{X_i \leq x\}}$$
so
$$
\begin{align}
G_n(F(x))
&= \frac{1}{n} \sum_{i=1}^n 1_{\{Y_i \leq F(x) \}}
= \frac{1}{n} \sum_{i=1}^n 1_{\{F^{-1}(Y_i) \leq x \}}
= \frac{1}{n} \sum_{i=1}^n 1_{\{X_i \leq x \}}
= F_n(x)
\end{align}
$$
with probability 1.
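This identity can be checked numerically. Here is a Python sketch with a continuous $F$ (Exponential$(1)$, my choice for illustration), where the chain of equalities holds exactly, not just approximately:

```python
import math
import random

random.seed(2)

n = 1000
ys = [random.random() for _ in range(n)]       # Y_i ~ Uniform[0,1]
xs = [-math.log(1 - y) for y in ys]            # X_i = F^{-1}(Y_i) ~ Exponential(1)

def F(x):
    """CDF of Exponential(1)."""
    return 1 - math.exp(-x)

def Gn(y):
    """Empirical CDF of the Y_i."""
    return sum(yi <= y for yi in ys) / n

def Fn(x):
    """Empirical CDF of the X_i."""
    return sum(xi <= x for xi in xs) / n

# G_n(F(x)) and F_n(x) agree at every test point.
for x in (0.1, 0.5, 1.0, 2.0):
    assert Gn(F(x)) == Fn(x)
print("identity holds at all test points")
```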
It should be easy to convince yourself that $Y_i$ has a Uniform$[0,1]$ distribution by looking at pictures; doing so rigorously is tedious, but it can be done. We have to verify that $P(Y_i \leq u) = u$ for all $u \in (0,1)$. Fix such a $u$ and let $x^* = \inf\{x : F(x) \geq u \}$, which is just the value of the quantile function at $u$, defined this way to deal with flat sections. We consider two separate cases.
First suppose that $F(x^*) = u$. Then
$$
Y_i \leq u
\iff Y_i \leq F(x^*)
\iff F(X_i) \leq F(x^*).
$$
Since $F$ is a non-decreasing function and $F(x^*) = u$,
$$
F(X_i) \leq F(x^*) \iff X_i \leq x^* ,
$$
up to an event of probability zero: if $F$ is flat to the right of $x^*$, then $X_i$ can exceed $x^*$ while $F(X_i) = F(x^*)$, but that flat stretch carries zero probability.
Thus,
$$
P[Y_i \leq u]
= P[X_i \leq x^*]
= F(x^*)
= u .
$$
Now suppose that $F(x^*) \neq u$. Then necessarily $F(x^*) > u$, and $u$ falls inside one of the gaps. Moreover, $x^* \in D$: if $F$ were continuous at $x^*$, then $F(x^*)$ would equal the left limit $\lim_{z \to x^{*-}} F(z) \leq u$, forcing $F(x^*) = u$, a contradiction.
Let $u^* = F(x^*)$ be the upper part of the gap. Then by the previous case,
$$
\begin{align}
P[Y_i \leq u]
&= P[Y_i \leq u^*] - P[u < Y_i \leq u^*]\\
&= u^* - P[u < Y_i \leq u^*].
\end{align}
$$
By the way $Y_i$ is defined, $P(Y_i = u^*) = 0$. Writing $l^* = \lim_{z \to x^{*-}} F(z)$ for the lower end of the gap, so that $l^* \leq u < u^*$ and $P(X_i = x^*) = u^* - l^*$, and noting that conditionally on $X_i = x^*$ the variable $Y_i$ is uniformly distributed on $(l^*, u^*)$,
$$
\begin{align}
P[u < Y_i \leq u^*]
&= P[u < Y_i < u^*] \\
&= P[u < Y_i < u^* , X_i = x^*] \\
&= P[X_i = x^*] \cdot \frac{u^* - u}{u^* - l^*} \\
&= u^* - u .
\end{align}
$$
Thus, $P[Y_i \leq u] = u$.
If you have a cumulative distribution function $F$, then the $p$-value for a given statistic $T$ is simply $1-F(T)$. This is straightforward in R. If, on the other hand, you have a probability density function $p$, then $F(x)=\int_{-\infty}^x p(t)\,dt$. You can find this integral analytically or numerically. In R this will look like this:
dF <- function(x) dnorm(x)
pF <- function(q) integrate(dF, -Inf, q)$value
> pF(1)
[1] 0.8413448
> pnorm(1)
[1] 0.8413447
You can tune integrate for better accuracy. This may of course fail in specific cases, when the integral does not behave well, but it should work for the majority of density functions. You can also pass parameters into pF, if you have several parameter values to try out and do not want to redefine dF each time.
dF <- function(x, mean = 0, sd = 1) dnorm(x, mean = mean, sd = sd)
pF <- function(q, mean = 0, sd = 1) integrate(dF, -Inf, q, mean = mean, sd = sd)$value
> pF(1,1,1)
[1] 0.5
> pnorm(1,1,1)
[1] 0.5
Of course you can also use Monte-Carlo methods as detailed by @suncoolsu; this would just be another numerical method for integration.
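To make the Monte-Carlo route concrete, here is a minimal sketch (in Python rather than R, and the function name is mine) estimating the normal CDF as the fraction of simulated draws below $q$:

```python
import random

random.seed(3)

def mc_normal_cdf(q, n=200_000):
    """Monte Carlo estimate of the standard normal CDF:
    F(q) = P(Z <= q), taken as the fraction of N(0,1) draws at most q."""
    return sum(random.gauss(0.0, 1.0) <= q for _ in range(n)) / n

print(mc_normal_cdf(1.0))   # should be near pnorm(1) = 0.8413...
```

The error shrinks like $1/\sqrt{n}$, so this converges much more slowly than quadrature, but it needs nothing beyond the ability to sample from the distribution.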
As pointed out in the comment, the question makes more sense if we consider converting random variables. As for your statement about transforming to a normal distribution, I hope you are referring to something like the Box-Cox transform (which is an approximate method), not feature normalization via $(x-\mu)/\sigma$, since the latter does not transform to a normal distribution.
A standard way of converting a RV from one distribution into another is the inverse-CDF method. Typically, many random number generators use this method to convert the uniform distribution into an arbitrary one. Sometimes this is not enough, since we cannot get an analytical inverse of $F(x)$, as with the normal RV, and other methods exist, e.g. Box-Muller for converting uniform(s) to normal(s). When the transform can be run in both directions, we can first convert $X_1$ to $U$, and then $U$ to $X_2$.
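As an illustration of that last step, here is a hedged Python sketch (distributions and names are my choices) converting Exponential$(1)$ draws into standard normal draws by going through the uniform intermediary, $X_1 \to U \to X_2$; `NormalDist.inv_cdf` plays the role of $F_2^{-1}$:

```python
import math
import random
from statistics import NormalDist

random.seed(4)

n = 50_000
x1 = [random.expovariate(1.0) for _ in range(n)]  # X1 ~ Exponential(1)
u = [1.0 - math.exp(-x) for x in x1]              # U = F1(X1) ~ Uniform[0,1]
x2 = [NormalDist().inv_cdf(v) for v in u]         # X2 = F2^{-1}(U) ~ N(0,1)

mean = sum(x2) / n
var = sum(v * v for v in x2) / n - mean ** 2
print(mean, var)   # should be near 0 and 1
```

The same two-step recipe works for any source distribution whose CDF you can evaluate and any target whose quantile function you can evaluate.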