Because $F_X$ need not be strictly increasing, one must use a sort of generalized inverse. Conventions vary among authors, but here is one example (the "right-continuous inverse"):
$$
F^{-1,+}_X(t):=\inf\{x: F_X(x)>t\}, \quad t\in[0,1],
$$
wherein $\inf\emptyset$ is understood to equal $+\infty$; the value at $t=1$ will not matter below, since a uniform random variable equals $1$ with probability zero.
For example, if $X$ has cdf
$$
F_X(x)=\begin{cases} 0, & x<1;\\ \frac{1}{3}, & 1\le x<2;\\ 1, & x\ge 2, \end{cases}
$$
then
$$
F^{-1,+}_X(t) =\begin{cases} 1, & 0\le t<\frac{1}{3};\\ 2, & \frac{1}{3}\le t<1, \end{cases}
$$
and $F^{-1,+}_X(1)=+\infty$ by the convention above. In this case, if $U$ has the uniform distribution on $[0,1]$, then $F^{-1,+}_X(U)$ takes the value $1$ with probability $1/3$ and the value $2$ with probability $2/3$, so $F^{-1,+}_X(U)$ has the same distribution as $X$.
The "left-continuous inverse" $F^{-1,-}_X(t):=\sup\{x: F_X(x)\le t\}$ would work as well.
A solid understanding of the technical definition of a random variable requires measure theory, which in turn requires sigma-algebras.
I will try to give a relatively non-technical definition. First note that we require a set $\Omega$, called the sample space, which roughly contains everything that could possibly happen in our experiment. The elements $\omega \in \Omega$ are the individual outcomes that can occur.
A random variable is a function from $\Omega$ to $\mathbb R$ with a special property that makes rigorous probability theory work. (This special property is called measurability, which you can look up if you want to.) So given any outcome $\omega$ of the experiment, you get $X(\omega)$, which is a real number.
A CDF is a function $F(c) = P\big(\{\omega \in \Omega \colon X(\omega) \leq c \} \big)$, or more informally $P(X \leq c)$, that gives the probability that the random variable $X$ is less than or equal to $c$. In particular, the CDF is a function of $c$. (In formal probability theory, the CDF is the fundamental object from which pdfs and pmfs are derived.)
Aside: note that there is no randomness in the definition of a random variable. It's just a function.
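To make the "it's just a function" point concrete, here is a minimal sketch of my own (plain Python; the two-coin-toss experiment is an arbitrary illustrative choice). It writes down a finite sample space $\Omega$, defines a random variable $X$ as an ordinary function on $\Omega$, and computes the CDF directly from the definition $F(c) = P(\{\omega : X(\omega) \le c\})$:

```python
from itertools import product

# Sample space for two fair coin tosses; each outcome omega is a pair of faces.
Omega = list(product(["H", "T"], repeat=2))
P = {omega: 0.25 for omega in Omega}     # probability of each individual outcome

def X(omega):
    # A random variable: just a function from Omega to the reals
    # (here, the number of heads in the outcome).
    return sum(1 for toss in omega if toss == "H")

def F(c):
    # The CDF straight from the definition: P({omega : X(omega) <= c}).
    return sum(P[omega] for omega in Omega if X(omega) <= c)

for c in [-1, 0, 0.5, 1, 2, 3]:
    print(f"F({c}) = {F(c)}")
# F(-1) = 0, F(0) = 0.25, F(0.5) = 0.25, F(1) = 0.75, F(2) = 1.0, F(3) = 1.0
```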
Best Answer
Indeed, the symbol $F_X(X)$ is confusing.
A more precise way of expressing it:
Say we have two random variables $X, X'$, which are independent and have the same distribution, i.e. $F_X = F_{X'}$. We then define $Y = F_X(X')$, namely the function $F_X$ applied to the random variable $X'$.
We may calculate the distribution function of $Y$ (assuming for this computation that $F_X$ is continuous and strictly increasing, so that $F_X^{-1}$ exists, and taking $0 < y < 1$): \begin{eqnarray} F_Y(y) &=& P(Y \leq y)\\ &=& P(F_X(X') \leq y)\\ &=& P(X' \leq F_X^{-1}(y))\\ &=& F_{X'}(F_X^{-1}(y))\\ &=& F_X(F_X^{-1}(y))\\ &=& y. \end{eqnarray}
You will notice that the random variable $X'$ is really just a dummy, so the extra notation isn't very useful. In fact it doesn't matter whether $X$ and $X'$ are independent or not, since we only use $X$ through its distribution function $F_X$.
Thus people tend to just write $F_X(X)$.
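To check this numerically, here is a quick sketch of my own (it uses NumPy; the exponential distribution is just an illustrative choice whose cdf is continuous and strictly increasing on its support). It draws samples of $X$, applies $F_X$, and verifies that $Y = F_X(X)$ behaves like a Uniform$[0,1]$ variable:

```python
import numpy as np

rng = np.random.default_rng(1)

# X ~ Exponential(1), with cdf F_X(x) = 1 - exp(-x), which is continuous
# and strictly increasing on [0, infinity).
x = rng.exponential(scale=1.0, size=200_000)
y = 1.0 - np.exp(-x)                 # Y = F_X(X)

# If Y is Uniform(0, 1), then P(Y <= q) should be approximately q for every q.
for q in [0.1, 0.25, 0.5, 0.75, 0.9]:
    print(f"P(Y <= {q}) ≈ {(y <= q).mean():.3f}")   # each value should be ≈ q
```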