Indeed, the fact that this is a Fourier transform is by and large a mathematical coincidence; the intuition comes not from interpreting it as a Fourier transform, but by considering it from another angle, that of moment generating functions.
Throughout this answer, I assume all random variables are real-valued; it seems like that's what you're concerned about anyway.
If you have done some statistics, you are almost certainly familiar with the concept of the moment generating function of $X$,
$$
M_X : \mathbb R \to \mathbb R \\
M_X(t) = \mathbb E\big[e^{tX}\big].
$$
This function has many nice properties. For instance, the $n$-th moment of $X$, $\mathbb E\big[X^n\big]$, can be found by computing $M_X^{(n)}(0)$, the $n$-th derivative of $M_X$ evaluated at $0$. Another important property is that two random variables whose moment generating functions exist and agree on a neighborhood of $0$ have the same distribution; that is to say, the process of determining a moment generating function is "invertible". A third and also significant property is that, for any two independent random variables $X$ and $Y$, we have
\begin{align*}
M_{X+Y}(t) &= \mathbb E \big[e^{t(X+Y)}\big] \\
&= \mathbb E \big[e^{tX} e^{tY}\big] \\
&= \mathbb E \big[e^{tX} \big] \mathbb E \big[e^{tY} \big] \\
&= M_X(t)M_Y(t).
\end{align*}
(The third equality holds because $e^{tX}$ and $e^{tY}$, being functions of the independent variables $X$ and $Y$, are themselves independent random variables.) In conjunction with the fact that moment generating functions are invertible, this essentially permits us to derive a formula for the distribution of the sum of two independent random variables; hopefully, this application also makes clear why there is a seemingly arbitrary exponential in the definition of the moment generating function.
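Since this answer is all about intuition, a quick numerical sanity check may help. Below is a minimal Monte Carlo sketch of the product rule $M_{X+Y}(t) = M_X(t)M_Y(t)$; the choices $X \sim N(0,1)$, $Y \sim \mathrm{Exp}(1)$, $t = 0.3$, and the seed are my own illustrative ones, and this is an illustration, not a proof.

```python
# Monte Carlo sketch: M_{X+Y}(t) ~= M_X(t) * M_Y(t) for independent X, Y.
# X ~ N(0,1) and Y ~ Exp(1) are illustrative choices; t = 0.3 keeps the
# estimators' variances finite (the Exp(1) factor needs 2t < 1 for that).
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
X = rng.standard_normal(n)
Y = rng.exponential(scale=1.0, size=n)
t = 0.3

M_X = np.mean(np.exp(t * X))          # estimates E[e^{tX}] = e^{t^2/2}
M_Y = np.mean(np.exp(t * Y))          # estimates E[e^{tY}] = 1/(1 - t)
M_XY = np.mean(np.exp(t * (X + Y)))   # estimates E[e^{t(X+Y)}]

print(M_XY, M_X * M_Y)  # the two estimates agree up to Monte Carlo error
```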
Now, the classical example of an application of moment generating functions is in the proof of the Central Limit Theorem. They are a natural candidate, because the CLT involves sums of independent random variables, and moment generating functions are well-equipped to deal with such matters. However, there is a glaring issue with their use: moment generating functions do not always exist. The standard Cauchy distribution, for example, satisfies $M_X(t) = \infty$ for every $t \neq 0$.
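To see that divergence concretely, here is a small sketch (assuming scipy is available; the value $t = 0.1$ is an arbitrary illustrative choice): the truncated integrals that would define $M_X(0.1)$ for a standard Cauchy variable grow without bound instead of stabilizing.

```python
# The standard Cauchy density is 1/(pi (1 + x^2)); its MGF at t = 0.1 would be
# the integral of e^{0.1 x} / (pi (1 + x^2)), which diverges.  Truncated
# versions of that integral keep growing as the window [-A, A] widens.
import numpy as np
from scipy.integrate import quad

t = 0.1
integrand = lambda x: np.exp(t * x) / (np.pi * (1.0 + x**2))
for A in [10, 50, 100, 200]:
    val, _ = quad(integrand, -A, A, limit=200)
    print(A, val)  # no sign of convergence: the values explode with A
```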
This is where characteristic functions come in. As you know, we define a characteristic function by
$$
\varphi_X : \mathbb R \to \mathbb C \\
\varphi_X(t) = \mathbb E \big[ e^{itX} \big].
$$
All of the nice properties that applied for moment generating functions mentioned above still apply for characteristic functions. In particular:
- the $n$-th moment of $X$ can be found as $(-i)^n \varphi_X^{(n)}(0)$, if it exists;
- two random variables with the same characteristic function have the same distribution;
- $\varphi_{X+Y}(t) = \varphi_X(t)\varphi_Y(t)$ for independent r.v.s $X$, $Y$ (this is proven essentially the same way as before).
The critical difference with moment generating functions is this: characteristic functions always exist, at least for real-valued random variables. The intuitive reason is that the values taken by $e^{itX}$ all lie on the unit circle, so $|e^{itX}| = 1$ and the integral defining the expected value converges absolutely. Going back to the CLT example, this then allows us to complete our proof without issue; indeed, if you are interested, the proof on the Wikipedia page uses characteristic functions.
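As a concrete illustration (my own, not part of the original argument): the standard Cauchy distribution has no moment generating function away from $0$, yet its characteristic function is known in closed form, $\varphi_X(t) = e^{-|t|}$, and a Monte Carlo estimate matches it without any trouble, precisely because $|e^{itX}| = 1$.

```python
# Monte Carlo estimate of the characteristic function of a standard Cauchy
# variable, compared against the known closed form e^{-|t|}.  The estimator
# behaves well because |e^{itX}| = 1, even though E[X] does not exist.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_cauchy(10**6)
for t in [0.5, 1.0, 2.0]:
    phi_hat = np.mean(np.exp(1j * t * X))    # estimates E[e^{itX}]
    print(t, phi_hat.real, np.exp(-abs(t)))  # imaginary part ~ 0 by symmetry
```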
Based on this little narrative, it is pretty clear that the entire motivation for the introduction of $i$ in the exponent of the characteristic function is the fact that convergence will be guaranteed for a real-valued random variable. It is not much more than a nice mathematical coincidence that the characteristic function coincides with the Fourier transform, and it makes little sense (at least in my opinion) to try and carry over intuitions from the Fourier transform to the characteristic function; instead, the intuition can be seen by thinking about how this function might have been discovered in the first place.
1. BOTTOM LINE UP FRONT: Treat it as a distribution.
Since the signum function is not integrable on $\mathbb{R}$, it may be useful to view it as a tempered distribution.
Such "generalized functions" are bounded linear functionals on a class of very-well-behaved functions called Schwartz functions. One of Laurent Schwartz's achievements was finding a collection $\mathcal{S}$ of functions on $\mathbb{R}^n$ such that the set of Fourier transforms of these functions is $\mathcal{S}$ itself. That put the original functions and their Fourier transforms on equal footing.
2. Fourier transform of a distribution
Why is this useful? It means that every tempered distribution has a Fourier transform that is also a tempered distribution. It also provides some useful notation to derive expressions and properties of the Fourier transform of a known tempered distribution.
Given any distribution $\mathsf{T}$, we write the result of applying it to a Schwartz function $\varphi$ as $\left<\mathsf{T},\varphi\right>$, but it is to be understood that this is not an inner product of two objects of the same kind. The Fourier transform of the distribution $\mathsf{T}$ is the distribution $\widehat{\mathsf{T}}$ for which
\begin{equation}
\left<\widehat{\mathsf{T}},\varphi\right> = \left<\mathsf{T},\widehat{\varphi}\right>
\end{equation}
for every $\varphi\in\mathcal{S}$, where $\widehat{\varphi}$ is the Fourier transform of $\varphi$. Since $\varphi\in\mathcal{S}$, we have $\widehat{\varphi}\in\mathcal{S}$, too.
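For a distribution that comes from an ordinary function, this definition reduces to Fubini's theorem, and one can test it numerically. A sketch, where the choices $T(x) = e^{-|x|}$ (whose transform is the ordinary function $2/(1+k^2)$) and $\varphi(k) = e^{-k^2/2}$ are my own illustrative ones:

```python
# Numerical check of <T^, phi> = <T, phi^> for the regular tempered
# distribution T(x) = e^{-|x|}.  With the convention \int f(x) e^{-ikx} dx,
# T^ is the ordinary function 2/(1 + k^2), and the test function
# phi(k) = e^{-k^2/2} has phi^(x) = sqrt(2 pi) e^{-x^2/2}.
import numpy as np
from scipy.integrate import quad

lhs, _ = quad(lambda k: 2.0 / (1.0 + k**2) * np.exp(-k**2 / 2),
              -np.inf, np.inf)                       # <T^, phi>
rhs, _ = quad(lambda x: np.exp(-abs(x)) * np.sqrt(2 * np.pi) * np.exp(-x**2 / 2),
              -np.inf, np.inf)                       # <T, phi^>
print(lhs, rhs)  # the two pairings agree
```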
3. Fourier transform of signum
How is this related to the signum function? If $\mathsf{T}$ is the signum function viewed as a distribution, then
\begin{equation}
\left<\mathsf{T},\varphi\right> = \int\textrm{sgn}(x)\varphi(x)dx.
\end{equation}
The Fourier transform of this distribution satisfies (or is defined by)
\begin{equation}
\begin{split}
\left<\widehat{\mathsf{T}},\varphi\right> &=~
\left<\mathsf{T},\widehat{\varphi}\right>\\
&=~ \int\textrm{sgn}(x)\widehat{\varphi}(x)dx\\
&=~ -\int_{-\infty}^{0}\widehat{\varphi}(x)dx + \int_{0}^{\infty}\widehat{\varphi}(x)dx.
\end{split}
\end{equation}
4. Changing order of integration
Let's consider the integral for positive reals. The very good behavior of $\varphi$ allows changing orders of integration in many, many situations.
\begin{equation}
\begin{split}
\int_{0}^{\infty}\widehat{\varphi}(x)dx
&=~
\int_{0}^{\infty}\left[\int\varphi(k)e^{-ixk}dk\right]dx\\
&=~
\lim_{R\to\infty}\int_{0}^{R}\left[\int\varphi(k)e^{-ixk}dk\right]dx\\
&=~
\lim_{R\to\infty}\int\left[\int_{0}^{R}e^{-ixk}dx\right]\varphi(k)dk
\end{split}
\end{equation}
We do something very similar for the negative reals.
\begin{equation}
\begin{split}
-\int_{-\infty}^{0}\widehat{\varphi}(x)dx
&=~
-\int_{-\infty}^{0}\left[\int\varphi(k)e^{-ixk}dk\right]dx\\
&=~\lim_{R\to\infty}-\int_{-R}^{0}\left[\int\varphi(k)e^{-ixk}dk\right]dx\\
&=~
\lim_{R\to\infty}-\int\left[\int_{-R}^{0}e^{-ixk}dx\right]\varphi(k)dk
\end{split}
\end{equation}
We now address the sum of the $R$-dependent integrals.
\begin{equation}
\begin{split}
\int_{0}^{R}e^{-ikx}dx - \int_{-R}^{0}e^{-ikx}dx
&=~
\left.\frac{e^{-ikx}}{-ik}\right|_{x=0}^{x=R}
-
\left.\frac{e^{-ikx}}{-ik}\right|_{x=-R}^{x=0}\\
&=~
\frac{e^{-ikR} - 1}{-ik}
-
\frac{1 - e^{ikR}}{-ik}\\
&=~
\frac{2}{ik}
-
\frac{e^{ikR} + e^{-ikR}}{ik}
\end{split}
\end{equation}
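Since sign errors are easy to make here, a numerical spot check of this closed form may be reassuring (the values $k = 1.3$, $R = 7$ are arbitrary):

```python
# Spot check: int_0^R e^{-ikx} dx - int_{-R}^0 e^{-ikx} dx should equal
# 2/(ik) - (e^{ikR} + e^{-ikR})/(ik).  Real and imaginary parts are
# integrated separately because scipy's quad handles real integrands.
import numpy as np
from scipy.integrate import quad

k, R = 1.3, 7.0
re = lambda x: np.cos(k * x)    # Re e^{-ikx}
im = lambda x: -np.sin(k * x)   # Im e^{-ikx}
left = (quad(re, 0, R)[0] + 1j * quad(im, 0, R)[0]) \
     - (quad(re, -R, 0)[0] + 1j * quad(im, -R, 0)[0])
right = 2 / (1j * k) - (np.exp(1j * k * R) + np.exp(-1j * k * R)) / (1j * k)
print(left, right)  # agree to quadrature precision
```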
5. Singularity at $k = 0$; Riemann-Lebesgue Lemma
The $k$ in the denominator will be a problem at $k=0$. But we know that the original integrals converge, so we consider the integrals over $|k| > \epsilon$, i.e. from $\epsilon$ to $\infty$ and from $-\infty$ to $-\epsilon$, and let $\epsilon \to 0$ at the end. We first dispose of the $R$-dependent term; up to an overall sign, it is
\begin{equation}
\int_{|k|>\epsilon}\frac{e^{ikR} + e^{-ikR}}{ik}\varphi(k)dk
=
\int 1_{\{k:|k|>\epsilon\}}(k)\frac{\varphi(k)}{ik}\left(e^{ikR} + e^{-ikR}\right)dk
\end{equation}
For each $\epsilon >0$, the function $1_{\{k:|k|>\epsilon\}}(k)\frac{\varphi(k)}{ik}$ is in $L^1(\mathbb{R})$, so this integral is that function's Fourier transform evaluated at $\omega = R$ plus the same Fourier transform evaluated at $\omega = -R$. The Riemann-Lebesgue Lemma shows that if $f\in L^1(\mathbb{R})$, then $\lim_{|R|\to\infty}\widehat{f}(R) = 0$. Hence, these $R$-dependent terms vanish as $R\to\infty$.
It is worth noting that this shows that we must take the $R$-limit first and then take the $\epsilon$-limit. The opposite order would not work.
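Here is a small numerical illustration of that vanishing (my own: it fixes one concrete test function, $\varphi(k) = e^{-(k-1)^2/2}$, deliberately asymmetric so the term does not vanish for trivial symmetry reasons, and $\epsilon = 0.1$):

```python
# The R-dependent term reduces, pointwise, to -2i cos(kR) phi(k) / k; we
# integrate over |k| > eps and watch the result decay as R grows, as the
# Riemann-Lebesgue Lemma predicts.  quad's weight='cos' option is built
# for oscillatory integrands like this one.
import numpy as np
from scipy.integrate import quad

eps = 0.1
phi = lambda k: np.exp(-(k - 1)**2 / 2)   # asymmetric Schwartz test function
for R in [1.0, 10.0, 100.0, 1000.0]:
    # int_{eps}^{30} phi(k) cos(kR)/k dk  (phi is negligible beyond k = 30)
    pos, _ = quad(lambda k: phi(k) / k, eps, 30.0, weight='cos', wvar=R)
    # the k < -eps piece, after the substitution k -> -k
    neg, _ = quad(lambda k: -phi(-k) / k, eps, 30.0, weight='cos', wvar=R)
    print(R, abs(-2j * (pos + neg)))  # magnitude of the R-dependent term -> 0
```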
6. Cauchy Principal Value
We are left with
\begin{equation}
\lim_{\epsilon\to 0}(-2i)\int_{|k|>\epsilon}\frac{\varphi(k)}{k}dk,
\end{equation}
since $2/(ik) = -2i/k$.
The limit here is the Cauchy Principal Value of this integral. This shows that we must interpret the Fourier transform of the signum function very carefully, but we can do it in the sense of distributions: if $\textrm{sgn}(x)$ is the signum of $x$, then
\begin{equation}
\widehat{\textrm{sgn}}(k) = -2i~\mathsf{PV}\left(\frac{1}{k}\right).
\end{equation}
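As a final sanity check on the sign and the principal-value interpretation, both sides of the defining identity $\left<\widehat{\mathsf{T}},\varphi\right> = \left<\mathsf{T},\widehat{\varphi}\right>$ can be computed numerically for one concrete test function (again my own choice, $\varphi(k) = e^{-(k-1)^2/2}$, whose transform is $e^{-ix}\sqrt{2\pi}e^{-x^2/2}$); scipy's quad computes principal values directly via weight='cauchy'.

```python
# Verify <sgn^, phi> = <sgn, phi^> with sgn^ = -2i PV(1/k) and the test
# function phi(k) = e^{-(k-1)^2/2}, phi^(x) = e^{-ix} sqrt(2 pi) e^{-x^2/2}.
import numpy as np
from scipy.integrate import quad

phi = lambda k: np.exp(-(k - 1)**2 / 2)

# <sgn, phi^>: pairing sgn(x) with e^{-ix} sqrt(2 pi) e^{-x^2/2}; the even
# (cosine) part integrates to zero, leaving a purely imaginary value.
pairing = -2j * np.sqrt(2 * np.pi) * quad(
    lambda x: np.sin(x) * np.exp(-x**2 / 2), 0, np.inf)[0]

# <sgn^, phi> = -2i PV int phi(k)/k dk; weight='cauchy' makes quad compute
# the principal value of int f(k)/(k - wvar) dk on a finite interval.
pv = quad(phi, -40.0, 40.0, weight='cauchy', wvar=0.0)[0]
transform_side = -2j * pv

print(pairing, transform_side)  # the two sides agree
```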
Best Answer
The Fourier Series
A Fourier series (countable expansion of sines and cosines) is only defined for periodic functions, because the sines and cosines in the series have frequencies which are assumed to be harmonics (integer multiples) of the original function's frequency. Any countable sum of sines and cosines is periodic so long as the summands' periods have an LCM (e.g. see Sum of two periodic functions is periodic?). That is certainly the case when all the functions are harmonics of some fundamental: every quotient of periods is then rational, so an LCM must exist; for instance, periods $\tfrac{1}{2}$ and $\tfrac{1}{3}$ have LCM $1$. So the Fourier series, when it exists, must converge to a periodic function.
The Fourier Transform
A Fourier transform exists for any real function $f(x)$ for which the integral $$ \int_{-\infty}^{\infty}f(x)e^{-i\omega x}dx $$ converges (when $f$ is a probability density, this is $E[e^{-i\omega X}]$). Nothing in this definition requires $f$ to be periodic.
But there's also some intuition here. You can informally think of the Fourier transform of a function as giving the coefficients of its representation in the basis $\Omega = \{e^{i\omega x} \ | \ \omega \in \mathbb{R}\}$ (I say informally because there are many subtle issues with uncountable bases--for instance, the dimension of the space, if defined as the number of elements in the basis, is ambiguous, and there is even the possibility of having two bases, one properly containing the other, which span the same space). If we were to write it this way, we would need some function $F(\cdot)$ so that $F(\omega)$ would return the coefficient of $e^{i\omega x}$ for any $\omega \in \mathbb{R}$. In other words, we would have
$$ f(x) \ ``=" \sum_{\omega\in \mathbb{R}} F(\omega)e^{i\omega x}, $$
which is for all practical purposes the same as
$$ f(x) = \int_{-\infty}^\infty F(\omega) e^{i\omega x}d\omega. $$
Notice that we are admitting all real frequencies, including frequencies with irrational ratios, so integrals over $\Omega$ like the one above don't necessarily result in a periodic function. That is, neither $F(\omega)$ nor $f(x)$ must be (nor in general will they be) periodic. What we now have is technically known as a Fredholm integral equation for $F(\omega)$, and one which has the solution
$$ F(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty}f(x)e^{-i\omega x}dx, $$
which, up to the normalizing constant $\frac{1}{2\pi}$, is the Fourier transform.
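Under these conventions the round trip can be checked numerically. A sketch with the illustrative choice $f(x) = e^{-x^2/2}$ (assuming scipy; only cosine integrals appear because $f$ is even):

```python
# Round trip: F(w) = (1/2pi) int f(x) e^{-iwx} dx, then
# f(x) = int F(w) e^{iwx} dw.  f(x) = e^{-x^2/2} is even, so every
# transform here is a real cosine integral; the tails beyond |x|, |w| = 10
# are negligible at double precision.
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x**2 / 2)

def F(w):      # coefficient function (here (1/sqrt(2 pi)) e^{-w^2/2})
    return quad(lambda x: f(x) * np.cos(w * x), -10, 10)[0] / (2 * np.pi)

def f_rec(x):  # reconstruction from the continuum of coefficients
    return quad(lambda w: F(w) * np.cos(w * x), -10, 10)[0]

for x in [0.0, 0.7, 1.5]:
    print(f(x), f_rec(x))  # reconstruction matches the original
```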
The Characteristic Function: Or, Why Probability Theorists Care
The characteristic function is therefore just the Fourier transform of the PDF (up to the sign convention in the exponent). Okay. But why is it important to know the Fourier transform of the PDF?
The answer is that, if the PDF of the random variable $X$ is $f$, then the derivatives of $E[e^{i\omega X}]$ at $\omega = 0$ are proportional to the moments of $X$ through known factors. Since $e^a = 1+ a + \frac{a^2}{2!} + \cdots$ converges for all $a$, one might hope that expectation and series can be interchanged, i.e. that $$ \int_{-\infty}^{\infty}f(x)e^{-i\omega x}dx = E[e^{-i\omega X}] = E\left[1- i\omega X - \frac{\omega^2}{2!}X^2+\cdots \right] = E[1] - i\omega E[X] - \frac{\omega^2}{2!}E[X^2]+\cdots $$ This interchange is in fact justified whenever the moments in question exist. So you then have
$$ -i\left.\frac{dE[e^{i\omega X}] }{d\omega}\right|_{\omega = 0}= E[X],\qquad -\left.\frac{d^2E[e^{i\omega X}]}{d\omega^2}\right|_{\omega = 0} = E[X^2], $$ and in general $E[X^n] = (-i)^n\left.\frac{d^n E[e^{i\omega X}]}{d\omega^n}\right|_{\omega=0}$. Knowing the characteristic function means you get to take derivatives (easy) rather than integrals (hard) to get moments. This correspondence also gives you a way of approximating the PDF from the first $n$ moments of the distribution: approximate the characteristic function near $\omega = 0$ by the truncated series $\sum_{j=0}^{n}\frac{(i\omega)^j}{j!}E[X^j]$ and take the inverse Fourier transform.
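As a closing illustration (my own example, not from the original answer): for $X \sim \mathrm{Exp}(1)$ the characteristic function is $1/(1 - i\omega)$, and finite differences at $\omega = 0$ recover $E[X] = 1$ and $E[X^2] = 2$.

```python
# Recover the first two moments of Exp(1) from finite differences of its
# characteristic function phi(w) = 1/(1 - iw) at w = 0.
import numpy as np

phi = lambda w: 1.0 / (1.0 - 1j * w)
h = 1e-4
d1 = (phi(h) - phi(-h)) / (2 * h)             # ~ phi'(0)  = i E[X]
d2 = (phi(h) - 2 * phi(0) + phi(-h)) / h**2   # ~ phi''(0) = -E[X^2]
print((-1j * d1).real, (-d2).real)            # ~ 1.0 and ~ 2.0
```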