Let $Q$ denote the set of all possible configurations of the system (the configuration manifold). Consider a point $q_0\in Q$. For the sake of conceptual clarity, and to make contact with physics notation, let's work in some local coordinate patch around $q_0$.
Suppose that $q_0$ represents the position of the system under consideration at time $t_0$. At a given time $t$ later, the system will be at some position say $q(t)$ that is determined by the evolution equations (the Euler-Lagrange equations if we are doing Lagrangian mechanics), and the quantity
\begin{align}
q(t) - q(t_0) = q(t) - q_0
\end{align}
would be the displacement of the system after a time $t-t_0$. Suppose, instead we consider some other curve $\gamma(s)$ in the configuration space which starts at the point $s_0$;
\begin{align}
\gamma(s_0) = q_0,
\end{align}
and suppose that we compute the displacement
\begin{align}
\gamma(s) - \gamma(s_0) = \gamma(s) - q_0
\end{align}
that would result from moving along this other curve of our choosing. We call this displacement the virtual displacement after a "time" $s-s_0$ corresponding to moving along the curve $\gamma$. It's called virtual because it is the displacement in the position of the system that would occur if the system were to move along the curve $\gamma$ of our choosing -- a "virtual" curve as opposed to the "real" curve along which the system travels according to the Lagrangian evolution of the system.
Note. As Qmechanic suggested in the comments, I used the parameter $s$ for the virtual curve $\gamma$ instead of $t$ to emphasize that moving along that curve does not correspond to time-evolution, but rather any curve of our choosing.
Now what about virtual "infinitesimal" displacements? Well, recall that the term "infinitesimal" in physics essentially always refers to "first order" approximations, see, e.g. this SE post:
Rigorous underpinnings of infinitesimals in physics
So when we are discussing a virtual infinitesimal displacement, what we have in mind is taking the virtual displacement $\gamma(s) - q_0$, Taylor expanding it to first order in $s$, and extracting only the first order term. Let's do this:
\begin{align}
\gamma(s) - q_0 = \gamma(s_0) + \dot\gamma(s_0) (s-s_0) + O((s-s_0)^2) - q_0
\end{align}
Using the fact that $\gamma(s_0) = q_0$, we see that the Taylor expansion of the virtual displacement is
\begin{align}
\gamma(s) - q_0 = \dot\gamma (s_0) (s-s_0) + O((s-s_0)^2),
\end{align}
and now we notice that to first order in $s$, the size of the virtual displacement is controlled by the coefficient of $s-s_0$, namely $\dot\gamma(s_0)$. In other words, virtual infinitesimal displacements (meaning we just keep the first order contribution in $s-s_0$), are determined by the velocity vector of the chosen "virtual curve" at $s_0$. But if you've taken a differential geometry course, then you know that velocities of curves on a manifold are simply tangent vectors to that manifold!
So virtual infinitesimal displacements can be associated with tangent vectors to the configuration manifold. The intuition to keep in mind here as that a virtual displacement just tells us how far we would get away from a certain point on the manifold if we were to travel on a certain curve of our choosing that may not coincide with the actual motion of the system determined by time evolution. The "infinitesimal" part and identifying this part with tangent vectors comes simply from considering what happens only to first order.
This is should be clear given the definition of a virtual displacement. Let us see that. I borrow this from Lectures on analytical mechanics by F Gantmacher.
Consider a system of particles with position vectors we denoted as $\vec{r_i}$. The system may be subjected to a constraint $f(\vec{r_i},\dot{\vec{r_i}},t)=0$. Let us consider a sub class of constraints $f(\vec{r_i},t)=0$. These constraints are said to be holonomic constraints. Now by differentiating this once, we obtain $$\sum_i\frac{\partial f}{\partial\vec{r_i}}\cdot\vec{v_i}+\frac{\partial f}{\partial t}=0$$
This equation is satisfied by the velocities of the particles. The velocities that obey this equation are said to be allowed velocities. Let us define allowed displacements by $\mathrm d\vec{r_i}=\vec{v_i}~\mathrm dt$. This satisfies, $$\sum_i\frac{\partial f}{\partial\vec{r_i}}\cdot{\mathrm d\vec{r_i}}+\frac{\partial f}{\partial t}~\mathrm dt=0$$We can now define virtual displacements as the difference between two allowed displacements. i.e. $\delta\vec{r_i}=(\vec{v}^\prime_i-\vec{v}_i)~\mathrm dt$. Therefore this satisfies, $$\sum_i\frac{\partial f}{\partial\vec{r_i}}\cdot\delta{\vec{r_i}}=0$$Let us do an example. Consider a simple pendulum with the string length $l$ attached to an oscillating support. Let the coordinates of the bob be $(x,y)$ and let the coordinates of the point of support be given by $(L \cos(\Omega t),0)$. Then the constraint here is $f(x,y,t)=(x-L\cos(\Omega t))^2+y^2-l^2=0$. Then the allowed displacements satisfy $2(x-L\cos(\Omega t)~\mathrm dx+2y~\mathrm dy+2(x-L\cos(\Omega t))L\Omega\sin(\Omega t)~\mathrm dt=0$ whereas the virtual displacements satisfy $2(x-L\cos(\Omega t))~\delta x+2y~\delta y=0$.
Best Answer
Here on SE, you may already find many answers to your question. Even if most of them are correct, I feel that a plain and correct answer is still missing. Where plain does not mean non-rigorous. But mathematical rigor is not the same as introducing differential manifolds. One could keep making confusing arguments even after introducing fiber bundles.
If there would be no constraint on a mechanical system described in the 3D space by a set of $N$ position vectors ${\bf r}_i~~(i=1,\dots,N)$, one could consider any set of $N$ vectors as the set of possible displacements $d {\bf r}_i$. Said in another way, for a set of starting positions ${\bf r}_i$, we have complete freedom in choosing the initial velocities ${\bf v}_i$ which would provide a set of displacements $d{\bf r}_i={\bf v}_i\,dt$ after a time $dt$.
If there are constraints, we are not allowed to choose all velocities (and then displacements) at our will. We must be consistent with the existence of constraints. For example, if a marble has to move on the floor, we are not allowed to assign a velocity pointing downwards: it would be in contradiction with the constraint.
If constraints are static, there is no difference between the set of all possible displacements and virtual displacements. All of them are compatible with constraints. The reason for considering different displacements, and then introducing the definition of virtual displacement as different from possible displaceent or briefly displacement, is when constraints vary with time. In that case, we can consider two different set of displacements: those compatible with the evolving constraint, and those compatible with the constraint frozen at a given time $t$. A simple example helps to clarify the difference.
Let's consider a point-like particle constrained to move on a ramp. If the ramp is fixed, all possible displacements (virtual or not) are displacements on the plane of the ramp. they can be obtained by assigning a velocity parallel to the ramp ad after a time $dt$ the particle will be again on the ramp.
If the ramp is moving, an initial velocity parallel to the instantaneous position of the ramp would result, in general, in a violation of the evolving constraint. The real possible velocities, which ensure the particle remains on the ramp after a time $dt$, are the sum of velocities parallel to the ramp plus the instantaneous velocity of the ramp at the initial time. The resulting possible displacements are the real displacements since they correspond to the set of all the possible displacements which could be physically obtained. Still, in some cases, one may be interested in the situation of what would be the displacement under the hypothesis of freezing the evolving constraint. In the case of the ramp, virtual displacements are those parallel to the ramp plane. More formally, if ${\bf u}$ is the velocity of the ramp at some time $t$ and ${\bf v}$ is a velocity parallel to the ramp plane $$ d{\bf r} = ({\bf v + u }) dt $$ is a possible displacement after a time $dt$ compatible with the evolving constraint, while $$ \delta{\bf r} = {\bf v } dt $$ is a virtual displacement (${\bf v } dt$ is a displacement compatible with the frozen constraints).
The real reason for introducing virtual displacements is connected to the possibility of deriving useful information on the system by analyzing the work done by constraint forces. However calculating work in the case of constraints using the actual displacements would provide useless information. In the case of a mass $m$ rigidly fixed on a moving ramp, if we would use the actual displacement of the mass we would find a non-zero value. Only virtual displacement, which are parallel to the plane of the ramp, could give us zero value for work, which is what we need, for a useful exploiting of the concept of work in such a situation.
Edited after change of the title:
I have to say that I do not agree with the suggestion of removing reference to virtual work from the title. Without the need of the virtual work as intrinsic work performed by constraints, it is hard to understand the reason for introducing virtual displacements at all. Unfortunately there is a long tradition of neglecting such a key point in the conceptual development of the subject. The results is a generalized sense of lack of meaning, sometimes hidden under tons of math.
End of editing
At this point, one can generalize all these ideas, looking at the introduction of constraints as equivalent to the introduction of a non trivial configuration space, well described as a differential manifold. The set of all tangent planes at each point of the manifold has a nice mathematical structure and so on. But this is just a convenient and general dress for the physical ideas above. It becomes a trivial step only if one has got the right concept of what a virtual displacement is and why it has been introduced.