This is perhaps best illustrated by an example. Consider the ODE with boundary values
$$\varepsilon y''+(1-\varepsilon)y'-y=0,\qquad y(0)=0,\quad y(1)=1.$$
The exact solution is easily found:
$$ y=\frac{e^x-e^{-x/\varepsilon}}{e-e^{-1/\varepsilon}}.$$
Roughly speaking, the $e^x$ part gives the outer solution. It has the natural length scale $1$ and is a good approximation in most of the interval $[0,1]$, namely where $x\gg\varepsilon$. Not coincidentally, $e^{x-1}$ solves
$$y'-y=0,\qquad y(1)=1$$
which is the original problem with $\varepsilon=0$ and one boundary condition thrown away.
The reason that boundary condition is lost is that it sits inside the inner region, defined by $x\lesssim\varepsilon$. Inside the inner region the problem has a much shorter length scale, and the behaviour of the solution is dominated by the $e^{-x/\varepsilon}$ part (but one must not forget the $e^x$ part – just notice that $e^x\approx1$ in the inner region, so it is quite “tame”).
Typically, one finds the inner solution by rescaling the problem to fit the natural length scale of the inner region; in this case, $\varepsilon$. So introduce $X=x/\varepsilon$ as the new length variable, put $Y(X)=y(x)$, and rewrite the original problem as
$$Y''+(1-\varepsilon)Y'-\varepsilon Y=0,\qquad Y(0)=0,$$
where I have thrown away the other boundary condition, which would be $Y(1/\varepsilon)=1$. That is too far outside the inner region to be useful.
To a zeroth approximation this becomes
$$Y''+Y'=0,\qquad Y(0)=0,$$
with a general solution $Y(X)=C(1-e^{-X})$.
The remaining problem is to match up inner and outer solutions in order to determine $C$.
This may be done by considering $x$ in an intermediate region, where $x\gg\varepsilon$ but still $x\ll1$. There $X\gg1$, so the inner solution is ${}\approx C$ while the outer solution is ${}\approx e^{-1}$, thus leading to $C=e^{-1}$.
So for this example, in summary: The inner region is $x\lesssim\varepsilon$, the inner solution is $e^{-1}(1-e^{-x/\varepsilon})$, the outer region is $x\gg\varepsilon$, and the outer solution is $e^{x-1}$. All this just to lowest order, of course.
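A quick numerical sanity check (a sketch, using the $\varepsilon$ and grid chosen here): the standard composite approximation is outer + inner − common limit, i.e. $e^{x-1}+e^{-1}(1-e^{-x/\varepsilon})-e^{-1}=e^{x-1}-e^{-1}e^{-x/\varepsilon}$, and we can compare it against the exact solution. For this particular problem the composite happens to agree with the exact solution up to exponentially small terms in $\varepsilon$.

```python
import math

def exact(x, eps):
    # y = (e^x - e^{-x/eps}) / (e - e^{-1/eps}), the exact solution above
    return (math.exp(x) - math.exp(-x / eps)) / (math.e - math.exp(-1 / eps))

def composite(x, eps):
    # outer + inner - common limit = e^{x-1} - e^{-1} e^{-x/eps}
    return math.exp(x - 1) - math.exp(-1) * math.exp(-x / eps)

eps = 1e-2
xs = [i / 1000 for i in range(1001)]
err = max(abs(exact(x, eps) - composite(x, eps)) for x in xs)
print(err)  # uniformly tiny across [0,1], including the boundary layer
```

Note that the error here is far smaller than the generic $O(\varepsilon)$ guarantee, since the denominator $e-e^{-1/\varepsilon}$ differs from $e$ only by an exponentially small amount.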
1.) The equation remains conservative for $ϵ\ne 0$ so that all solutions always lie on level surfaces of
$$
E=\frac12y'^2+\frac12y^2+\frac{ϵ}{4}y^4.
$$
All solutions that do not lie on a level set containing singular points must be periodic. For $ϵ>0$ the only singular point is the origin. Any solution for $ϵ\ll 1$ will stay close to a circle, and at time $2\pi$ it will be close to its initial point.
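This is easy to check numerically. A sketch in pure Python (the equation $y''+y+ϵy^3=0$, the value $ϵ=0.1$, and the initial data are assumptions chosen for illustration): integrate with a standard RK4 step and verify that $E$ stays numerically constant, so the orbit lies on a level set of $E$, and that at time $2\pi$ the state is close to where it started.

```python
import math

eps = 0.1

def f(state):
    # first-order system for y'' + y + eps*y^3 = 0
    y, v = state
    return (v, -y - eps * y**3)

def rk4_step(state, h):
    k1 = f(state)
    k2 = f((state[0] + h/2*k1[0], state[1] + h/2*k1[1]))
    k3 = f((state[0] + h/2*k2[0], state[1] + h/2*k2[1]))
    k4 = f((state[0] + h*k3[0], state[1] + h*k3[1]))
    return (state[0] + h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            state[1] + h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

def energy(state):
    y, v = state
    return v*v/2 + y*y/2 + eps * y**4 / 4

state = (1.0, 0.0)                 # y(0) = 1, y'(0) = 0
E0 = energy(state)
h, steps = 2 * math.pi / 1000, 1000   # integrate up to time 2*pi
for _ in range(steps):
    state = rk4_step(state, h)
drift = abs(energy(state) - E0)
print(drift)    # tiny: the orbit stays on its level set of E
print(state)    # moderately close to (1, 0) for this eps
```

The small offset from $(1,0)$ at time $2\pi$ reflects exactly the $O(ϵ)$ frequency shift discussed next.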
2.) The frequency-one terms on the right-hand side are in resonance with the homogeneous solution of the left-hand side, and thus produce solution terms whose amplitude grows in time. This obviously yields a valid approximation only locally, near $t=0$. The perturbation approach introduces the flexibility to absorb these terms of the equation into the base frequency, or put differently, to compensate them with changes in the frequency. Thus by selecting $χ_0=1$ and
$$
2χ_1a+\frac{3a^3}4=0
$$
there is no longer resonance at this level of the perturbation equations, and the resulting periodic solution is correct to a higher order in $ϵ$.
Essentially, if you solve the equation using a simple perturbation approach $y=y_0+ϵy_1+...$, you get the first perturbation equation as
$$
y_1''+y_1=-y_0^3=-c^3\cos^3(t+ϕ)=-\frac{c^3}{4}[\cos(3t+3ϕ)+3\cos(t+ϕ)]
$$
with particular solution of the form
$$
y_1=A\cos(3t+3ϕ)+Bt\sin(t+ϕ)
\\
\implies
y_1''+y_1=-8A\cos(3t+3ϕ)+2B\cos(t+ϕ)
$$
so that the coefficients are $A=\frac{c^3}{32}$ and $B=-\frac{3c^3}{8}$. In the expression for $y$ the combination of terms
$$
c\cos(t+ϕ)-ϵ\frac{3c^3}{8}t\sin(t+ϕ)
$$
can be interpreted as the linear Taylor polynomial (in $ϵ$) of
$$
c\cos\left(\left(1+ϵ\frac{3c^2}{8}\right)t+ϕ\right)
$$
These are equivalent approximations to first order in $ϵ$ for bounded $t$, but the second one gives a result that is also globally valid as a periodic solution.
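One can test this claim numerically. A sketch (the values of $ϵ$, $c$, and $ϕ=0$ are assumptions for illustration): integrate $y''+y+ϵy^3=0$ with $y(0)=c$, $y'(0)=0$, and compare against $c\cos\bigl((1+ϵ\,3c^2/8)t\bigr)$ over a full period. The error should stay small and bounded (dominated by the neglected $O(ϵ)$ third-harmonic term), with no secular growth.

```python
import math

eps, c = 0.05, 1.0
omega = 1 + eps * 3 * c**2 / 8      # frequency-corrected approximation

def f(y, v):
    # first-order system for y'' + y + eps*y^3 = 0
    return v, -y - eps * y**3

def rk4(y, v, h):
    k1y, k1v = f(y, v)
    k2y, k2v = f(y + h/2*k1y, v + h/2*k1v)
    k3y, k3v = f(y + h/2*k2y, v + h/2*k2v)
    k4y, k4v = f(y + h*k3y, v + h*k3v)
    return (y + h/6*(k1y + 2*k2y + 2*k3y + k4y),
            v + h/6*(k1v + 2*k2v + 2*k3v + k4v))

y, v, t, h = c, 0.0, 0.0, 2 * math.pi / 2000
err_corrected = 0.0
for _ in range(2000):               # roughly one period of the motion
    y, v = rk4(y, v, h)
    t += h
    err_corrected = max(err_corrected, abs(y - c * math.cos(omega * t)))
print(err_corrected)   # small and bounded: no secular drift
```

By contrast, the naive term $-ϵ\frac{3c^3}{8}t\sin(t+ϕ)$ would make the error of the uncorrected expansion grow linearly in $t$ over many periods.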
Mathieu's equation is a particularly tricky example, since the value of the parameter $\delta$ is very important. I recommend starting off with some simpler equations requiring the method of multiple scales.
We are looking for good approximations to $x(t)$ in the limit $\epsilon\to0$. It turns out that there is a range of related techniques for finding asymptotic approximations to $x(t)$. Sometimes these give power series expansions, but often they do not (the series often fail to converge, for example, or involve dependence on $\epsilon$ other than just positive integer powers). The simplest case is a plain power series, which works surprisingly often.
Yes, it is fairly standard and can be figured out. There are two parts to this expansion: the sequence of gauge functions in the expansion ($1$, $\epsilon$, $\epsilon^2$, etc.), which is the simplest expansion we use, and the multiple time scales, $\xi=t$ and $\eta=\epsilon t$. This is known as the method of multiple scales. For motivation, look at any perturbation-theory textbook; there is usually a worked example showing why an expansion like $x_0(t)+\epsilon x_1(t)+\ldots$ is insufficient in some cases and why something like $x_0(\xi,\eta)+\epsilon x_1(\xi,\eta)+\ldots$ is necessary.
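The usual textbook example of that failure (an assumed illustration, not from the question) is the weakly damped oscillator $x''+2\epsilon x'+x=0$, $x(0)=1$, $x'(0)=0$, whose exact solution is known in closed form. The naive leading term $\cos t$ misses the slow decay entirely on the long time scale $\eta=\epsilon t\sim1$, while the multiple-scales leading term $e^{-\epsilon t}\cos t$ stays uniformly close:

```python
import math

eps = 0.05

def exact(t):
    # closed-form solution of x'' + 2*eps*x' + x = 0, x(0)=1, x'(0)=0
    w = math.sqrt(1 - eps**2)
    return math.exp(-eps * t) * (math.cos(w * t) + (eps / w) * math.sin(w * t))

T = 1 / eps                         # long time scale: eta = eps*t ~ 1
ts = [T * i / 500 for i in range(501)]
err_naive = max(abs(exact(t) - math.cos(t)) for t in ts)
err_ms    = max(abs(exact(t) - math.exp(-eps * t) * math.cos(t)) for t in ts)
print(err_naive)   # O(1): cos(t) ignores the decay, badly wrong by t ~ 1/eps
print(err_ms)      # O(eps): the multiple-scales term remains uniformly close
```

This is exactly the phenomenon that the two-variable form $x_0(\xi,\eta)$ is designed to capture: the slow variable $\eta$ carries the amplitude decay.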
We don't usually introduce the perturbation factor ourselves; it's part of the problem. For example, an equation describing the motion of some object might include a relatively small amount of drag, and the scale of the drag becomes the perturbation parameter, i.e. $$ \frac{\mathrm d^2x}{\mathrm dt^2}=-1-\epsilon|x'|^2$$ describes an object being thrown upwards which is affected by gravity and a small amount of air resistance (scaled by $\epsilon$). It's naturally a small effect and we expect a small change from the solution of $$ \frac{\mathrm d^2x}{\mathrm dt^2}=-1, $$ at least while $\epsilon$ is small.
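A sketch of that expectation in code (the values of $\epsilon$, the initial data, and the drag form $x'|x'|$, which opposes the motion and reduces to $|x'|^2$ on the upward phase, are assumptions for illustration): integrate the damped problem over the upward flight and compare with the drag-free solution $x_0(t)=t-t^2/2$; the difference should be $O(\epsilon)$.

```python
eps = 0.01

def f(x, v):
    # gravity plus velocity-squared drag that opposes the motion
    return v, -1 - eps * v * abs(v)

def rk4(x, v, h):
    k1x, k1v = f(x, v)
    k2x, k2v = f(x + h/2*k1x, v + h/2*k1v)
    k3x, k3v = f(x + h/2*k2x, v + h/2*k2v)
    k4x, k4v = f(x + h*k3x, v + h*k3v)
    return (x + h/6*(k1x + 2*k2x + 2*k3x + k4x),
            v + h/6*(k1v + 2*k2v + 2*k3v + k4v))

x, v, h = 0.0, 1.0, 1 / 1000       # x(0)=0, x'(0)=1; upward phase t in [0,1]
diff = 0.0
for i in range(1000):
    x, v = rk4(x, v, h)
    t = (i + 1) * h
    diff = max(diff, abs(x - (t - t*t/2)))
print(diff)    # an O(eps) departure (about eps/4 here) from the drag-free path
```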