Your asymptotic approximation is correct :
$$y (x)\sim C_2 (ax)^{-1/4} \exp \left[-\frac{1}{\epsilon} \sqrt{a} \frac{2}{3} x^{3/2}\right] \tag 1$$
according to the boundary condition $y(\infty)=0$.
Note : A second order ODE requires at least two boundary conditions to expect having a unique solution. Since the wording of the question specifies only the condition $y(\infty)=0$, the solution remains undetermined : The coefficient $C_2$ cannot be computed.
Another key point is that an asymptotic formula is valid only for $x\to\infty$, not for $x\to 0$.So, don't expect Eq.$(1)$ makes sens at $x=0$. It is normal that the asymptotic approximation be far to be correct for small values of $x$.
If you want an approximate around $x=0$, expand $y(x)$ in power series.
$$y(x)\simeq c_0+c_1x+c_2x^2+...$$
Of course, this formula is not valid for large $x$, so the boundary condition at infinity cannot be fulfilled. Moreover, two boundary conditions should be required close to $x=0$ in order to determine the coefficients $c_0$ , $c_1$, $c_2$ , …
In fact, as Claude Leibovici quite rightly wrote in comments, the exact solving involves the Airy functions.
$$y(x)=c_1\text{Ai}\left(\sqrt[3]{\frac{a}{\epsilon^2}}\:x\right)+c_2\text{Bi}\left(\sqrt[3]{\frac{a}{\epsilon^2}}\:x\right)$$
Ai and Bi are the Airy functions http://mathworld.wolfram.com/AiryFunctions.html
The condition $y(\infty)=0$ is satisfied any $c_1$ and $c_2$, which shows that the condition $y(\infty)=0$ is not sufficient to fully determine a solution.
On physicists viewpoint, defining the properties of a boundary layer is certainly a way to introduce some boundary conditions in the range around $x=0$. Supposing that $y(x)\simeq y_0+y'_0 x$ in a layer of a given thickness $\delta$ allows to compute $c_1$ and $c_2$ as a function of $y_0$ , $y'_0$ and $\delta$, thanks to the series expansion of the Airy functions. Then the asymptotic expansion of the Airy functions will leads to the coefficient $C_2$ in the asymptotic approximation $(1)$.
http://functions.wolfram.com/Bessel-TypeFunctions/AiryAi/06/
http://functions.wolfram.com/Bessel-TypeFunctions/AiryBi/06/
I cannot answer all of your questions, partially because I do not understand some of your questions (such as the "What does this mean exactly?" question, as this differential equation here is different from the one in the link). I do not understand why you say "Here I think we cannot just plug in [...]," since this procedure works perfectly fine in this problem. However, in this answer, I shall show you that you can still use the standard perturbation theory to deal with this differential equation.
I shall write $y=f^\epsilon$ for the solution to
$$y'(x)-\big(y(x)-1\big)^2=\epsilon\,\frac{\big(y(x)\big)^2}{x^2}\text{ and }y(1)=1\,.\tag{*}$$
Note that $f^0$ is the constant function $f^0\equiv 1$. Thus, for small $\epsilon$, we expect that $f^\epsilon$ is obtained from $f^0$ by via the perturbation theory. That is, for some functions $f_0,f_1,f_2,\ldots$, we have
$$f^\epsilon=f_0+\epsilon\,f_1+\epsilon^2\,f_2+\ldots\,,$$
where $f_0=f^0$. Observe that $f_k(1)=0$ for all $k=1,2,3,\ldots$.
The equation for $f_1$ is obtained by ignoring terms of oder $\epsilon^2$ or higher. From (*) and from $f_0=f^0\equiv1$, we get
$$\epsilon\,f_1'(x)=\frac{\epsilon}{x^2}\text{ or }f_1'(x)=\frac{1}{x^2}\,.$$
Consequently, $$f_1(x)=\int_1^x\,\frac{1}{t^2}\,\text{d}t=1-\frac{1}{x}\,.$$
The equation for $f_2$ is obtained by ignoring terms of order $\epsilon^3$ or higher. Since $f_0(x)=1$ and $f_1(x)=1-\frac{1}{x}$, we find that, from (*),
$$\left(\frac{\epsilon}{x^2}+\epsilon^2\,f'_2(x)\right)-\epsilon^2\left(1-\frac{1}{x}\right)^2=\frac{\epsilon}{x^2}\,\Biggl(1+2\,\epsilon\left(1-\frac1x\right)\Biggr)\,.$$
Therefore,
$$f_2'(x)-\left(1-\frac{1}{x}\right)^2=\frac{2}{x^2}\left(1-\frac1x\right)\,,\text{ or }f_2'(x)=1-\frac{2}{x}+\frac{3}{x^2}-\frac{2}{x^3}\,,$$
yielding
$$f_2(x)=\int_1^x\,\left(1-\frac{1}{t^2}\right)\,\text{d}t=(x-1)-2\,\ln(x)+3\,\left(1-\frac{1}{x}\right)-\left(1-\frac{1}{x^2}\right)\,.$$
You can continue so on and so forth, but the deeper you go, the more complicated the differential equation will be. However, for small $\epsilon$ and for $x$ near $1$, you can see that the approximation
$$y(x)\approx 1+\epsilon\,\left(\frac{(x-1)}{x}\right)+\epsilon^2\,\Biggl((x-1)-2\,\ln(x)+3\,\left(1-\frac{1}{x}\right)-\left(1-\frac{1}{x^2}\right)\Biggr)$$
gives an estimate of the solution to (*) very well (check this with a numerical solver or something alike). For $x$ further away from $1$ and for larger $\epsilon$, you are going to have to find more terms.
Best Answer
Side note: assumption of regular perturbation expansion in integer powers of the small parameter is a bit awkwardly restrictive. Assuming merely nonnegative powers of the small parameter allows considerably more flexibility and is still within the realm of regular perturbation theory.
Anyway, the two problematic cases you pointed out are problematic for entirely different reasons. The first case is because the perturbation itself is singular, which means that the $\epsilon=0$ problem has no solution, so the $y_0$ you would want to construct is already not defined. This generally occurs when setting the small parameter to zero eliminates the highest order term in the differential equation (because now the general solution should be expected to be "lower dimensional", either literally in the linear case or in a loose sense in the nonlinear case).
The second case is because you want to push the asymptotics to long time scales, but the local approximations you are making have unbounded error on unbounded time scales. On short times there is no problem with regular perturbation expansion, but on long times you need to come to grips with subtle dynamical issues such as periodicity and metastability, which are too complicated for regular perturbation theory to resolve.