Here is one way of getting the variation of parameters formulas. Firstly, some motivation mixed with a bit of jargon:
We think of $a_{2}(x)y''(x)+a_{1}(x)y'(x)+a_{0}(x)y(x)$ as the result of applying to $y(x)$ an operation $D=a_{2}(x)\frac{d^2}{dx^2}+a_{1}(x)\frac{d}{dx}+a_{0}(x)\,\mathrm{Id}$. This is a linear operation (aka a linear differential operator), meaning that $D(c_1y_1(x)+c_2y_2(x))=c_1 D(y_1(x))+c_2 D(y_2(x))$. Our equation is then $D(y(x))=f(x)$.
Linearity ensures that if $y_a(x)$ and $y_b(x)$ solve $Dy_a=f_a$ and $Dy_b=f_b$ then $y_a+y_b$ solves $Dy=f_a+f_b$. So if we can decompose our inhomogeneous term $f(x)$ as a sum $f=f_a+f_b$ and solve the two "pieces" $Dy=f_a$ and $Dy=f_b$ we would be done - just add the two solutions and get a solution for $f$.
The key idea now (this is known as the Green's function method, or perhaps sometimes Duhamel's principle) is to think of $f$ as being a "sum" of a continuum of delta functions, $f(x)=\int f(t)\delta(x-t)\,dt$. Thus we want to solve $Dy(x)=\delta(x-t)$ for each $t$ (these solutions are the Green's functions for our equation), to get solutions $y_t(x)=y(t,x)$, and then write our solution to $Dy=f$ as the "sum" over $t$: $y(x)=\int f(t) y_t(x)\, dt = \int f(t) y(t,x)\, dt$.
Continuing in this way (without paying much attention to technical details or introducing the theory of distributions needed to provide such details and make the above rigorous), we are now tasked with finding $y_t(x)$, i.e. solving $Dy_t(x)=\delta(x-t)$. The solution is of course not unique: it is determined only up to adding $y_h$, an arbitrary solution of the homogeneous equation $Dy=0$. Note that for $x>t$ and for $x<t$ we have $\delta(x-t)=0$, so on each side $y_t$ coincides with some $y_h$. But it cannot be the same $y_h$ on both sides, or we would get just $0$ everywhere, not $\delta(x-t)$. So $y_t$ needs to jump at $x=t$. Namely, we can start with $y_t=0$ for $x<t$ and continue as some $y_h$ after.
Now, any $y_h$ is a combination of fixed homogeneous solutions $y_1$ and $y_2$, i.e. $y_h(x)=c_1y_1(x)+c_2y_2(x)$. So our Green's function is $y_t=0$ for $x<t$ and $y_t(x)=c_1y_1(x)+c_2y_2(x)$ for $x>t$. What should $c_1, c_2$ be? We want $Dy_t(x)= a_{2}(x)y_t''(x)+a_{1}(x)y_t'(x)+a_{0}(x)y_t(x)= \delta(x-t)$. Thus $y_t$ should be continuous, so that $y_t'$ has only a step discontinuity at $x=t$, and $y_t''$ only a delta (and not $\delta'(x-t)$). Continuity imposes $c_1y_1(t)+c_2y_2(t)=0$. We also want the delta jump to have unit size, not bigger or smaller. The size of the jump is controlled by the $a_2 \frac{d^2}{dx^2}$ term at $x=t$, and so equals $a_2(t)[ c_1y'_1(t)+c_2y'_2(t)]$.
Equating this to $1$ imposes $c_1y'_1(t)+c_2y'_2(t)=\frac{1}{a_2(t)}$.
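To spell out the jump computation (a short check, using only the continuity of $y_t$), integrate $Dy_t=\delta(x-t)$ over a small interval $[t-\epsilon,t+\epsilon]$:

```latex
\int_{t-\epsilon}^{t+\epsilon}\bigl(a_2 y_t'' + a_1 y_t' + a_0 y_t\bigr)\,dx
\;\xrightarrow[\epsilon\to 0]{}\;
a_2(t)\bigl[y_t'(t^+)-y_t'(t^-)\bigr]
= a_2(t)\bigl[c_1 y_1'(t)+c_2 y_2'(t)\bigr].
```

The $a_1y_t'$ and $a_0y_t$ terms contribute nothing in the limit ($y_t$ is continuous and $y_t'$ is bounded), while the right-hand side integrates to $\int_{t-\epsilon}^{t+\epsilon}\delta(x-t)\,dx=1$, giving exactly this condition.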
To summarize:
The Green's function at $t$ is
$$y_t(x)=\begin{cases}0 & x<t,\\ c_1(t)y_1(x)+c_2(t)y_2(x) & x>t,\end{cases}$$
with $c_1(t), c_2(t)$ subject to
$c_1(t)y_1(t)+c_2(t)y_2(t)=0$
$c_1(t)y'_1(t)+c_2(t)y'_2(t)=\frac{1}{a_2(t)}$
Compare with variation of parameters: there $c_1(t)=\frac{u_1'(t)}{f(t)}$ and $c_2(t)=\frac{u_2'(t)}{f(t)}$.
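As a sanity check, the two constraints form a $2\times2$ linear system whose determinant is the Wronskian, so they can be solved by Cramer's rule. A minimal sketch in Python (the function names and the toy equation $y''+y=f$ are my own choices, not from the text):

```python
import math

# Solve the two constraints above by Cramer's rule:
#   c1*y1(t)  + c2*y2(t)  = 0
#   c1*y1'(t) + c2*y2'(t) = 1/a2(t)
# The determinant is the Wronskian W(t) = y1(t)y2'(t) - y1'(t)y2(t).

def green_coeffs(y1, dy1, y2, dy2, a2, t):
    W = y1(t) * dy2(t) - dy1(t) * y2(t)
    return -y2(t) / (a2(t) * W), y1(t) / (a2(t) * W)

# Toy example: y'' + y = f, so a2 = 1, y1 = cos, y2 = sin, W = 1.
def G(x, t):
    """Green's function: 0 for x < t, c1(t)y1(x) + c2(t)y2(x) for x > t."""
    if x < t:
        return 0.0
    c1, c2 = green_coeffs(math.cos, lambda s: -math.sin(s),
                          math.sin, math.cos, lambda s: 1.0, t)
    return c1 * math.cos(x) + c2 * math.sin(x)

# For this equation the formula collapses to sin(x - t) when x > t:
assert abs(G(2.0, 0.7) - math.sin(2.0 - 0.7)) < 1e-12
```

For this particular equation the Green's function depends only on $x-t$, because the operator has constant coefficients; for variable coefficients one gets a genuine function of both $x$ and $t$.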
And the solution to $Dy=f$ is
$y(x)=\int f(t) y_t (x) dt = \int f(t) y (t , x) dt $
Plugging in, we get
$y(x)=\int_{t<x} f(t) (c_1(t)y_1(x)+c_2(t)y_2(x))\, dt=\left[\int_{-\infty}^{x} f(t)c_1(t)\, dt\right] y_1(x)+\left[\int_{-\infty}^{x} f(t)c_2(t)\,dt\right] y_2(x)=u_1(x)y_1(x)+u_2(x)y_2(x)$.
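To see the integral formula in action numerically, here is a rough sketch, again for the assumed toy equation $y''+y=f$ with Green's function $\sin(x-t)$, taking $f$ to vanish for $t<0$ so the lower limit of integration becomes $0$:

```python
import math

# Approximate y(x) = ∫_0^x f(t) sin(x - t) dt with the trapezoidal rule;
# the Green's function kills the t > x part of the integral.
# With f(t) = t the particular solution is x - sin(x)  (check: y'' + y = x).

def particular(x, f, n=2000):
    h = x / n
    # Endpoint terms of the trapezoidal rule (the t = x term vanishes).
    total = 0.5 * (f(0.0) * math.sin(x) + 0.0)
    for k in range(1, n):
        t = k * h
        total += f(t) * math.sin(x - t)
    return total * h

y = particular(1.5, lambda t: t)
assert abs(y - (1.5 - math.sin(1.5))) < 1e-5
```

This only produces one particular solution; as noted above, the full solution set is obtained by adding arbitrary homogeneous solutions $c_1y_1+c_2y_2$.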
So we recover $y$ as a combination of $y_1$ and $y_2$ with function coefficients, and we know exactly which relations the derivatives of these coefficients should satisfy (they have to combine to give the Green's functions times $f(x)$). Writing out these relations, we get exactly the variation of parameters formulas, which we can then justify independently in an ad hoc manner.
Of course this generalizes in a straightforward manner to higher order linear equations: the $c_i$ will have more linear "matching of derivatives" constraints to satisfy to give a Green's function, and the rest is the same.
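Concretely (spelling out the standard generalization), for an order-$n$ operator $D=a_n(x)\frac{d^n}{dx^n}+\dots+a_0(x)$ with homogeneous solutions $y_1,\dots,y_n$, the Green's function is $\sum_i c_i(t)y_i(x)$ for $x>t$ (and $0$ for $x<t$), with the constraints

```latex
\sum_{i=1}^{n} c_i(t)\,y_i^{(k)}(t) = 0 \quad (k=0,1,\dots,n-2),
\qquad
\sum_{i=1}^{n} c_i(t)\,y_i^{(n-1)}(t) = \frac{1}{a_n(t)} .
```

Continuity of $y_t,y_t',\dots,y_t^{(n-2)}$ gives the first $n-1$ equations, and the unit delta in $a_n y_t^{(n)}$ gives the last; the coefficient matrix is the Wronskian matrix of $y_1,\dots,y_n$ at $t$, so the system has a unique solution.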
Best Answer
Tunococ, no offense intended, but what you have said is given in any standard textbook on differential equations. As I said earlier, some ideas become clear after you go through with them, and this analysis is precisely that. That's not what I am after.
At this point I should clarify my comment on your answer. When solving a DE, our primary concern is separation of variables. Simplification is a secondary concern. A substitution which transforms our equation into something more complex but separates the variables is more useful than one which simplifies our DE without separating the variables. As far as simplification is concerned, transforming $f(x)$ into $f(y/x)$ and then substituting $y/x = v$ is a logical step (not a natural guess!). But with respect to separation of variables, it is not.
Johann Bernoulli is said to be the first to suggest this method. The following statement is attributed to him: "I attempt only to separate the indeterminate $x$ and its differential $dx$ from the indeterminates $y$ and $dy$, which deserves the prize in this investigation, for otherwise the construction of the solution to the differential equation won't be achieved." (Ref: A History of Analysis, Hans Niels Jahnke.) So there's a strong probability that when JB suggested putting $y/x = v$, he was looking at separation of variables, not simplification of the DE.
Now I am not insisting that he discovered this substitution in a logical manner. What you suggested could have been true. I am only saying that, if such a possibility exists, we should explore it.
You did raise an important point, however, that I overlooked: homogeneous functions need not be polynomials. As for the degree not being an integer, that can be changed via substitution.