The constant $\alpha$ in your question can be in fact written explicitly as $(k)_n/12^n$, where $(a)_n=\Gamma(a+n)/\Gamma(n)$ is the Pochhammer symbol (shifted factorial) and $k$ denotes the (even) weight of the corresponding Eisenstein series.
Your observation is indeed related to the Rankin--Cohen brakets; see Section 5.2 in [D. Zagier, Elliptic modular forms and their applications, The 1-2-3 of modular forms, Universitext (Springer, Berlin, 2008), pp. 1–-103]. Preserving the notation $D$ of Zagier's lectures for your differential operator and picking a modular form $f$ of weight $k$, one can show that $D^nf$ transforms under the modular group as
$$
D^nf\biggl(\frac{a\tau+b}{c\tau+d}\biggr)
=\sum_{r=0}^n\binom{n}{r}\frac{(k+r)_{n-r}}{(2\pi i)^{n-r}}
c^{n-r}(c\tau+d)^{k+n+r}D^rf(\tau),
$$
by the induction on $n\ge 0$. In addition, the function $E_2(\tau)$ transforms as
$$
E_2\biggl(\frac{a\tau+b}{c\tau+d}\biggr)
=\frac{12c(c\tau+d)}{2\pi i}+(c\tau+d)^2E_2(\tau).
$$
Therefore, it remains to verify that the difference
$$
g_n=D^nE_k-\frac{(k) _ n}{12^n}\sum_{r=0}^n(-1)^{n-r}\binom{n}{r}E_{k+2n-2r}E_2^r
$$
satisfies
$$
g_n\biggl(\frac{a\tau+b}{c\tau+d}\biggr)=(c\tau+d)^{k+2n}g_n(\tau).
$$
Technicalities.
Indeed, I found the remaining details quite boring, but going through my yesterday
writing I have realised that your expectation fails already for $D^5E_4$ and $D^4E_6$.
Here is my explanation why.
Because $g_n(\tau)$ is a $q$-series, so it is invariant under $\tau\mapsto\tau+1$,
we can restrict to verifying the claim under the transformation $\tau\mapsto-1/\tau$
(that is, $a=d=0$, $b=-1$, and $c=1$). Then (we take $s=n-r$ in the above formula)
$$
D^nE_k(-1/\tau)
=\sum_{s=0}^n\binom ns\frac{(k+n-s) _ s}{(2\pi i)^s}\tau^{k+2n-s}D^{n-s}E_k(\tau)
$$
and
$$
\begin{aligned}
&
\frac{(k) _ n}{12^n}\sum_{r=0}^n(-1)^{n-r}\binom nrE_{k+2n-2r}E_2^r\bigg|_{\tau\mapsto-1/\tau}
\cr &\qquad
=\frac{(k) _ n}{12^n}\sum_{r=0}^n(-1)^{n-r}\binom nr\tau^{k+2n-r}E_{k+2n-2r}\biggl(\tau E_2+\frac{12}{2\pi i}\biggr)^r
\cr &\qquad
=\frac{(k) _ n}{12^n}\sum_{r=0}^n(-1)^{n-r}\binom nr\tau^{k+2n-r}E_{k+2n-2r}\sum_{s=0}^r\binom rs\frac{12^s}{(2\pi i)^s}\tau^{r-s}E_2^{r-s}
\cr &\qquad
=\frac{(k) _ n}{12^n}\sum_{s=0}^n\binom ns\tau^{r-s}\frac{12^s}{(2\pi i)^s}\tau^{k+2n-s}
\sum_{r=s}^n(-1)^{n-r}\binom{n-s}{r-s}E_{k+2n-2r}E_2^{r-s}.
\end{aligned}
$$
Subtracting the latter from the former we obtain
$$
\begin{aligned}
g_n(-1/\tau)
&=\sum_{s=0}^n\binom ns\frac{(k+n-s) _ s}{(2\pi i)^s}\tau^{k+2n-s}g_{n-s}(\tau)
\cr
&=\tau^{k+2n}g_n(\tau)+\sum_{s=1}^n\binom ns\frac{(k+n-s) _ s}{(2\pi i)^s}\tau^{k+2n-s}g_{n-s}(\tau).
\end{aligned}
$$
Therefore, $g_n(-1/\tau)=\tau^{k+2n}g_n(\tau)$, hence $g_n(\tau)$ is a modular form (of weight $k+2n$),
if and only if the additional sum over $s$ vanishes, that is, $g_{n-s}=0$ for $s=1,\dots,n$.
The latter however does not happen when $k+2n>12$.
You don't need the language of group theory to talk about some aspects of groups. For example, number theorists going back to Fermat were studying the group of units mod $m$ (including things like the order of a unit mod $m$) and geometers were studying groups of motions in space long before anyone defined a "group".
The first modular forms (of level $4$, not level $1$) were found by Gauss in his work on the arithmetic-geometric mean around 1800, and it would take until the end of the 19th century for the term "modular form" to be introduced, in 1890. I provided links for this near the end of my answer here, but I don't want to discuss this further.
Instead I want to show how one reason for interest in modular forms in the 19th century was the construction of nonconstant meromorphic functions on Riemann surfaces of higher genus. This will explain why someone might care about functions satisfying a transformation rule like $f((a\tau+b)/(c\tau+d)) = (c\tau+d)^kf(\tau)$.
Compact Riemann surfaces of genus 1 arise in the form $\mathbf C/L$ where $L$ is a discrete subgroup of $\mathbf C$. Meromorphic functions on $\mathbf C/L$ are the elliptic functions, which in various forms were studied throughout much of the 19th century (by Jacobi, Weierstrass, et al.). We'd like to find a similar story for compact Riemann surfaces of genus greater than $1$. By letting a discrete group $Γ \subset {\rm SL}_2(\mathbf R)$ act on the upper half-plane $\mathfrak h$ by linear fractional transformations, the coset space $\Gamma\setminus\mathfrak h$ will be a compact Riemann surface (with finitely many points missing) and the higher-genus Riemann surfaces arise in this way.
A natural collection of discrete subgroups of ${\rm SL}_2(\mathbf R)$ is ${\rm SL}_2(\mathbf Z)$ and its finite-index subgroups, such as the congruence subgroups. When I write $\Gamma$ below, you could consider it to be one of these groups of integral matrices.
How can we create nonconstant meromorphic functions on $\Gamma\setminus\mathfrak h$? That is the same thing as creating nonconstant meromorphic functions $F : \mathfrak h \rightarrow \mathbf C$ that are $\Gamma$-invariant: $F(\gamma\tau) = F(\tau)$ for all $\gamma \in \Gamma$ and $\tau \in \mathfrak h$. If we aren't clever enough to be able to write down $\Gamma$-invariant functions directly, we can still make progress by finding some non-invariant functions as long as they are non-invariant in the same way: if
$$
f\left(\frac{aτ + b}{cτ + d}\right) = (cτ + d)^kf(τ)
$$
and
$$
g\left(\frac{aτ + b}{cτ + d}\right) = (cτ + d)^kg(τ)
$$
for some "weight" $k \in \mathbf Z$ and all $(\begin{smallmatrix} a& b\\c &d\end{smallmatrix}) ∈ \Gamma$ and $\tau ∈ \mathfrak h$, then the ratio $f(τ)/g(τ)$ is $\Gamma$-invariant:
$$
\frac{f((a\tau+b)/(c\tau+d))}{g((a\tau+b)/(c\tau+d))} = \frac{(cτ + d)^kf(τ)}{(cτ + d)^kg(τ)} = \frac{f(\tau)}{g(\tau)}.
$$
As long as $f$ and $g$ are not constant multiples of each other (essentially, the space of modular forms $f$ and $g$ live in is more than one-dimensional), the ratio $f/g$ will be a nonconstant $\Gamma$-invariant function.
But why should we use fudge factors of the form $(cτ + d)^k$? Suppose for a function $f : \mathfrak h \rightarrow \mathbf C$ that $f(γτ)$ and $f(τ)$ are always related by a nonvanishing factor determined by $γ ∈ Γ$ and $τ ∈ \mathfrak h$ alone, not by $f$:
$$
f(γτ) = j(γ, τ)f(τ)
$$
for some function $j : \Gamma × \mathfrak h \rightarrow \mathbf C^\times$.
If also $g(γτ) = j(γ, τ)g(τ)$ then $f(γτ)/g(γτ) = f(\tau)/g(\tau)$, so $f/g$ is $\Gamma$-invariant. We want to figure out how anyone might imagine using the fudge factor $j(\gamma,\tau) = (c\tau+d)^k$, where $γ = (\begin{smallmatrix} a& b\\ c& d\end{smallmatrix})$. Since $(γ_1γ_2)τ = γ_1(γ_2τ)$, we have $f((γ_1γ_2)τ) = f(γ_1(γ_2τ))$. This turns
the above displayed equation into
$$
j(γ_1γ_2,τ)f(τ) = j(γ_1,γ_2τ)f(γ_2τ).
$$
Since $f(γ_2τ) = j(γ_2,τ)f(τ)$, the above equation holds if and only if the "cocycle condition"
$$j(γ_1γ_2, τ) = j(γ_1,γ_2τ)j(γ_2,τ)$$
holds, which looks a lot like the chain rule $(f_1 ◦ f_2)'(x) = f_1'(f_2(x))f_2'(x)$. This suggests a natural example of the function $j(\gamma,\tau)$ using differentiation: when
$γ = (\begin{smallmatrix} a& b\\ c& d\end{smallmatrix})$ and $\tau \in \mathfrak h$, set
$$
j(γ, τ ) := \left(\frac{aτ + b}{cτ + d}\right)' =
\frac{a(cτ + d) − c(aτ + b)}{(cτ + d)^2} =\frac{ad − bc}{(cτ + d)^2},
$$
and for $γ ∈ {\rm SL}_2(\mathbf R)$ this says $j(γ, τ) = 1/(cτ + d)^2$.
When $j(γ,τ)$ satisfies the cocycle condition, so does $j(γ,τ)^m$
for each $m \in \mathbf Z$, which motivates the consideration of the transformation rule for modular forms with factors $(cτ + d)^k$, at least for even $k$ (use $k = -2m$).
The top answer at https://math.stackexchange.com/questions/312515/what-is-the-intuition-between-1-cocycles-group-cohomology shows how the same cocycle condition arises when trying to write down line bundles on an elliptic curve over $\mathbf C$.
Best Answer
There is such a theory, but the analytic object that the forms live on is an analytified modular curve, not simply $\mathbb{C}_p$ (though there is a "$p$-adic upper-half plane" that can be used to uniformize some similar moduli spaces, but as far as I know not the usual modular curves).
Basically, if $f$ is a classical modular form of some weight $k$, $f$ can be realized as a section of a sheaf $\omega^{\otimes k}$ on a complex-analytic modular curve such as $X_1(N)$ obtained via quotient from the complex upper-half plane. These curves and sheaves have algebraic models defined over $\mathbb{Q}$, and (under mild hypotheses) the form $f$ actually arises from a section of the associated sheaf $\omega^{\otimes k}$ on the modular curve defined over $\mathbb{Q}$ (or some finite extension).
Now you can go in another direction and consider the rigid-analytic space over $\mathbb{Q}_p$ associated to the smooth algebraic curve $X_1(N)$ and your form $f$ gives rise to a $p$-adic analytic object on this curve. Now one can play games like considering subspaces obtained by removing disks around supersingular points to obtain general $p$-adic modular forms (such as limits of classical forms of varying weight) and overconvergent modular forms.
For what it's worth, these curves are $p$-adic analytic moduli spaces, and this point of view on $p$-adic modular forms essentially differs from Katz's by thinking about rigid spaces as opposed to formal schemes (the Raynaud point of view on $p$-adic analytic geometry).
Look at the papers of Coleman (such as his $p$-adic Banach spaces paper or the eigencurve paper with Mazur) if you would like to read more on this point of view.