Indeed, the weak type estimate is useful. Using Fubini's theorem, we have
$$\int_E|f^{*}(x)|^q\,\mathrm dx=q\int_0^\infty t^{q-1}\lambda\{|f^*(x)|\chi_E\geqslant t\}\,\mathrm dt.$$
Notice that $$\lambda\{|f^*(x)|\chi_E\geqslant t\}\leqslant \min\left\{|E|;\frac{3^d}t\lVert f\rVert_{\mathbb L^1}\right\},$$
hence split the integral at the level $t$ where the two bounds in the minimum coincide, and conclude.
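For concreteness, here is a sketch of that computation, assuming $0<q<1$ (needed for the tail integral to converge) and writing $A=3^d\lVert f\rVert_{\mathbb L^1}$ and $t_0=A/|E|$ (notation introduced here for illustration):
$$\begin{aligned}
\int_E|f^*(x)|^q\,\mathrm dx
&\leqslant q\int_0^{t_0}t^{q-1}|E|\,\mathrm dt+q\int_{t_0}^\infty t^{q-2}A\,\mathrm dt\\
&=|E|\,t_0^{q}+\frac{q}{1-q}\,A\,t_0^{q-1}
=\frac{1}{1-q}\,A^q|E|^{1-q},
\end{aligned}$$
that is, $\int_E|f^*|^q\leqslant\frac{3^{dq}}{1-q}\,|E|^{1-q}\lVert f\rVert_{\mathbb L^1}^q$.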
In the given range
$$\tag{1}\frac1p + 1 = \frac1q+\frac\alpha d, $$
there is no hope for a simpler proof in the non-homogeneous case.
To explain why, let me consider the weak Young inequality
$$\tag{2}
\lVert f\ast g \rVert_p \le C\lVert f\rVert_{q_1} \lVert g\rVert_{q_2,\infty}, \quad \frac1p+1 = \frac1{q_1}+\frac1{q_2},\quad 1<p,q_1,q_2<\infty,$$
where
$$\tag{3}\lVert g\rVert_{q_2,\infty} = \sup_{t>0} t \lvert \{ \lvert g\rvert >t\}\rvert^\frac1{q_2}.$$
We want to prove (n-HLS) under the assumption (1), which, by (2) with $g=\langle\cdot\rangle^{-\alpha}$, $q_1=q$ and $q_2=\frac d\alpha$ (so that the exponent relation in (2) is exactly (1)), follows from
$$\tag{4}
\lVert \langle \cdot \rangle^{-\alpha}\rVert_{\frac d \alpha, \infty}<\infty.$$
Now, there is clearly no simpler way to prove (4) than the pointwise estimate $\langle x \rangle^{-\alpha}\le \lvert x \rvert^{-\alpha}$ (valid because $\langle x\rangle\geqslant\lvert x\rvert$ and $\alpha>0$), which, since the quantity (3) is monotone in $\lvert g\rvert$, gives $$\lVert \langle\cdot \rangle^{-\alpha}\rVert_{\frac d \alpha, \infty} \le \lVert \lvert\cdot \rvert^{-\alpha}\rVert_{\frac d \alpha, \infty}<\infty,$$
which completes the proof. Thus the main step is the proof that $\lVert \lvert\cdot \rvert^{-\alpha}\rVert_{\frac d \alpha, \infty}<\infty$, which yields (HLS) and (n-HLS) essentially at the same time.
No proof can rely on anything simpler than this.
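That main step is itself a one-line computation. A sketch, for $\alpha>0$ and writing $\omega_d$ for the volume of the unit ball in $\mathbb R^d$ (notation introduced here): for every $t>0$,
$$\lvert\{\lvert x\rvert^{-\alpha}>t\}\rvert=\lvert\{\lvert x\rvert<t^{-1/\alpha}\}\rvert=\omega_d\,t^{-d/\alpha},$$
so by (3)
$$\lVert \lvert\cdot \rvert^{-\alpha}\rVert_{\frac d \alpha, \infty}=\sup_{t>0}t\,\bigl(\omega_d\,t^{-d/\alpha}\bigr)^{\frac\alpha d}=\omega_d^{\alpha/d}<\infty.$$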
Let me remark that the mathematics behind all this is essentially the Marcinkiewicz interpolation theorem. I have consulted the blog post of Terry Tao https://terrytao.wordpress.com/2009/03/30/245c-notes-1-interpolation-of-lp-spaces/; the weak Young inequality is Exercise 44.
In the proof I gave in the question above, the weak-$L^p$ computations are also present; they are just hidden behind the Hardy-Littlewood maximal estimate $\lVert Mf\rVert_p\le C \lVert f\rVert_p$, which is indeed proven via the Marcinkiewicz interpolation theorem.
I will try to extend my comment into an answer. Call this integral $I$. Using Fubini's (or Tonelli's) theorem to interchange the integrals, we get
$$I=2C_p\int_{\mathbb{R}^n}|f(x)|\int_0^{\infty}t^{p-2}\chi_{ \{|f|>\frac{t}{2}\} }(x)\,dt\,dx.$$
Now put $t^{p-1}=2^{p-1}u$, so that $(p-1)t^{p-2}\,dt=2^{p-1}\,du$ and, since $p>1$, $\{|f|>\frac t2\}=\{|f|^{p-1}>u\}$. We get
$$I=\frac{2^pC_p}{p-1}\int_{\mathbb{R}^n}|f(x)|\left(\int_0^{\infty}\chi_{ \{|f|^{p-1}>u \} }(x)\,du\right)dx.$$
Use the layer cake representation of a function, i.e.,
$$|f(x)|=\int_0^{\infty}\chi_{ \{|f|>t \} }(x)\,dt,$$
applied to $|f|^{p-1}$ in place of $|f|$. We obtain
$$I=\frac{2^pC_p}{p-1}\int_{\mathbb{R}^n}|f|^pdx.$$
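As a quick cross-check (a sketch, assuming $p>1$), the substitution can be bypassed: for each fixed $x$ the inner integral produced by Fubini's theorem is elementary,
$$\int_0^{\infty}t^{p-2}\chi_{ \{|f|>\frac{t}{2}\} }(x)\,dt=\int_0^{2|f(x)|}t^{p-2}\,dt=\frac{2^{p-1}|f(x)|^{p-1}}{p-1},$$
so that $I=2C_p\cdot\frac{2^{p-1}}{p-1}\int_{\mathbb{R}^n}|f|\,|f|^{p-1}\,dx=\frac{2^pC_p}{p-1}\int_{\mathbb{R}^n}|f|^p\,dx$, the same result.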