Why insist on dyadic decomposition of cubes when you don't need it?
Observe that the maximal function satisfies $\mathcal{M}f = \mathcal{M} |f|$ by definition. Also, since $\phi$ is nonnegative, $|\phi_t\ast f| \leq \phi_t \ast |f|$. Hence we may assume without loss of generality that $f$ is nonnegative.
For $s > 0$ let $\lambda_\phi(s)$ be the set $\{x : \phi(x) \geq s\}$. Then $\phi(x) = \int_0^P \chi_{\lambda_{\phi}(s)}(x) \mathrm{d}s$, where $P = \sup \phi$. Note that by assumption the sets $\lambda_\phi(s)$ form a decreasing family of balls centred at the origin. Let $\lambda^*_\phi(s)$ be the concentric ball whose radius is 1 more than the radius of $\lambda_\phi(s)$. Now let $|y-x| < 1$. Then
$$ \int_{\lambda_\phi(s)} f(y-z) \mathrm{d}z \leq \int_{\lambda^*_\phi(s)} f(x - z) \mathrm{d}z $$
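since $f \geq 0$ and $x + \lambda^*_\phi(s) \supseteq y + \lambda_\phi(s)$. Indeed, $\lambda_\phi(s)$ is a ball centred at the origin, so $\int_{\lambda_\phi(s)} f(y-z) \mathrm{d}z = \int_{y+\lambda_\phi(s)} f(w) \mathrm{d}w$, and every point of $y + \lambda_\phi(s)$ lies within distance $r + |y-x| < r+1$ of $x$, where $r$ is the radius of $\lambda_\phi(s)$.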
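Let $P'$ be the smallest number such that $\lambda_\phi(P')$ has radius at most 1. By assumption ($\phi$ is bounded and integrable), $\lambda_\phi(P')$ has positive radius $R'$. Splitting the $s$-integral at $P'$, and using the inequality above for $s \leq P'$ together with $\lambda_\phi(s) \subseteq \lambda_\phi(P')$ for $s \geq P'$, we get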
$$\begin{align} \phi\ast f(y) &= \int_0^{P'} \int_{\lambda_\phi(s)} f(y-z) \mathrm{d}z \mathrm{d}s + \int_{P'}^P \int_{\lambda_\phi(s)} f(y-z) \mathrm{d}z \mathrm{d}s \\
&\leq \int_0^{P'} \int_{\lambda^*_\phi(s)} f(x-z) \mathrm{d}z \mathrm{d}s + \int_{P'}^P\int_{\lambda^*_\phi(P')} f(x-z) \mathrm{d}z \mathrm{d}s \\
& \leq \int_0^{P'} |\lambda_\phi^*(s)| \mathcal{M}f(x) \mathrm{d}s + |P' - P||\lambda_\phi^*(P')|\mathcal{M}f(x)
\end{align}$$
Now, for $s \leq P'$ the radius $r$ of $\lambda_\phi(s)$ is at least $R'$, so $r + 1 \leq cr$ with $c = 1 + 1/R'$, and hence $|\lambda_\phi^*(s)| \leq c^n |\lambda_\phi(s)|$. Using this we get
$$ \leq \mathcal{M}f(x)\left( c^n \int_0^P |\lambda_\phi(s)| \mathrm{d}s + |P-P'| |\lambda_\phi^*(P')|\right)\tag{*}$$
In the first term inside the parentheses, $\int_0^P |\lambda_\phi(s)| \mathrm{d}s = \int \phi(z) \mathrm{d}z$ by the layer-cake (distribution function) representation, i.e. Fubini applied to $\phi = \int_0^P \chi_{\lambda_\phi(s)} \mathrm{d}s$. The second term is clearly a finite constant depending only on $\phi$.
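As a sanity check of the layer-cake step and of the constant in (*), here is a small numerical sketch. It is not part of the proof: it assumes the concrete choice $n = 1$ and $\phi(x) = e^{-x^2}$ (so $P = 1$, $P' = 1/e$, $R' = 1$, $c = 2$), and the helper names `radius` and `midpoint` are made up for this illustration.

```python
import math

# Assumed example (not from the answer): n = 1 and phi(x) = exp(-x^2),
# radial, decreasing in |x|, bounded and integrable, with P = sup phi = 1.
# Its superlevel set lambda_phi(s) = {phi >= s} is the interval [-r(s), r(s)].
P = 1.0

def radius(s):
    """Radius of lambda_phi(s) = {exp(-x^2) >= s}, for 0 < s < 1."""
    return math.sqrt(-math.log(s))

def midpoint(g, a, b, n=200_000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# Layer-cake check: int phi dx should equal int_0^P |lambda_phi(s)| ds.
lhs = midpoint(lambda x: math.exp(-x * x), -10.0, 10.0)
rhs = midpoint(lambda s: 2 * radius(s), 0.0, P)

# P' = smallest s with radius(s) <= 1, which is s = 1/e here, so R' = 1.
P_prime = math.exp(-1)
R_prime = radius(P_prime)
c = 1 + 1 / R_prime

# For s <= P': |lambda^*(s)| = 2(r + 1) <= c * 2r = c |lambda(s)|  (n = 1).
ok = all(2 * (radius(s) + 1) <= c * 2 * radius(s) + 1e-12
         for s in [P_prime * k / 100 for k in range(1, 101)])

# The constant inside the parentheses in (*), specialised to n = 1.
C = c * rhs + abs(P - P_prime) * 2 * (R_prime + 1)
print(lhs, rhs, ok, C)
```

Both integrals should come out close to $\sqrt{\pi}$, and `C` is then the explicit constant $2\sqrt{\pi} + 4(1 - 1/e)$ for this particular $\phi$.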
Now, if we replace $\phi$ by $\phi_t(x) = t^{-n}\phi(x/t)$ in the above argument, then $P \mapsto t^{-n} P$. We now consider $|y-x| < t$ and let $\lambda^*_{\phi_t}(s)$ have radius $t$ more than its unstarred counterpart. We also let $P'$ be the smallest number such that the corresponding ball has radius at most $t$: by the scaling property of $\phi_t$ we see that $P' \mapsto t^{-n} P'$ and $\lambda_{\phi_t}(t^{-n}P') = t \lambda_\phi(P')$, so its radius is $tR'$ and $c = 1 + t/(tR')$ is unchanged. So the above analysis shows that the constant derived above (the term inside the parentheses in (*)) does not depend on the scaling $t$. Hence we get the desired inequality.
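Spelled out, the scaling step goes as follows. This is a sketch assuming the usual normalisation $\phi_t(x) = t^{-n}\phi(x/t)$, writing $P_t$, $P'_t$ for the scaled versions of $P$, $P'$ (notation introduced here), and using $|tA| = t^n|A|$:
$$\begin{align} \lambda_{\phi_t}(s) &= \{x : t^{-n}\phi(x/t) \geq s\} = t\,\lambda_\phi(t^n s), \qquad P_t = t^{-n}P, \quad P'_t = t^{-n}P', \\
\int_0^{P_t} |\lambda_{\phi_t}(s)| \mathrm{d}s &= \int_0^{t^{-n}P} t^n |\lambda_\phi(t^n s)| \mathrm{d}s = \int_0^{P} |\lambda_\phi(u)| \mathrm{d}u, \\
|P_t - P'_t|\,|\lambda^*_{\phi_t}(P'_t)| &= t^{-n}|P - P'| \cdot t^n |\lambda^*_\phi(P')| = |P - P'|\,|\lambda^*_\phi(P')|,
\end{align}$$
where the last line uses that $\lambda^*_{\phi_t}(P'_t)$ has radius $tR' + t = t(R'+1)$. Together with $c = 1 + t/(tR') = 1 + 1/R'$, every term in (*) is the same for $\phi_t$ as for $\phi$.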
Not only can it be taken to be countable, it must be countable!
Since $\mathbb{R}^d$ is separable (it has a countable dense subset, namely $\mathbb{Q}^d$), any family of pairwise disjoint nonempty open sets is countable: each open set in the family contains a point of $\mathbb{Q}^d$, and no two distinct sets in the family can contain the same point, so choosing one such point for each set gives an injection of the family into the countable set $\mathbb{Q}^d$.
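For intervals on the real line the injection can be made completely explicit. The sketch below is only an illustration for $d = 1$ (the answer covers general $d$), and the function name `rational_in` is hypothetical. It picks, in each open interval $(a, b)$, the rational with the smallest denominator; since the intervals are pairwise disjoint, distinct intervals get distinct rationals.

```python
from fractions import Fraction
import math

def rational_in(a: Fraction, b: Fraction) -> Fraction:
    """Return p/q with a < p/q < b and q as small as possible (requires a < b)."""
    assert a < b
    q = 1
    while True:
        # For this q, the only candidate is the smallest integer p with p/q > a.
        p = math.floor(a * q) + 1
        if Fraction(p, q) < b:
            return Fraction(p, q)
        q += 1  # terminates once q * (b - a) > 1

# Pairwise disjoint open intervals (chosen for illustration).
intervals = [(Fraction(0), Fraction(1, 3)),
             (Fraction(1, 3), Fraction(1, 2)),
             (Fraction(1, 2), Fraction(2))]
points = [rational_in(a, b) for a, b in intervals]

# Disjointness forces the chosen rationals to be distinct: an injection into Q.
assert len(set(points)) == len(points)
print(points)
```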
Well, I am almost 2 years late, but I figured I would share my answer. I didn't quite follow the selection procedure in @ydx's answer.
We need to consider a $(2+\epsilon)$-dilation because we need $2$ radii to reach a centre and $\epsilon$ more to cover the ball around that centre. Fix $0<\epsilon<1$.
Around each $x\in K$ there is a ball of radius $r_x$. Cover $K$ by the balls $B(x, \epsilon r_x)$, $x \in K$, and by compactness obtain a finite subcover $B(x_1, \epsilon r_{x_1}), \dots, B(x_n, \epsilon r_{x_n})$.
Now, from the collection $B(x_i, r_{x_i})$, $1\leq i\leq n$, obtain a disjoint subcollection by choosing, at each step, the ball of largest radius among those disjoint from the balls already chosen (similar to the construction in the finite Vitali covering lemma). After relabelling, let this subcollection be $B(x_1, r_{x_1}), \dots, B(x_m, r_{x_m})$ with $r_{x_1}\geq r_{x_2}\geq\dots\geq r_{x_m}$ (by construction).
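Now, for $j>m$, the ball $B(x_j, r_{x_j})$ intersects some ball in the subcollection; let $B(x_k, r_{x_k})$ be the first selected ball it intersects. When $B(x_k, r_{x_k})$ was selected, $B(x_j, r_{x_j})$ was still available, so $r_{x_k}\geq r_{x_j}$ by construction. Therefore $|x_j - x_k| < r_{x_j} + r_{x_k} \leq 2r_{x_k}$, so $B(x_k, 2r_{x_k})$ contains the centre $x_j$, and going $\epsilon r_{x_k} \geq \epsilon r_{x_j}$ further covers $B(x_j, \epsilon r_{x_j})$, i.e., $$B(x_k, (2+\epsilon)r_{x_k})\supseteq B(x_j, \epsilon r_{x_j}).$$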
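For $i \leq m$ we trivially have $B(x_i, \epsilon r_{x_i}) \subseteq B(x_i, (2+\epsilon) r_{x_i})$. Hence the collection $B(x_i, (2+\epsilon)r_{x_i})$, $1 \leq i \leq m$, covers $\bigcup_{i=1}^n B(x_i, \epsilon r_{x_i})$, and therefore covers $K$.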
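The selection procedure is easy to simulate. Below is a minimal sketch, assuming the finite subcover has already been extracted and each ball is stored as a `(centre, radius)` pair in $\mathbb{R}^2$; the names `greedy_disjoint` and `dilated_cover_holds` are hypothetical. Processing the balls in order of decreasing radius and keeping each one that is disjoint from those already kept is the same as picking the largest available ball at every step. The check then confirms that every $B(x_j, \epsilon r_{x_j})$ lies in some dilated selected ball $B(x_k, (2+\epsilon) r_{x_k})$.

```python
import math
import random

def greedy_disjoint(balls):
    """Keep, in order of decreasing radius, each ball disjoint from those already kept."""
    chosen = []
    for centre, r in sorted(balls, key=lambda b: b[1], reverse=True):
        # Open balls B(a, r) and B(b, s) are disjoint iff |a - b| >= r + s.
        if all(math.dist(centre, c) >= r + s for c, s in chosen):
            chosen.append((centre, r))
    return chosen

def dilated_cover_holds(balls, chosen, eps, tol=1e-9):
    """Check each B(x, eps*r) lies in some B(c, (2+eps)*s) with (c, s) chosen.

    Uses B(a, rho) subset of B(b, sigma) iff |a - b| + rho <= sigma;
    tol absorbs floating-point rounding.
    """
    return all(
        any(math.dist(x, c) + eps * r <= (2 + eps) * s + tol for c, s in chosen)
        for x, r in balls
    )

# Random test data (an assumption for illustration): 200 balls in the plane.
random.seed(0)
balls = [((random.uniform(0, 10), random.uniform(0, 10)), random.uniform(0.1, 2.0))
         for _ in range(200)]
eps = 0.5
chosen = greedy_disjoint(balls)
print(len(chosen), dilated_cover_holds(balls, chosen, eps))
```

The check passes for any $\epsilon > 0$, matching the argument above: a discarded ball meets a kept ball of at least its radius, and a kept ball is trivially inside its own dilation.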