The key thing is that you need to be working with canonically normalized fields in order to use the power counting arguments.
Let's expand GR around flat space:
\begin{equation}
g_{\mu\nu} = \eta_{\mu\nu} + \tilde{h}_{\mu\nu}
\end{equation}
The reason for the tilde will become clear in a second. So long as $\tilde{h}$ is "small" (or more precisely so long as the curvature $R\sim (\partial^2 \tilde{h})$ is "small"), we can view GR as an effective field theory of a massless spin two particle living on flat Minkowski space.
Then the Einstein Hilbert action takes the schematic form
\begin{equation}
S_{EH}=\frac{M_{pl}^2}{2}\int d^4x \sqrt{-g} R = \frac{M_{pl}^2}{2} \int d^4x \ \left[ (\partial \tilde{h})^2 + (\partial \tilde{h})^2\tilde{h}+\cdots \right]
\end{equation}
where $M_{pl}\sim 1/\sqrt{G}$ in units with $\hbar=c=1$, so $M_{pl}$ has units of mass. In this form you might think that the interaction $(\partial \tilde{h})^2 \tilde{h}$ comes with the scale $M_{pl}$ raised to a positive power, namely $M^2_{pl}$. However, this is too fast: all the QFT arguments you have seen assume that the kinetic term has a coefficient of $-1/2$, not $M_{pl}^2$. Relatedly, given that $M_{pl}$ has units of mass and the Lagrangian density has units of $(\text{mass})^4$ (the action itself is dimensionless once $\hbar=1$), the field $\tilde{h}$ is dimensionless, so it is clearly not normalized the same way as the standard fields used in QFT textbooks.
Now classically, the action is only defined up to an overall constant, so we are free to think of $M_{pl}^2$ as being an arbitrary constant. However, in QFT, the action appears in the path integral $Z=\int D\tilde{h}\, e^{iS[\tilde{h}]/\hbar}$ (note the notational distinction between $\tilde{h}$ and $\hbar$). Thus the overall constant of the action is not a free parameter in QFT; it is fixed and has physical meaning. Alternatively, remember that the Einstein Hilbert action will ultimately be coupled to matter. When we do that, the scale $M_{pl}$ sitting in front of $S_{EH}$ will not multiply the matter action, and so $M_{pl}$ sets the relative scale between the gravitational action and the matter action.
The punchline is that we can't simply ignore the overall scale $M_{pl}^2$: it has physical meaning (i.e., we can't absorb $M_{pl}$ into an overall coefficient multiplying the action). On the other hand, we want to put the action into a "standard" form where the overall scale isn't there, so we can apply the normal intuition about power counting. The solution is to work with a "canonically normalized field" $h$, related to $\tilde{h}$ by
\begin{equation}
\tilde{h}_{\mu\nu} = \frac{h_{\mu\nu}}{M_{pl}}
\end{equation}
Then the Einstein Hilbert action takes the form
\begin{equation}
S_{EH} = \int d^4 x \ \left[ (\partial h)^2 + \frac{1}{M_{pl}} (\partial h)^2 h + \cdots \right]
\end{equation}
In this form it is clear that the interactions of the form $(\partial h)^2 h$ have a "coupling constant" $1/M_{pl}$ with dimensions 1/mass, which is non-renormalizable by power counting in the usual way.
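As a schematic check of this counting (a sketch that suppresses index structure and numerical coefficients, as in the expressions above), note that every term in the expansion of $\sqrt{-g}R$ carries two derivatives. The general term with $n$ extra powers of the field then rescales as
\begin{equation}
\frac{M_{pl}^2}{2} (\partial \tilde{h})^2 \tilde{h}^n = \frac{M_{pl}^2}{2} \frac{(\partial h)^2 h^n}{M_{pl}^{n+2}} = \frac{1}{2 M_{pl}^n} (\partial h)^2 h^n
\end{equation}
so each additional power of $h$ costs one more inverse power of $M_{pl}$. The $n=0$ case is the kinetic term, with no $M_{pl}$ left over, and the $n=1$ case is the cubic interaction with coupling $1/M_{pl}$.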
Indeed, when we say a theory is renormalizable, it is tacitly assumed that there is only one perturbative expansion of the theory that we know how to do. When this is not the case, renormalizability is really a property of the parameter we are perturbing in. To say that the $\frac{1}{N}$ expansion is renormalizable means that a finite number of counter-terms added to the action will be enough to absorb all divergences. The $\lambda$ expansion being non-renormalizable, on the other hand, means that as we go to arbitrarily high powers of $\lambda$, the number of counter-terms one must add will grow without bound.
Aside
The trick that makes it easy to see why the two approaches can be so different is called the Hubbard-Stratonovich transformation. This introduces an auxiliary field $\sigma$ in order to make the dependence on $N$ more explicit, namely through a $\frac{1}{\sqrt{N}} \sigma \phi_i \phi_i$ vertex. Once you use this to build Feynman diagrams, it is no longer necessarily true that loop diagrams are higher order than tree diagrams. A $\phi$ loop in the $\sigma$ propagator, for instance, will involve $\delta^i_j \delta^j_i = \delta^i_i = N$, which cancels the factor of $1/N$ from the two extra vertices and gives a diagram that is just as important as the one without the $\phi$ loop.
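The bookkeeping in this aside can be sketched in a few lines of Python. This is only a toy counter under stated assumptions: each $\sigma\phi_i\phi_i$ vertex contributes $N^{-1/2}$, each closed $\phi$ index loop contributes $N$, and everything else (momentum integrals, symmetry factors, the normalization of the $\sigma$ propagator) is ignored. The function name `large_n_power` is hypothetical, introduced just for this illustration.

```python
from fractions import Fraction


def large_n_power(num_vertices: int, num_phi_index_loops: int) -> Fraction:
    """Return p such that a diagram scales as N**p.

    Assumptions of this sketch: every sigma-phi-phi vertex carries
    a factor N**(-1/2), and every closed phi index loop carries a
    factor delta^i_i = N. Nothing else is tracked.
    """
    return Fraction(num_phi_index_loops) - Fraction(num_vertices, 2)


# Bare sigma propagator: no vertices, no index loops.
bare = large_n_power(num_vertices=0, num_phi_index_loops=0)

# Sigma propagator with one phi bubble inserted:
# two extra vertices and one closed index loop.
one_bubble = large_n_power(num_vertices=2, num_phi_index_loops=1)

# Both exponents are equal, so the bubble is not suppressed
# relative to the bare propagator.
print(bare, one_bubble)
```

Two vertices without a compensating index loop would instead give `large_n_power(2, 0) == -1`, which is the usual $1/N$ suppression.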
Best Answer
Let $D$ be the spacetime dimension. The action is dimensionless for any $D$ (since the action has units of $\hbar$, and we set $\hbar=1$). Since $S = \int {\rm d}^D x \, \mathcal{L}$, and the volume element ${\rm d}^D x$ has mass dimension $-D$, this means the Lagrangian $\mathcal{L}$ has mass dimension $D$.
Assuming we have a weakly coupled scalar field theory, the scaling dimension of the field $\phi$ will be determined by the kinetic term, $\mathcal{L} \sim (\partial \phi)^2$. (If you like, in the free theory only the kinetic term and maybe a mass term are present, so these determine the scaling of the field, and perturbative quantum corrections will only lead to small changes to the free-theory mass dimension.) Since derivatives have mass dimension $1$ in any dimension, and the Lagrangian has mass dimension $D$, the field must have mass dimension $(D-2)/2$. You can check that in $D=4$ this gives dimension $1$, which is the familiar result.
Then we can consider a general operator (term in the Lagrangian) of the form
\begin{equation}
\mathcal{L} \sim \lambda \partial^{n_d} \phi^{n_\phi}
\end{equation}
where $\lambda$ is a (possibly dimensionful) coupling constant; $n_d$ is the number of derivatives; and $n_\phi$ is the number of powers of $\phi$. Then the dimension of $\lambda$ is
\begin{equation}
\Delta_\lambda = D - n_d - n_\phi \frac{D-2}{2} = D + \left(1-\frac{D}{2}\right)n_\phi - n_d
\end{equation}
For $D=4, n_\phi=4, n_d=0$, this yields $\Delta_\lambda=0$, as you expect.
For an arbitrary $D$, with $n_\phi=4, n_d=0$, we have
\begin{equation}
\Delta_\lambda = 4 - D
\end{equation}
which is negative for all $D>4$; in other words, $\phi^4$ theory is power-counting non-renormalizable for all $D>4$.
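The formula for $\Delta_\lambda$ is easy to tabulate. Below is a minimal Python sketch, assuming a scalar field with the free-field dimension $(D-2)/2$ and ignoring quantum corrections. The function names `field_dimension`, `coupling_dimension`, and `classify` are hypothetical helpers introduced here, and the labels returned by `classify` are the standard power-counting terminology rather than anything specific to this answer.

```python
from fractions import Fraction


def field_dimension(D: int) -> Fraction:
    """Free-field mass dimension of a scalar in D dimensions: (D - 2) / 2."""
    return Fraction(D - 2, 2)


def coupling_dimension(D: int, n_phi: int, n_d: int = 0) -> Fraction:
    """Mass dimension of lambda in L ~ lambda * d^{n_d} phi^{n_phi}.

    Implements Delta = D - n_d - n_phi * (D - 2) / 2.
    """
    return Fraction(D) - n_d - n_phi * field_dimension(D)


def classify(delta: Fraction) -> str:
    """Power-counting label for a coupling of mass dimension delta."""
    if delta > 0:
        return "super-renormalizable"
    if delta == 0:
        return "renormalizable (marginal)"
    return "non-renormalizable"


# phi^4 with no derivatives, in several spacetime dimensions.
for D in (3, 4, 5, 6):
    delta = coupling_dimension(D, n_phi=4)
    print(f"D={D}: Delta = {delta}, {classify(delta)}")
```

Exact fractions are used because the field dimension is half-integer in odd $D$. For instance, `coupling_dimension(3, 6)` vanishes, so $\phi^6$ is marginal in $D=3$.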
Since we set up the formalism, we might as well look at a general interaction. Let's set $n_d=0$. Then the expression is
\begin{equation}
\Delta_\lambda = D + \left(1-\frac{D}{2}\right)n_\phi
\end{equation}
Then...
Since $n_d$ contributes negatively to $\Delta_\lambda$, any interactions with derivatives can only possibly be renormalizable if the derivative-less version is.
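As an illustration that ties this formula back to gravity, take the cubic graviton vertex $(\partial h)^2 h$ in $D=4$. This is a sketch that assumes the canonically normalized graviton $h$ counts like a scalar, since both have two-derivative kinetic terms. With $n_\phi=3$ and $n_d=2$,
\begin{equation}
\Delta_\lambda = 4 - 2 - 3\cdot\frac{4-2}{2} = -1
\end{equation}
which matches the coupling $1/M_{pl}$, of dimension 1/mass, obtained by canonically normalizing the Einstein Hilbert action.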