A "moderator" affects the regression coefficients of $Y$ against $X$: they might change as values of the moderator change. Thus, in full generality, the simple regression model of moderation is
$$\mathbb{E}(Y) = \alpha(M) + \beta(M)X$$
where $\alpha$ and $\beta$ are functions of the moderator $M$ rather than constants unaffected by values of $M$.
In the same spirit in which regression is founded on a linear approximation of the relationship between $X$ and $Y$, we may hope that both $\alpha$ and $\beta$ are--at least approximately--linear functions of $M$ throughout the range of values of $M$ in the data:
$$\begin{aligned}
\mathbb{E}(Y) &= \alpha_0 + \alpha_1 M + O(M^2) + \left(\beta_0 + \beta_1 M + O(M^2)\right)X \\
&= \alpha_0 + \beta_0 X + \alpha_1 M + \beta_1 MX + O(M^2) + O(M^2)X.
\end{aligned}$$
Dropping the nonlinear ("big-O") terms, in the hope they are too small to matter, gives the multiplicative (bilinear) interaction model
$$\mathbb{E}(Y) = \alpha_0 + \beta_0 X + \alpha_1 M + \beta_1 MX.\tag{1}$$
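To make this concrete, here is a minimal sketch of fitting model $(1)$ in Python with statsmodels. The simulated data, the coefficient values, and the variable names `y`, `x`, `m` are illustrative assumptions, not part of the original derivation.

```python
# A minimal sketch of fitting the bilinear interaction model (1);
# the simulated data and true coefficients are made up for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({"x": rng.normal(size=n), "m": rng.normal(size=n)})
# True coefficients: alpha0 = 1, beta0 = 2, alpha1 = 0.5, beta1 = -1.5
df["y"] = 1 + 2*df.x + 0.5*df.m - 1.5*df.m*df.x + rng.normal(size=n)

# In the formula language, "x * m" expands to x + m + x:m,
# i.e. exactly the four terms of model (1).
fit = smf.ols("y ~ x * m", data=df).fit()
print(fit.params)  # estimates of alpha0, beta0, alpha1, beta1, in that order
```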
This derivation suggests an interesting interpretation of the coefficients: $\alpha_1$ is the rate at which $M$ changes the intercept, while $\beta_1$ is the rate at which $M$ changes the slope. ($\alpha_0$ and $\beta_0$ are the intercept and slope when $M$ is (formally) set to zero.) $\beta_1$ is the coefficient of the "product term" $MX$. It answers the question in this way:
We model the moderation with a product term $MX$ when we expect the moderator $M$ will (approximately, on average) have a linear relationship with the slope of $Y$ vs $X$.
Of interest is that this derivation points the way towards a natural extension of the model, which might suggest ways to check goodness of fit. If you are not concerned with nonlinearity in $X$--you either know or assume that model $(1)$ is accurate--then you would want to extend the model to accommodate the terms that were dropped:
$$
\mathbb{E}(Y) = \alpha_0 + \beta_0 X + \alpha_1 M + \beta_1 MX + \alpha_2M^2 + \beta_2 M^2X.
$$
Testing the hypothesis $\alpha_2=\beta_2=0$ evaluates the goodness of fit. Estimating $\alpha_2$ and $\beta_2$ could indicate in what way model $(1)$ might need to be extended: to incorporate nonlinearity in $M$ (when $\alpha_2 \ne 0$) or a more complicated moderating relationship (when $\beta_2 \ne 0$) or possibly both. (Note that this test would not be suggested by a power series expansion of a generic function $f(X,M)$.)
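Continuing the sketch above, one way this check might look in statsmodels is to fit the extended model and compare it to model $(1)$ with a nested-model $F$-test; the data and the fitted `fit` object carry over from the earlier (assumed) setup.

```python
# Goodness-of-fit check for model (1): test alpha2 = beta2 = 0 in the
# extended model. I(m**2) is the formula-language spelling of M^2.
import statsmodels.api as sm

extended = smf.ols("y ~ x * m + I(m**2) + I(m**2):x", data=df).fit()

# F-test comparing the nested models: a small p-value suggests model (1)
# is missing nonlinearity in M (alpha2 != 0), a more complicated
# moderating relationship (beta2 != 0), or both.
print(sm.stats.anova_lm(fit, extended))
print(extended.params)  # inspect the I(m ** 2) and I(m ** 2):x estimates
```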
Finally, if you were to discover that the interaction coefficient $\beta_1$ were not significantly different from zero, but that the fit is nonlinear (as evidenced by a significant value of $\beta_2$), then you would conclude (a) there is moderation but (b) it is not modeled by an $MX$ term, but instead by some higher-order terms beginning with $M^2X$. This might be the kind of phenomenon to which Kenny was referring.
This can happen when the two predictors both contain a large nuisance factor with opposite signs: when you add them, the nuisance cancels out and you are left with something much closer to the third variable.
Let's illustrate with an even more extreme example. Suppose $X, Y \sim N(0,1)$ are independent standard normal random variables. Now let
$$A = X, \qquad B = -X + 0.00001\,Y.$$
Say that $Y$ happens to be your third variable, $A$ and $B$ are your two predictors, and $X$ is a latent variable you don't know anything about. The correlation of $A$ with $Y$ is $0$, and the correlation of $B$ with $Y$ is very small, close to $0.00001$.* But the correlation of $A+B$ with $Y$ is $1$.
*There is a teeny tiny correction for the standard deviation of $B$ being a bit more than $1$.
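For the skeptical reader, here is a quick numpy sketch checking the three correlations. Keep in mind that the sample correlations of $A$ and $B$ with $Y$ equal their population values only up to sampling noise of order $1/\sqrt{n}$.

```python
# Numerical check of the A, B, Y example; names follow the text.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x = rng.normal(size=n)
y = rng.normal(size=n)
a = x
b = -x + 0.00001 * y

corr = lambda u, v: np.corrcoef(u, v)[0, 1]
print(corr(a, y))      # population value 0; sample value is O(1/sqrt(n))
print(corr(b, y))      # population value ~1e-5, swamped by sampling noise
print(corr(a + b, y))  # 1.0 exactly, since a + b = 0.00001 * y
```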
I don't really understand the question. In a regression model $Y = f(x_1, \dotsc, x_p) + \epsilon$, all the variables on the right-hand side are needed to make predictions for $Y$; all of them are predictors. A moderator is a predictor that plays a specific role, that of modifying (interacting with) the effect of some other predictor. That does not make it any less a predictor itself.
But maybe I have misunderstood something?