There seems to be one single intercept 49.80842, whereas it would make sense to have two different intercepts
No, it usually wouldn't make sense to have two intercepts; that only makes sense when you have a factor with two levels (and even then only if you regard the relationship holding factor levels constant).
The population intercept, strictly speaking, is $E(Y)$ for the population model when all the predictors are 0, and the estimate of it is whatever our fitted value is when all the predictors are zero.
In that sense - whether we have factor variables or numerical variables - there's only one intercept for the whole equation.
Unless, that is, you're considering different parts of the model as separate equations.
Imagine that we had one factor with three levels, and one continuous variable - for now without the interaction:
For the equation as a whole, there's one intercept, but if you think of it as a different relationship within each subgroup (level of the factor), there's three, one for each level of the factor ($a$) -- by considering a specific value of $a$, we get a specific straight line that is shifted by the effect of $a$, giving a different intercept for each group.
But now let's consider the relationship with $a$. Now for each level of $a$, if $x$ had no impact, there'd be a very simple relationship $E(Y|a=j)=\mu_j$. There's one intercept, the baseline mean (or if you conceive it that way, three, one for each subgroup -- where the intercept would be the average value in that subgroup).
(nb It may be hard to see here but the means are not equally spaced; don't be tempted by this plot to think of $y$ as linear in $a$ considered as a numeric variable.)
But now if we consider $x$ does have an impact and look at the relationship at a specific value of $x$ ($x=x_0$), as a function of $a$, $E(Y|a=j)=\mu_j(x_0)$ -- each group has a different mean, but those means are shifted by the effect of $x$ at $x_0$.
So that would be one intercept (the black dot if it's the baseline group) ... at each value of $x$.
For each of infinite number of different values that $x$ might take, there's a new intercept.
So depending on how we look at it, there's one intercept, or three, or an infinite number... but not two.
Now if we introduce an $x:a$ interaction, nothing changes but the slopes! We still can conceive of this as having one intercept, or perhaps three, or perhaps an infinite number.
So how does this all relate to two numeric variables?
Even though we didn't have it in this case, imagine that the levels of $a$ were numeric and that the fitted model was linear in $a$ (perhaps $a$ is discrete, like the number of phones owned collectively by a household). [i.e. we're now doing what I said earlier not to do, taking $a$ to be numeric and (conditionally) linearly related to $y$]
Then we'd still have one intercept in the strict sense, the value taken by the model when $x=a=0$ (even though neither variable is 0 in our sample), or one for each possible value taken by $a$ (in our sample, three different values occurred, but maybe 0, 4, 5 ... are also possible), or one for each value taken by $x$ (an infinity of possible values since $x$ is discrete). It doesn't matter if our model has an interaction, it doesn't change that consideration about how we count intercepts.
So how do we interpret the interaction term when both variables are numeric?
You can consider it as providing for a different slope in the relationship between $y$ and $x$, at each $a$ (three different slopes in all, one for the baseline and two more via interaction), or you can consider it as providing for a different slope between $y$ and (the now-numeric) $a$ at each value of $x$.
Now if we replace this now numeric but discrete $a$ with a continuous variate, you'd have an infinite number of slopes for both one-on-one relationships, one at each value of the third variable.
You effectively say as much in your question of course.
are we constrained to expressing this with absurd scenarios, such as if we had cars with 1hp we would have a modified slope for the weight equal to (−8.21662+0.02785)∗1∗weight? Or is there a more sensible way to look at this term?
Sure there is, consider values more like the mean. So for a typical relationship between mpg and wt, hold horsepower at some value near the mean. To see how much the slope changes, consider two values of horsepower, one below the mean and one above it.
Where the variable-values aren't especially meaningful in themselves (like score on some Likert-scale-based instrument say) you might go up or down by a standard deviation on the third variable, or pick the lower and upper quartile.
Where they are meaningful (like hp) you can pick two more or less typical values (100 and 200 seem like sensible choices for hp for the mtcars data, and if you also want to look at something near the mean, 150 will serve quite well, but you might choose a typical value for a particular kind of car for each choice instead)
So you could draw a fitted mpg-vs-wt line for a 100hp car and a 150hp car and a 200 hp car. You could also draw a mpg-vs-hp line for a car that weighs 2.0 (that's 2.0 thousand-pounds) and 4.0 or (or 2.5 & 3.5 if you want something nearer to quartiles).
Best Answer
There's really no need to use any of the "reduced forms"; they are just different ways of combining the coefficients and predictors. All are correct. The "reduced forms" might help make it clearer that the association of
Investment1
onROI
depends on the level ofInvestment2
(the author's "reduced form") and that the association ofInvestment2
onROI
also depends on the level ofInvestment1
(your "reduced form").The usual model matrix for regression works with the original forms, with a column of 1s for the intercept, a column for each predictor's values individually, and a column for a product of predictor values for each interaction in the model. After all, an interaction is just a product of individual predictors.
So I'd suggest not to worry about writing any "reduced form" unless it helps your understanding in some way.