Solved – Basic structure of linear regression equation for s-shaped response curve

data transformation, econometrics, regression

Can anyone enlighten me with the equation structure for the following basic theoretical (although common) scenario:

Sales, $S$, are related to "Advertising", $A$, such that when $A$ is small $S$ grows exponentially, and when $A$ is large, $S$ grows more slowly until at some point any increase in $A$ produces zero increase in $S$. So, a typical S-shaped response to advertising.

The total number of sales made in any period in the market is $T$. So our average market share works out as $\sum S/\sum T$. When advertising is zero, our sales are static at some level $s$ (i.e. the effect of some supporting influence other than advertising).

I know this can be solved through linear regression by transforming the variables, but I'm struggling to get my head around the most basic version of this – essentially $f(S) =\gamma+ \beta g(A)+\epsilon $, (with $\gamma$ being some intercept (possibly $0$) and $\epsilon$ being residuals) but what do $f$ and $g$ look like, and therefore what does the equation look like that I need to solve to estimate $S$?

EDIT

To show how the final equation would look, I have the logit transform in mind, so I'm looking for how the logit transform is applied using the parameters in the question, then what the final equation would look like with the transformations in place.

In addition, I'm specifically looking for a form to solve via linear regression rather than anything non-linear.

Best Answer

Possible S-shaped transformations are the logit ($\log(x/(1-x))$) and the complementary log-log ($\log(-\log(1-x))$), to name a couple. See https://en.wikipedia.org/wiki/Sigmoid_function for more.
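As a minimal sketch of these two transformations, the snippet below defines each one together with its inverse, using the usual definition $\log(-\log(1-x))$ for the complementary log-log. The function names are illustrative, not from any particular library. The inverses are the S-shaped curves: they map any real number back into the interval between 0 and 1.

```python
import math

def logit(x):
    """Log-odds of x; only defined for 0 < x < 1."""
    return math.log(x / (1.0 - x))

def inv_logit(z):
    """Logistic function, the S-shaped inverse of the logit."""
    return 1.0 / (1.0 + math.exp(-z))

def cloglog(x):
    """Complementary log-log of x; only defined for 0 < x < 1."""
    return math.log(-math.log(1.0 - x))

def inv_cloglog(z):
    """Asymmetric S-shaped inverse of the complementary log-log."""
    return 1.0 - math.exp(-math.exp(z))

# Both transforms stretch values near 0 and 1 out towards -inf and +inf.
for x in (0.1, 0.5, 0.9):
    print(x, logit(x), cloglog(x))
```

Unlike the logit, the complementary log-log is not symmetric around 0.5, which may or may not suit a given advertising response.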

In your case, it is hard to say whether to transform the outcome ($S$) or the predictor ($A$) without seeing the data. If you start by transforming the predictor with the logit and then fit your model, the final regression formula would look like this:

$S=\gamma+\beta\log(A/(1-A))+\epsilon$ *

*Note that the logit and cloglog transformations are only defined for values strictly between 0 and 1, so they will have trouble with data outside that range. This is the case for most sigmoid-based transformations. To use them, you need to rescale the data first if its range falls outside $(0,1)$. As suggested by the OP:

$A$ (assumed non-negative) can be brought into the $(0,1)$ range by: $A'=(A+1)/(\max(A)+2)$

The full regression formula would then be:

$S=\gamma+\beta\log(A'/(1-A'))+\epsilon$

$=\gamma+\beta\log\left(\dfrac{(A+1)/(\max(A)+2)}{1-(A+1)/(\max(A)+2)}\right)+\epsilon$
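The steps above can be sketched in a few lines of Python: rescale $A$, apply the logit, then estimate $\gamma$ and $\beta$ by ordinary least squares. This is only an illustration under stated assumptions. The data are made up and noise-free, generated from the model itself with hypothetical values $\gamma=50$ and $\beta=8$, so the fit simply recovers them. The helper names are not from any library.

```python
import math

def rescale(a, a_max):
    """Map A >= 0 strictly inside (0, 1): A' = (A + 1) / (max(A) + 2)."""
    return (a + 1.0) / (a_max + 2.0)

def logit(x):
    """Log-odds of x; only defined for 0 < x < 1."""
    return math.log(x / (1.0 - x))

def fit_simple_ols(x, y):
    """Closed-form least squares for y = gamma + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    gamma = mean_y - beta * mean_x
    return gamma, beta

# Made-up advertising levels (assumption: not taken from the question).
advertising = [0.0, 10.0, 20.0, 40.0, 60.0, 80.0, 100.0]
a_max = max(advertising)

# Transformed predictor g(A) = logit(A').
g_of_a = [logit(rescale(a, a_max)) for a in advertising]

# Noise-free sales generated from the model with hypothetical parameters.
true_gamma, true_beta = 50.0, 8.0
sales = [true_gamma + true_beta * g for g in g_of_a]

gamma_hat, beta_hat = fit_simple_ols(g_of_a, sales)
print(gamma_hat, beta_hat)
```

The nested fraction simplifies, so the transformed predictor can equally be computed as $\log((A+1)/(\max(A)+1-A))$. Note that with the predictor transformed this way, the fitted $S$ changes fastest near the two ends of the $A$ range. The same mechanics apply if the rescaled outcome is transformed instead, in which case the fitted $S$ is sigmoid in $A$.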

A more straightforward approach to this non-linear association between advertising and sales would be to use a spline function. That way you depend less on happening to pick a transformation that fits reasonably well, and the rescaling step is not needed. Splines can be fitted in regression models in R using, for example, the rms package.
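To illustrate the spline idea without relying on R, here is a rough Python sketch using NumPy. It builds a plain cubic regression spline from a truncated power basis and fits it by linear least squares. This is a swapped-in technique for illustration and is not the implementation used by the rms package. The data, the knot positions and the function name `cubic_spline_basis` are all assumptions, and advertising is placed on a 0 to 1 scale purely for numerical convenience.

```python
import numpy as np

def cubic_spline_basis(x, knots):
    """Design matrix with columns 1, x, x^2, x^3 and (x - k)_+^3 per knot k."""
    x = np.asarray(x, dtype=float)
    columns = [np.ones_like(x), x, x**2, x**3]
    columns += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(columns)

# Made-up S-shaped data with noise (assumption: not from the question).
rng = np.random.default_rng(0)
advertising = np.linspace(0.0, 1.0, 60)
sales = 20.0 + 80.0 / (1.0 + np.exp(-6.0 * (advertising - 0.5)))
sales_noisy = sales + rng.normal(0.0, 1.0, size=sales.shape)

# Spline fit: still linear in its coefficients, so ordinary least squares applies.
knots = [0.25, 0.5, 0.75]  # hypothetical knot positions
X = cubic_spline_basis(advertising, knots)
coef, *_ = np.linalg.lstsq(X, sales_noisy, rcond=None)
spline_rss = float(np.sum((sales_noisy - X @ coef) ** 2))

# Straight-line fit for comparison.
X_lin = X[:, :2]
coef_lin, *_ = np.linalg.lstsq(X_lin, sales_noisy, rcond=None)
linear_rss = float(np.sum((sales_noisy - X_lin @ coef_lin) ** 2))

print(spline_rss, linear_rss)
```

Because the straight-line model is nested in the spline model, the spline's residual sum of squares on the fitted data cannot be larger. Whether the extra flexibility is worthwhile on real data would need checking, for example with held-out periods.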