Solved – Back Transformation

Suppose I have a response variable that was square-root transformed and an explanatory variable that was log transformed. I want to back-transform the model using the summary statistics below, such that Y ~ (X)^2. How would I interpret the relationship between X and Y using the estimated beta coefficient?
I thought it was interpreted as: "If there is a 1% increase in X, there is approximately a sqrt(2.1014)/100 unit increase in Y."

    Residuals:
        Min      1Q  Median      3Q     Max 
    -37.051 -12.096  -4.908   9.701  68.071 

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  -3.0147     2.0827  -1.448    0.148    
    Dose.Back     2.1014     0.1679  12.514   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 16.28 on 1154 degrees of freedom
    Multiple R-squared:  0.1195,    Adjusted R-squared:  0.1187 
    F-statistic: 156.6 on 1 and 1154 DF,  p-value: < 2.2e-16

Best Answer

Your model and its estimates posit that

$$\sqrt{Y} = 2.1014 D - 3.0147 + \varepsilon$$

where $D$ is Dose.Back (or its logarithm) and $\varepsilon$ is a random variable of zero expectation whose standard deviation is approximately $16.28.$ Squaring both sides gives

$$Y = (2.1014 D - 3.0147 + \varepsilon)^2.$$

Adding $0.01$ to $D$ yields the value

$$(2.1014 (D + 0.01) - 3.0147 + \varepsilon')^2.$$

The difference is

$$2(2.1014 D - 3.0147 + \varepsilon)(\varepsilon' - \varepsilon + (0.01)(2.1014)) + (\varepsilon' - \varepsilon + (0.01)(2.1014))^2.$$

This expression, like its expectation, is complicated. Let us therefore focus on the simpler question of how the expectation of $Y$ varies with $D$. Note that

$$\eqalign{ \mathbb{E}(Y) &= \mathbb{E}\left[\left(2.1014 D - 3.0147 + \varepsilon\right)^2\right] \\ &= (2.1014D - 3.0147)^2 + 2(2.1014D - 3.0147) \mathbb{E}(\varepsilon) + \mathbb{E}(\varepsilon^2) \\ &=(2.1014D - 3.0147)^2 + 0 + (16.28)^2. }$$

(This result is of considerable interest in its own right because it reveals the role played by the mean squared error in interpreting the relationship between $D$ and $Y$.)
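The formula for $\mathbb{E}(Y)$ can be checked numerically. The following is a minimal sketch rather than part of the original analysis: the function names are hypothetical, and the simulation assumes Normally distributed errors, which the derivation itself does not require (it needs only zero mean and standard deviation $16.28$).

```python
import random

# Estimates taken from the regression summary in the question.
B0, B1, SIGMA = -3.0147, 2.1014, 16.28

def expected_y(d):
    """Closed form: E(Y) = (B1*d + B0)^2 + SIGMA^2."""
    return (B1 * d + B0) ** 2 + SIGMA ** 2

def simulated_mean_y(d, n=200_000, seed=1):
    """Monte Carlo mean of Y = (B1*d + B0 + eps)^2.

    Assumption: eps ~ Normal(0, SIGMA). Any zero-mean error with this
    standard deviation would have the same expectation of Y.
    """
    rng = random.Random(seed)
    mu = B1 * d + B0
    return sum((mu + rng.gauss(0.0, SIGMA)) ** 2 for _ in range(n)) / n
```

For any chosen `d`, `simulated_mean_y(d)` should land close to `expected_y(d)`, and noticeably above the naive back-transformation `(B1*d + B0)**2`, which omits the variance term.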

When $0.01$ is added to $D$ the value of $\mathbb{E}(Y)$ increases by

$$2(2.1014)(2.1014D - 3.0147)(0.01) + (2.1014)^2(0.01)^2.$$

The last term $(2.1014)^2(0.01)^2 \approx 0.0004$ is so small compared to the squared errors (whose typical value is $(16.28)^2 \approx 265$) that we may neglect it. In this case, to a good approximation, this fitted model associates an (additive) increase in $D$ of $0.01$ with an increase in the expected value of $Y$ of

$$2(2.1014)(2.1014D - 3.0147)(0.01) = 0.0883176 D - 0.1267.$$
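To see how little is lost by dropping the quadratic term, one can compare the exact change in $\mathbb{E}(Y)$ with the linear approximation. This is a small illustrative sketch; the function names are hypothetical and the coefficients are the estimates from the summary table.

```python
# Estimates taken from the regression summary in the question.
B0, B1, SIGMA = -3.0147, 2.1014, 16.28

def expected_y(d):
    """E(Y) = (B1*d + B0)^2 + SIGMA^2 under the fitted model."""
    return (B1 * d + B0) ** 2 + SIGMA ** 2

def exact_increase(d, delta=0.01):
    """Exact change in E(Y) when delta is added to D."""
    return expected_y(d + delta) - expected_y(d)

def approx_increase(d, delta=0.01):
    """Linear approximation: 2 * B1 * (B1*d + B0) * delta."""
    return 2 * B1 * (B1 * d + B0) * delta
```

The gap `exact_increase(d) - approx_increase(d)` is the neglected quadratic term, which does not depend on `d`. Note also that the approximate increase changes sign where `B1*d + B0` is zero, so the effect of a small change in $D$ depends on where $D$ sits.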


When $D$ is the natural logarithm of some quantity $d$, a 1% multiplicative increase in $d$ causes a value of approximately $0.01$ to be added to $D$, because

$$\log(1.01 d) = \log(1.01) + \log(d) = \left(0.01 - (0.01)^2/2 + \cdots\right) + D \approx 0.01 + D.$$

If you used a logarithm to another base $b$, entailing $D = \log_b(d) = \log(d)/\log(b),$ then a 1% multiplicative increase in $d$ causes a value of approximately $(0.01)/\log(b)$ to be added to $D$, so everywhere "$0.01$" occurs in the preceding formulas you must use $(0.01/\log(b))$ instead.
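The size of the additive shift in $D$ for a given percentage change in $d$ can be computed directly. The helper below is a hypothetical illustration, not something from the original post; it returns the exact shift $\log_b(1 + p/100)$, which the text approximates by $(p/100)/\log(b)$.

```python
import math

def added_to_log(pct_increase, base=math.e):
    """Exact amount added to D = log_base(d) when d grows by pct_increase percent."""
    return math.log(1 + pct_increase / 100) / math.log(base)

def approx_added_to_log(pct_increase, base=math.e):
    """First-order approximation used in the text: (pct/100) / log(base)."""
    return (pct_increase / 100) / math.log(base)
```

For a 1% increase the two agree closely in any base; the approximation degrades as the percentage grows, so for large changes the exact shift is the safer value to substitute for $0.01$.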
