I am struggling with a linear regression model of the form $y = a + b_1\text{month} + b_2\text{year}$. I have 12 months for each year and 10 years. My dependent variable is a log-transformed ratio. I understand that when such a model is set up in R, R automatically picks one level of each factor to go into the intercept. Here, March and 2005 go into the intercept, to provide a baseline for comparison with the other levels.
As stated, my problem is that I cannot figure out what the intercept represents. Is it simply the average of the ratio for March and 2005, or something else?
    Residuals:
         Min       1Q   Median       3Q      Max
    -0.90339 -0.16789 -0.00373  0.15472  0.88338

    Coefficients:
                 Estimate Std. Error t value  Pr(>|t|)
    (Intercept) -3.586154   0.131642 -27.242 <2.00E-16 ***
    MONTHJan     0.381735   0.140731   2.713  0.007875 **
    MONTHFeb     0.256457   0.140731   1.822  0.071426 .
    MONTHApr     0.072824   0.140731   0.517  0.605981
    MONTHMay     0.207984   0.140731   1.478  0.142613
    MONTHJun    -0.008194   0.140731  -0.058  0.953686
    MONTHJul     0.363693   0.140731   2.584  0.011217 *
    MONTHAug     0.195791   0.140731   1.391   0.16727
    MONTHSep     0.212562   0.140731    1.51  0.134124
    MONTHOct     0.124234   0.140731   0.883  0.379495
    MONTHNov     0.204009   0.140731    1.45   0.15032
    MONTHDec     0.175348   0.140731   1.246  0.215711
    YEAR1999     0.477663   0.128469   3.718  0.000333 ***
    YEAR2000    -0.027343   0.128469  -0.213   0.83189
    YEAR2001    -0.166637   0.128469  -1.297  0.197612
    YEAR2002    -0.060508   0.128469  -0.471  0.638684
    YEAR2003    -0.173492   0.128469   -1.35  0.179948
    YEAR2004     0.003592   0.128469   0.028  0.977753
    YEAR2006    -0.283261   0.128469  -2.205  0.029776 *
    YEAR2007    -0.267752   0.128469  -2.084   0.03972 *
    YEAR2008    -0.240654   0.128469  -1.873  0.063985 .
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.3147 on 99 degrees of freedom
    Multiple R-squared: 0.4167, Adjusted R-squared: 0.2988
    F-statistic: 3.536 on 20 and 99 DF, p-value: 1.491e-05
Best Answer
Peter's answer is correct. One note worth mentioning is that every other coefficient should be interpreted as a marginal effect relative to the baseline, which is March 2005. The intercept itself is the fitted value of the log ratio for that baseline month and year.
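To make the role of the baseline concrete, here is a minimal sketch on simulated data. The data frame `d`, the made-up response `y`, and the explicit `relevel()` calls are assumptions for illustration only; they are not the asker's data or code. The sketch checks that the intercept equals the model's fitted value for March 2005.

```r
set.seed(1)

# Hypothetical balanced panel: 12 months x 10 years, one observation per cell
d <- expand.grid(MONTH = month.abb,
                 YEAR  = as.character(1999:2008),
                 stringsAsFactors = FALSE)
d$y <- rnorm(nrow(d), mean = -3.5, sd = 0.3)  # made-up log ratios

# Make March and 2005 the reference levels (assumed to match the question)
d$MONTH <- relevel(factor(d$MONTH, levels = month.abb), ref = "Mar")
d$YEAR  <- relevel(factor(d$YEAR), ref = "2005")

fit <- lm(y ~ MONTH + YEAR, data = d)

# Fitted value for the baseline cell, March 2005
baseline <- data.frame(MONTH = factor("Mar",  levels = levels(d$MONTH)),
                       YEAR  = factor("2005", levels = levels(d$YEAR)))
pred_baseline <- unname(predict(fit, newdata = baseline))
intercept     <- unname(coef(fit)["(Intercept)"])

# The intercept is the fitted value for the reference levels
isTRUE(all.equal(intercept, pred_baseline))

# In this balanced additive design the fitted baseline value is
# (March mean) + (2005 mean) - (grand mean), not the raw March 2005 observation
additive_fit <- mean(d$y[d$MONTH == "Mar"]) +
  mean(d$y[d$YEAR == "2005"]) - mean(d$y)
isTRUE(all.equal(intercept, additive_fit))
```

Both comparisons should come out `TRUE` here. The second one relies on the simulated design being balanced; with unbalanced data the intercept is still the fitted baseline value, but that simple formula no longer applies.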
So, for example, suppose we want to compute a fitted value for May 2008:
$$ \widehat{y} = \widehat{\alpha} + \widehat{\beta}_1 \cdot \text{month} + \widehat{\beta}_2 \cdot \text{year} $$
We know that "month = May" and "year = 2008", so we can plug in coefficients.
$$ \widehat{y}_{\text{May},\,2008} = -3.586154 + 0.207984 \cdot 1 - 0.240654 \cdot 1 = -3.618824 $$
Note that we use "1" for the value of each independent variable, because they are indicator variables.
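The same arithmetic can be reproduced in R. This is only a sketch that copies three coefficients by hand from the summary output in the question; the variable names are made up for illustration.

```r
# Coefficients copied from the summary output in the question
intercept <- -3.586154   # baseline: March 2005
b_may     <-  0.207984   # MONTHMay
b_2008    <- -0.240654   # YEAR2008

# Indicator variables: 1 for the level that applies, 0 for all others
y_hat_may_2008 <- intercept + b_may * 1 + b_2008 * 1
y_hat_may_2008           # fitted log ratio for May 2008

# Back-transform to the ratio scale (the response was log transformed)
ratio_may_2008 <- exp(y_hat_may_2008)
ratio_may_2008
```

With the fitted model object at hand, `predict()` with a one-row `newdata` would give the same fitted log ratio without copying numbers manually.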
So, relative to the baseline (March 2005), the fitted log ratio for May 2008 is lower by 0.03267, which corresponds to a decrease in the ratio of approximately 3.267%, because
$$ 0.207984 - 0.240654 = -0.03267 $$
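When the model is specified with a log dependent variable and level independent variables, we can interpret the coefficients as follows: changing X by one unit changes Y by approximately $100 \cdot \beta_1$ percent. The approximation works well for small coefficients; the exact change is $100 \cdot (e^{\beta_1} - 1)$ percent.
$$ \%\Delta Y \approx 100 \cdot \beta_1 $$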
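As a quick check on how close this rule of thumb is, the sketch below compares it with the exact percentage change implied by a log response, $100 \cdot (e^{\beta} - 1)$, for the May 2008 versus March 2005 difference. The variable names are illustrative, and the coefficients are copied from the output in the question.

```r
# Difference in fitted log ratio: May 2008 versus the March 2005 baseline
delta <- 0.207984 + (-0.240654)    # MONTHMay + YEAR2008

approx_pct <- 100 * delta              # rule of thumb: 100 * beta
exact_pct  <- 100 * (exp(delta) - 1)   # exact change on the ratio scale

round(c(approximate = approx_pct, exact = exact_pct), 3)
```

For a difference this small the two figures agree to within about a tenth of a percentage point (roughly -3.27 versus -3.21). For larger coefficients, such as `YEAR1999` at 0.477663, the gap becomes substantial, so the exact form is the safer choice there.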