Solved – How is the intercept calculated in a regression model with two independent factor variables

r, regression, time series

I am struggling with a linear regression model of the form $y = a + b_1\,\text{month} + b_2\,\text{year}$, where month and year are both factors. I have 10 years of data with 12 months each, and my dependent variable is a log-transformed ratio. I understand that when such a model is set up in R, one level of each factor is absorbed into the intercept. Here March and 2005 go into the intercept, providing a baseline against which the other levels are compared.

My problem is that I cannot figure out what the intercept represents. Is it simply the average of the ratio for March and 2005, or something else?

Residuals:
     Min        1Q    Median        3Q       Max
-0.90339  -0.16789  -0.00373   0.15472   0.88338

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept) -3.586154    0.131642  -27.242    <2e-16 ***
MONTHJan     0.381735    0.140731    2.713  0.007875 **
MONTHFeb     0.256457    0.140731    1.822  0.071426 .
MONTHApr     0.072824    0.140731    0.517  0.605981
MONTHMay     0.207984    0.140731    1.478  0.142613
MONTHJun    -0.008194    0.140731   -0.058  0.953686
MONTHJul     0.363693    0.140731    2.584  0.011217 *
MONTHAug     0.195791    0.140731    1.391   0.16727
MONTHSep     0.212562    0.140731     1.51  0.134124
MONTHOct     0.124234    0.140731    0.883  0.379495
MONTHNov     0.204009    0.140731     1.45   0.15032
MONTHDec     0.175348    0.140731    1.246  0.215711
YEAR1999     0.477663    0.128469    3.718  0.000333 ***
YEAR2000    -0.027343    0.128469   -0.213   0.83189
YEAR2001    -0.166637    0.128469   -1.297  0.197612
YEAR2002    -0.060508    0.128469   -0.471  0.638684
YEAR2003    -0.173492    0.128469    -1.35  0.179948
YEAR2004     0.003592    0.128469    0.028  0.977753
YEAR2006    -0.283261    0.128469   -2.205  0.029776 *
YEAR2007    -0.267752    0.128469   -2.084   0.03972 *
YEAR2008    -0.240654    0.128469   -1.873  0.063985 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3147 on 99 degrees of freedom
Multiple R-squared: 0.4167, Adjusted R-squared: 0.2988
F-statistic: 3.536 on 20 and 99 DF, p-value: 1.491e-05

Best Answer

Peter's answer is correct. One note worth mentioning is that all other coefficients should be interpreted as marginal effects relative to baseline, which is March 2005.

So, for example, if we were to compute a fitted value for May 2008:

$$ \widehat{y} = \widehat{\alpha} + \widehat{\beta}_1 \cdot \text{month} + \widehat{\beta}_2 \cdot \text{year} $$

We know that "month = May" and "year = 2008", so we can plug in coefficients.

$$ \widehat{y}_{\text{May},\,2008} = -3.586154 + 0.207984 \cdot 1 + (-0.240654) \cdot 1 = -3.618824 $$

Note that we use "1" for the value of our independent variables because they are indicator variables: each is 1 for the matching month or year and 0 otherwise.
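The same plug-in arithmetic can be sketched in a few lines of Python (plain arithmetic, so nothing R-specific is needed). The coefficients are copied from the output in the question. The names `INTERCEPT`, `MONTH_EFFECT`, `YEAR_EFFECT`, and `fitted_log_ratio` are made up for this illustration, and the baseline levels are given an explicit effect of 0 as a convenience.

```python
# Coefficients copied from the regression output in the question.
INTERCEPT = -3.586154

# "Mar" is the baseline month: it is absorbed into the intercept,
# so its effect is 0 by construction (added here for convenience).
MONTH_EFFECT = {
    "Jan": 0.381735, "Feb": 0.256457, "Mar": 0.0, "Apr": 0.072824,
    "May": 0.207984, "Jun": -0.008194, "Jul": 0.363693, "Aug": 0.195791,
    "Sep": 0.212562, "Oct": 0.124234, "Nov": 0.204009, "Dec": 0.175348,
}

# 2005 is the baseline year, again with an effect of 0 by construction.
YEAR_EFFECT = {
    1999: 0.477663, 2000: -0.027343, 2001: -0.166637, 2002: -0.060508,
    2003: -0.173492, 2004: 0.003592, 2005: 0.0, 2006: -0.283261,
    2007: -0.267752, 2008: -0.240654,
}


def fitted_log_ratio(month, year):
    """Fitted value on the log scale: intercept + month effect + year effect.

    Raises KeyError for a month or year that is not in the model.
    """
    return INTERCEPT + MONTH_EFFECT[month] + YEAR_EFFECT[year]


print(round(fitted_log_ratio("May", 2008), 6))  # -3.618824
print(fitted_log_ratio("Mar", 2005))            # the intercept itself
```

For the baseline cell both effects are zero, so the function returns the intercept unchanged. That is what the intercept represents here: the model's fitted log ratio for March 2005.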

So, relative to the baseline (March 2005), the fitted log ratio for May 2008 is lower by 0.03267, which means the ratio itself is approximately 3.267% lower, because

$$ 0.207984 + (-0.240654) = -0.03267 $$

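When the model is specified with a log dependent variable and level independent variables, we can interpret the coefficients as follows: changing X by one unit changes Y by approximately $100 \cdot \beta_1$ percent. For an indicator variable, a one-unit change means switching from the baseline level to that level. The approximation works well for small coefficients; the exact change is $100 \cdot (e^{\beta_1} - 1)$ percent.

$$ \%\Delta Y \approx 100 \cdot \beta_1 $$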
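As a rough check on how far the "100 times the coefficient" rule can be trusted, the sketch below compares it with the exact percent change implied by a difference on the log scale. It assumes the dependent variable was transformed with the natural log, which the question does not state explicitly. The helper name `percent_change` is made up for this illustration.

```python
import math


def percent_change(log_diff):
    """Exact percent change in Y implied by a difference on the natural-log scale."""
    return 100.0 * (math.exp(log_diff) - 1.0)


# May 2008 versus the March 2005 baseline (coefficients from the output above).
small = 0.207984 + (-0.240654)
print(f"approx: {100 * small:.3f}%  exact: {percent_change(small):.3f}%")

# January versus March within the same year: a much larger coefficient.
large = 0.381735
print(f"approx: {100 * large:.1f}%  exact: {percent_change(large):.1f}%")
```

For the small May 2008 difference the two figures are close (about -3.27% versus about -3.21%). For the January coefficient they diverge noticeably (about 38% versus about 46%), so the exact formula is the safer choice for larger effects.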
