Solved – Interpreting significance of the intercept in a regression analysis

intercept, interpretation, regression

For my thesis, I'm fitting several linear regression models. In total I have 15 dependent variables, so my appendix has 15 regression tables, each including 4 models. Example:

figure 1

I'm trying to figure out what I should report in the text. For now I have chosen to discuss all the models that include significant values (except for the constant/intercept, because it is almost always significant).

  1. What does it mean when the constant/intercept value is significant? And if none of the other values are significant, what does the model say/predict?

    Example:

    figure 2

  2. What does it mean when the model and some regressors are significant, but the constant/intercept is not? (How) should I report this in my results section?

    Example:

    figure 3

Best Answer

Quoting from the answer on the page suggested by mdewey in a comment:

The intercept is the estimated value of the response variable for the first modalities of each factor under the assumption of additivity.

So how does that apply to your data? It depends on what your software deems to be the "first modality," or reference value, of each of your predictors.

When there is a categorical predictor, like gender, some programs choose the first listed category as the reference, others choose the last. You need to know how your statistics program makes the choice, or specify directly which category to use as the reference.
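To make the effect of the reference category concrete, here is a minimal sketch in plain Python. The speaking times are made up for illustration (they are not taken from your tables), and `fit_line` and `fit_with_reference` are hypothetical helper names. With a single dummy-coded predictor, the intercept is simply the mean of whichever group is coded 0, so switching the reference changes the intercept and flips the sign of the coefficient.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data for illustration only (not from the question's tables).
gender = ["male", "male", "male", "female", "female", "female"]
speaking_time = [300.0, 320.0, 340.0, 400.0, 420.0, 440.0]

def fit_with_reference(reference):
    # Dummy coding: 0 for the reference category, 1 for the other category.
    dummy = [0.0 if g == reference else 1.0 for g in gender]
    return fit_line(dummy, speaking_time)

intercept_m, coef_m = fit_with_reference("male")
intercept_f, coef_f = fit_with_reference("female")

# The intercept equals the mean of the reference group in each case.
print("reference = male:  ", intercept_m, coef_m)
print("reference = female:", intercept_f, coef_f)
```

Both fits describe the same data equally well; only the meaning of the intercept differs, which is why a significance test on it answers a different question in each coding.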

For a continuous predictor, like age, a value of 0 is typically the reference. That can lead to some statistically "significant" intercept values (that is, values significantly different from 0) that have limited practical importance. If the age range of your participants is from 25 to 50 years old, does it really make sense to extrapolate your results all the way down to the age 0 of a newborn? That, nevertheless, is what the calculation of the intercept will do unless you take additional precautions.

One way around that problem is to center age: use each participant's difference from the mean age, rather than the absolute age, as the independent variable in your model. That makes the mean age of the participants the reference value for age, which probably makes a lot more sense. The coefficient for age will not change, but the intercept will be easier to interpret.
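A small sketch of that centering step, again in plain Python with made-up ages and speaking times (assumptions for illustration, not your data; `fit_line` is a hypothetical helper). It fits the same one-predictor model twice, once on raw age and once on age minus the mean age.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data: participants aged 25 to 50, as in the example range above.
ages = [25.0, 30.0, 35.0, 40.0, 45.0, 50.0]
times = [330.0, 345.0, 350.0, 370.0, 375.0, 390.0]

mean_age = mean(ages)
centered_ages = [a - mean_age for a in ages]

intercept_raw, slope_raw = fit_line(ages, times)
intercept_centered, slope_centered = fit_line(centered_ages, times)

# Same slope in both fits; only the intercept moves.
print("raw age:     ", intercept_raw, slope_raw)            # intercept = prediction at age 0
print("centered age:", intercept_centered, slope_centered)  # intercept = prediction at mean age
```

In the centered fit the intercept is the predicted value for a participant of average age, which lies inside the observed range instead of being an extrapolation to age 0.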

You can see this issue in your tables. I'll assume that your "first modalities," or reference values, are: gender, male; topic, not feminine; group composition, not female dominated; age, 0. Then the intercept for Model 4 for Total Speaking Time, 407 (minutes?), would be that predicted for a newborn male in a group not dominated by females speaking on a non-feminine topic. Does that type of prediction make any sense? That's why you have to think carefully about whether the significance of the intercept (whether it's different from 0) in any particular model really matters; that question is best answered based on your knowledge of the subject matter.

One additional warning: breaking the analyses down into 4 separate models for each dependent variable is not best practice. Your Model 4 seems to include all the predictors of interest, and appropriate statistical tests on Model 4 alone would address your underlying question about how age, gender, topic, and group composition affect these dependent variables.

Related to that, you are doing many more tests of statistical significance than you need, which aggravates the multiple-comparisons problem. Among your 4 models you seem to be examining 16 different individual coefficients (including intercepts) for each of 15 dependent variables, or 240 statistical tests on coefficients. If you accept p < 0.05 (= 1/20) as "significant," then even if there were no true relationships at all you would nevertheless expect about 12 (= 240/20) coefficients to come out as "significant" by chance alone. You should see if you can get some local statistical consultation to help address the multiple-comparisons problem and how to structure your models appropriately.
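The arithmetic behind that count can be written out as a short Python sketch. The counts come from the paragraph above; the last quantity additionally assumes the 240 tests are independent, which is my simplification for illustration and will not hold exactly for coefficients from nested models fitted to the same data.

```python
# Counts taken from the paragraph above.
coefficients_per_outcome = 16   # across the 4 models, intercepts included
n_outcomes = 15                 # dependent variables
alpha = 0.05                    # significance threshold, p < 0.05

n_tests = coefficients_per_outcome * n_outcomes

# Expected number of "significant" coefficients if no true relationships exist.
expected_false_positives = n_tests * alpha

# ASSUMPTION (not in the original answer): the tests are independent.
# Under that assumption, the chance of at least one false positive is:
p_at_least_one = 1 - (1 - alpha) ** n_tests

print("tests:", n_tests)
print("expected false positives:", expected_false_positives)
print("P(at least one false positive):", p_at_least_one)
```

Even as a rough approximation, this shows that finding a handful of significant coefficients among 240 tests is close to what pure noise would produce.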