Solved – Why do coefficients and significance levels change so much in the OLS? Do I need MIXED model instead

least squaresmixed modelmulticollinearityregressionspss

Please consider the following OLS model:

enter image description here

As a rule of thumb, a VIF score of max. 5 is considered acceptable collinearity in most fields. So for the better part I believed this model to be quite nice and significant (R-squared .516).

However, by adding only 2 variables (at the bottom of the model), many values change significantly. For instance, GCI-505 goes from being significant at the 10% level (p=.008) to being insignificant (p=.677). INCID reacts similarly, whereas HOF_PDI and HOF_LTO have a reverse change:

enter image description here

My question is: what does this mean for my model? It doesn't seem very robust, although I though that it should have been with a significant F-score, and not too high VIF scores. What am I missing / lacking in knowledge?

[edit 1]

Some explaining of the variables: the model tries to explain the extent to which a company is behaving socially responsible. The dataset holds 4,000 companies from 35 countries.

First, each company has a CSR score (the DV), a marketvalue and bookvalue (MKT_AVG_LN and PTBV_AVG_LN), a number of treaties on responsible investment they have signed (SIGN), and a number of incidents that have been reported in the media (INCID).

Second, each company is headquartered in one of 35 countries. These countries all have certain properties, for instance from which legal family their legal system stems and the average level of education. These country-specific values have been included as a variable for each company. For instance, a company with LAW_FRA-dummy set to 1 is headquartered in a country with a french legal system. When a company has LAW_SR set to 5.45, this means that the shareholder rights index for the country the company in headquartered in, is 5.45. LAW_FRA, LAW_SCA and LAW_ENG (not included in the model) are dummies for legal family. The rest of the variables are continuous.

There are possibly 2 problems with my dataset

  • I have "elevated" country-variables to a company level;
  • the number of companies from each country is very unequally divided (for instance, 40% comes from the USA whereas only 1% of companies is headquartered in Singapore. This means that if USA has a LAW_SR score of 5.45, 40% of companies in the dataset have a LAW_SR score of 5.45 whereas only a few companies will have a LAW_SR value of 3.00 (the LAW_SR score for Singapore).

[edit 2]
In SPSS Mixed Models has the following dialogue:

enter image description here

I am confused about which variables are factor, covariate and residual weight. As I believe continuous company-variables, such as a companies' market value and number of signed treaties (MKT_AVG_LN and SIGN) are factors, and company variables that are lifted from a country-level (for instance, LAW_SR, which is a numerical value of the relative strength of shareholders under company law of a country) are covariates. Is this correct?

Also, I have a few dummy variables. For instance, each company operates under either French, German, English or Scandinavian law. Here, LAW_FRA, LAW_GER, LAW_ENG, LAW_SCA are dummy variables indicating for each company the origin of company law in the country where the company is headquartered. These dummy values are derived from the variable LAW_FAMILY, which for each company is set to either FRA, GER, ENG or SCA. In mixed models, should I discard the dummy variables and use LAW_FAMILY as a covariate also?

Best Answer

The fact that additional variables make large changes in your model doesn't mean there is collinearity nor does it mean it isn't "robust" (although I guess that depends on what you mean by "robust"). Nor does a reasonable VIF and a high F mean that additional variables won't have an effect. Only completely independent variables will have no effect on the other coefficients, and variables can be a long way from independent without having high VIF.

It means that controlling for those additional variables makes a large difference in the other relationships.

You haven't said what any of these variables are, so it's hard to be specific. However, let's imagine simpler models:

Model 1: log(income) as effect of race/ethnic group Model 2: log(income) as effect of race/ethnic group + age

The coefficients in both models will likely all be significant (if it's a reasonable sample size). But the coefficient of race/ethnicity will (I bet) drop in the 2nd model because the average age of different ethnic groups is different, and income tends to be related to age.

EDIT in response to edit in post:

Since you have country level and company level variables, you should not use OLS regression as the data are not independent. You need to account for this. One way is with mixed models (aka multilevel models). I don't know how SPSS does this, but in SAS it would be PROC MIXED and in R it would be nlme or lme4.