How to handle empty data in linear regression

missing data, multiple regression, spss

This question may seem like a duplicate, but I haven't found another one that quite fits my case.

I am trying to run a linear regression on several variables, including one dummy-coded dichotomous IV (Tool) that indicates whether a participant used a tool. For participants who did use the tool, another IV (Quality) records how well they fared at using it. For participants who didn't use the tool, Quality is empty because there was nothing to evaluate. I am not sure the concept of "missing" applies here, since the variable is simply not applicable for those participants.

SPSS is giving me a warning, and I realized it is linked to the empty values in Quality, because my regression works when I don't include that variable. But now I'm not sure how to conduct my analysis. Is there another way I should handle the empty values, or should I run the analysis another way altogether (e.g., by running more than one multiple regression or something along those lines)?

EDIT: The question linked in EdM's comment does help, but I am not sure how I can link my Tool and Quality variables in SPSS. The instructions there look like command-line syntax or are meant for other software, and I'm very new to this. I looked around but could not find much about "indicators" or 0/1 coding.

Best Answer

There's no need to "link" the variables, other than to provide values of 0 for Quality whenever Tool = 0 (where that means, as I understand the dummy variable in your question, that the tool was not used). You do, however, have to think carefully about what the regression coefficients mean with this variable coding.
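As a rough illustration of that recoding step, here is a minimal Python sketch (not SPSS syntax). The arrays `tool` and `quality` and their values are made up for the example, and NaN is assumed to stand in for the empty Quality cells.

```python
import numpy as np

# Hypothetical data: tool = 1 if the participant used the tool, else 0.
# quality is only observed when tool == 1; NaN marks "not applicable".
tool = np.array([0, 0, 1, 1, 1, 0])
quality = np.array([np.nan, np.nan, 3.0, 5.0, 4.0, np.nan])

# Set Quality to 0 wherever the tool was not used; keep observed values as-is.
quality_filled = np.where(tool == 0, 0.0, quality)

# Any NaN left over would be a genuinely missing score among tool users,
# which is a different problem from "not applicable".
n_still_missing = int(np.isnan(quality_filled).sum())
print(quality_filled, n_still_missing)
```

Only the rows with Tool = 0 are touched, so a Quality value that is truly missing for someone who did use the tool would still show up as missing afterwards.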

Your model will then have an intercept, a coefficient for Tool, and a coefficient for Quality. If the regression results are reported in SPSS the same way as under the default settings in R, then the intercept is the predicted outcome for cases where the tool was not used (and Quality was zero, which is always the case when the tool was not used if you proceeded as recommended). The sum of the intercept and the coefficient for Tool is the predicted outcome for cases where the tool was used but Quality = 0. The coefficient for Quality is how much the predicted outcome rises above that last value per unit increase in Quality; it only applies to cases where the tool was used, as all other cases have a Quality of 0.
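To make the interpretation of the three coefficients concrete, here is a small Python sketch under stated assumptions: the data are noise-free and generated from made-up coefficients (10, 2 and 1.5), and the fit uses ordinary least squares via `numpy.linalg.lstsq` rather than SPSS. It is only meant to show how the predictions are assembled from the coefficients.

```python
import numpy as np

# Hypothetical, noise-free data built from made-up coefficients:
# outcome = 10 + 2*tool + 1.5*quality
tool = np.array([0, 0, 0, 1, 1, 1, 1], dtype=float)
quality = np.array([0, 0, 0, 1, 2, 3, 4], dtype=float)  # already 0 where tool == 0
outcome = 10.0 + 2.0 * tool + 1.5 * quality

# Design matrix: intercept column, Tool, Quality.
X = np.column_stack([np.ones_like(tool), tool, quality])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
intercept, b_tool, b_quality = coef

# Predicted outcome when the tool was not used (Quality is 0 by construction).
pred_no_tool = intercept
# Predicted outcome when the tool was used but Quality = 0.
pred_tool_q0 = intercept + b_tool
# Predicted outcome when the tool was used with Quality = 3.
pred_tool_q3 = intercept + b_tool + 3 * b_quality
print(coef)
```

Because the example data contain no noise, the fitted coefficients match the made-up ones up to floating-point rounding; with real data they would be estimates, but the three predictions are built from them in the same way.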
