Solved – How to justify the choice of independent variables in multiple regression

factor analysislikertmultiple regressionvalidity

I am trying to measure the effect of atmospheric factors such as smell or light (IVs) on purchase behavior (DV). In total I have xx Likert scales, each consisting of 5 Likert items with responses coded from 1 to 5.

I am wondering which approach would be best to show that my IVs have some relevance. Could someone check whether my approach makes sense?

  1. Clean the data (delete monotone response patterns, handle missing values and outliers, check normality)
  2. Conduct a reliability test with Cronbach's alpha
  3. Construct validity (convergent and discriminant)
  4. Harman single factor test
  5. Factor analysis
  6. Check whether the assumptions of multiple regression are met (linearity, normality, etc.)
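As a side note on step 2: Cronbach's alpha is simple enough to compute by hand. A minimal sketch with NumPy, using invented toy scores for illustration:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) array of Likert scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# toy data: 4 respondents x 3 items, coded 1-5 (made up for illustration)
scores = np.array([
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
])
print(round(cronbach_alpha(scores), 3))  # → 0.946
```

In practice you would run this once per scale, on the items belonging to that scale only.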

Do you think this approach is sufficient to show that my model has some value?

Best Answer

Several points:

I wouldn't check normality on Likert items, since they are not normally distributed.

Factor analysis methods only consider the relationships among the independent variables you have (light, air) and have nothing to say about their relationship to the response (purchasing).
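You can see this directly from the API: a factor analysis fit never touches the response. A minimal scikit-learn sketch, with simulated item data:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
# 100 respondents x 6 Likert-style items, simulated from 2 latent factors
latent = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 6))
items = latent @ loadings + rng.normal(0, 0.5, size=(100, 6))

fa = FactorAnalysis(n_components=2).fit(items)  # only the IVs go in; no DV anywhere
print(fa.components_.shape)  # loadings matrix: (2 factors, 6 items)
```

So a "good" factor structure tells you the items hang together, not that they predict purchasing.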

For the sort of data that you seem to have, I would try a regression tree as a useful exploratory method. Regression methods look for relationships that are linear across the full range of values, whereas trees could model that, say:

  1. men purchase differently from women
  2. men are affected by light
  3. women are affected by freshness of air

I also recommend plotting the data. You could binarize the Likert items and make boxplots of the response against "high" vs. "low" ... or simply make boxplots against all five Likert categories.
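The binarized version might look like this with matplotlib (data simulated, and the 1-3 vs. 4-5 cut point is an arbitrary choice for illustration):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
light = rng.integers(1, 6, 100)               # one 1-5 Likert item, simulated
purchase = 5 + light + rng.normal(0, 2, 100)  # simulated response

# binarize: "low" = ratings 1-3, "high" = ratings 4-5
high = light >= 4
groups = [purchase[~high], purchase[high]]

fig, ax = plt.subplots()
ax.boxplot(groups)
ax.set_xticklabels(["low", "high"])
ax.set_xlabel("light rating (binarized)")
ax.set_ylabel("purchase")
fig.savefig("light_boxplot.png")
```

For the five-category version, pass a list of five arrays (one per Likert level) to `boxplot` instead.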