Regression – Addressing Violations of Linear Regression Assumptions

Tags: anova, assumptions, regression, scatterplot, spss

I have an independent variable (hours per day spent sitting) and a dependent variable (test scores). I think all of the assumptions of linear regression are violated, as shown in the pictures that I added.

So I ran a one-way ANOVA instead, grouping the independent variable into two categories (6-11 hours and 12-17 hours). I then used the Mann-Whitney test instead of the t-test, following the chart that I attached from this site: https://www.spss-tutorials.com/spss-independent-samples-t-test/
I added the results of the Mann-Whitney test.
I also added the questionnaire that I used for the two dependent variables.

I'm not sure this is the correct method, because many authors say that grouping a continuous independent variable is not good practice. Which method should I use?

I have 6 more independent variables, and for each one separately I used a one-way ANOVA, t-test, Kruskal-Wallis test, or Mann-Whitney test, depending on which assumptions were violated.

Are the methods I used correct? The sample size is 100.
Thank you

Best Answer

"I have 6 more independent variables and I used one-way ANOVA or t-test or Kruskal-Wallis test and Mann-Whitney test separately according to violation of the assumptions."

Running separate regressions or tests for each independent variable is not the best way to proceed. If you omit an independent variable that is associated with the outcome and correlated with variables in the model, the regression coefficients you get risk being biased. With 100 cases and 7 independent variables you have more than 14 cases per independent variable, so you should be able to fit them all together in a multiple regression without overfitting your data.
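To illustrate the bias from leaving out a correlated predictor, here is a minimal simulation sketch in Python. Everything in it is an assumption made up for illustration: the variable names `x1` and `x2`, the true slopes of 1.0 and 2.0, and the strength of the correlation. It is not based on your data. It fits the outcome once on `x1` alone and once on both predictors, then returns the estimated slope of `x1` from each fit.

```python
import numpy as np

def simulate_omitted_variable_bias(n=100, seed=0):
    """Return the estimated slope of x1 when x2 is omitted and when it is included."""
    rng = np.random.default_rng(seed)
    # Hypothetical predictors: x1 is correlated with x2 by construction.
    x2 = rng.normal(size=n)
    x1 = 0.7 * x2 + rng.normal(scale=0.7, size=n)
    # Hypothetical outcome: the true slope of x1 is 1.0, of x2 is 2.0.
    y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

    intercept = np.ones(n)
    # Single-predictor fit, y ~ x1, with x2 left out of the model.
    X_single = np.column_stack([intercept, x1])
    beta_single, *_ = np.linalg.lstsq(X_single, y, rcond=None)
    # Multiple regression, y ~ x1 + x2.
    X_full = np.column_stack([intercept, x1, x2])
    beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    return beta_single[1], beta_full[1]

slope_alone, slope_adjusted = simulate_omitted_variable_bias()
print(f"slope of x1, x2 omitted:  {slope_alone:.2f}")
print(f"slope of x1, x2 included: {slope_adjusted:.2f}")
```

Under these assumed settings the single-predictor slope absorbs part of the effect of `x2` and lands well above the true value of 1.0, while the multiple regression estimate should sit close to it. The same mechanism can also hide an association rather than inflate it, depending on the signs involved.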

So far, what you show doesn't suggest there is an association between time sitting and your Boston score, although there are problems with each of the tests you show (as noted in comments). That might improve if you take other independent variables into account in a multiple regression. I wouldn't worry much yet about residual plots and so forth for this single-predictor regression; those might also improve when you take all of your predictors into account together.

If your outcome is the sum of 19 items, each with a 1-5 scale, then you should be able to treat that as continuous. If your full multiple regression model still shows problems you could consider ordinal logistic regression, which models the ordering of outcomes without requiring assumptions related to residuals.
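As a quick check on why a summed score can reasonably be treated as continuous, the sketch below works out the possible range of a total built from 19 items scored 1-5. The function name `sum_score_range` is hypothetical, and the sketch assumes every item is answered and scored as an integer.

```python
def sum_score_range(n_items, low, high):
    """Return (minimum total, maximum total, number of distinct possible totals)."""
    min_total = n_items * low    # every item at its lowest score
    max_total = n_items * high   # every item at its highest score
    n_distinct = max_total - min_total + 1  # integer totals in between
    return min_total, max_total, n_distinct

min_total, max_total, n_distinct = sum_score_range(n_items=19, low=1, high=5)
print(min_total, max_total, n_distinct)
```

With this many distinct possible totals the score behaves much more like a continuous variable than any single 5-point item does.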
