Solved – Controlling for age in multiple regression

Tags: age, interpretation, multiple regression

I am a little stuck on how to implement and interpret a multiple regression while controlling for age.

I am interested in seeing whether there is a positive relationship between depression and the use of music for emotional regulation, and whether this relationship is consistent across adulthood. I have three variables: depression, age, and emotional usage of music. All three are continuous. I have a large data set (2000+), but nearly half the sample are young adults (18-24) and there are very few seniors (65+).

As all the data are continuous and I have two IVs and one DV, multiple regression seemed the obvious choice. However, I am trying to understand how to set it up and how to interpret the outcome (I am very new to multiple regression). I have heard that you can hold one variable, like age, constant? How would I do this? Also, how much does it matter that younger ages are over-represented and older ages are under-represented? Will this bias the results?

Any advice would be greatly appreciated!

Best Answer

Simply adding an age term to your regression is equivalent to holding age constant. Let's say you are looking at $\beta_1$, the coefficient on $X_1$. It would be interpreted as "On average, all else equal, a unit increase in $X_1$ is associated with an increase of $\beta_1$ in $Y$."
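
As a concrete sketch of what "adding the age term" looks like in practice (this is only an illustration: the data here are simulated, and the column names `depression`, `music_emotion`, and `age` are hypothetical stand-ins for your own variables), a multiple regression in Python's statsmodels could be fit like this:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data; in practice `df` would be your real data set
# with columns for depression, emotional use of music, and age.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age": rng.uniform(18, 80, n),
    "music_emotion": rng.normal(0, 1, n),
})
df["depression"] = 0.5 * df["music_emotion"] + 0.02 * df["age"] + rng.normal(0, 1, n)

# Including age alongside music_emotion "controls for" age: the coefficient
# on music_emotion is its association with depression holding age constant.
model = smf.ols("depression ~ music_emotion + age", data=df).fit()
print(model.summary())
```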

You could also add an interaction term. Let age be denoted by $X_2$. If you include the term $X_1X_2$ in your regression equation, then the coefficient on $X_1X_2$ tells you, on average, how much the slope $\beta_1$ (the coefficient on $X_1$) increases per unit increase in age. If the coefficient on $X_1X_2$ is not significantly different from 0, then there isn't enough evidence to say that the effect of $X_1$ on $Y$ varies with age. Note: if you add $X_1X_2$, be sure to keep the main effects $X_1$ and $X_2$ in your model.
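
Continuing the same hypothetical setup as the sketch above, the `*` operator in a statsmodels formula expands to both main effects plus their interaction, so keeping $X_1$ and $X_2$ in the model is handled for you:

```python
# "music_emotion * age" expands to music_emotion + age + music_emotion:age,
# so both main effects stay in the model along with the interaction.
interaction_model = smf.ols("depression ~ music_emotion * age", data=df).fit()

# The "music_emotion:age" row in the summary estimates how the slope of
# music_emotion changes per one-year increase in age; its p-value tests
# whether that change differs from zero.
print(interaction_model.summary())
```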

You could plot the residuals of your model against age - if you see an increase or decrease in variance associated with age, then you have non-constant error variance, and the assumptions of your regression model are violated.
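
As a rough illustration of that diagnostic (again using the hypothetical names from the sketches above), you could plot the residuals of the fitted model against age:

```python
import matplotlib.pyplot as plt

# Residuals vs. age: a fan or funnel shape suggests non-constant error variance.
plt.scatter(df["age"], model.resid, alpha=0.3)
plt.axhline(0, color="grey", linewidth=1)
plt.xlabel("Age")
plt.ylabel("Residual")
plt.title("Residuals vs. age")
plt.show()
```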

See Kutner et al., Applied Linear Statistical Models, 5th ed., Chapter 6, pp. 236-248.
