I am a first-year psychology student doing research work with a professor. Unfortunately, the material I need right now is not covered until second year, so I am working through any resources I can find to get up to speed quickly. I need help understanding one particular situation involving regression analysis in SAS.
I ran a regression in SAS (proc reg) with two predictors, say a and b, and got the output below. I read it as saying that neither a nor b significantly predicts my target variable.
Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2          3.32392       1.66196      1.00   0.3774
Error              46         76.80649       1.66971
Corrected Total    48         80.13041

Root MSE            1.29217   R-Square    0.0415
Dependent Mean     -0.23698   Adj R-Sq   -0.0002
Coeff Var        -545.26074

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Standardized Estimate
Intercept    1             -0.25713          0.18515     -1.39     0.1716                       0
a            1             -0.35394          0.28797     -1.23     0.2253                -0.19510
b            1             -0.04706          0.39586     -0.12     0.9059                -0.01887
Next I added the interaction of a and b to the model; let's call it aXb. Now the output indicates that a and aXb significantly predict my target variable.
Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3         16.64439       5.54813      3.93   0.0142
Error              45         63.48602       1.41080
Corrected Total    48         80.13041

Root MSE            1.18777   R-Square    0.2077
Dependent Mean     -0.23698   Adj R-Sq    0.1549
Coeff Var        -501.20683

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Standardized Estimate
Intercept    1             -0.06807          0.18098     -0.38     0.7086                       0
a            1              3.01517          1.12795      2.67     0.0104                 1.66201
b            1             -0.00994          0.36407     -0.03     0.9783                -0.00399
aXb          1             -1.13782          0.37029     -3.07     0.0036                -1.90743
Here are my questions. I am not sure what to make of this situation: taken together, what do these two results tell me? Also, along with your answer, could you point me to some resources, Google keywords, etc. so I can learn more about these topics?
Thank you so much for your help.
Best Answer
It seems like you need an introduction to regression. People made book recommendations here. Free book recommendations here.
It's hard to be sure you're doing the analysis right when we don't know what the variables are or what the goal is. But based on the output, I can tell you that your second regression specification looks better than your first. I say that because you have two highly significant coefficients, and the adjusted R^2 jumped from -0.0002 to 0.1549. Note, though, that while I consider these important clues, models with more significant coefficients or a higher adjusted R^2 are not always better. There are lots of other issues to consider.
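To make the adjusted R^2 comparison concrete, here is a rough Python sketch (not SAS) that recomputes it from the sums of squares in the two ANOVA tables. It also computes the standard partial F statistic for adding aXb to the smaller model. The function and variable names are made up for illustration; only the numbers come from the question.

```python
# Values copied from the two ANOVA tables in the question.
SS_TOTAL, DF_TOTAL = 80.13041, 48
SSE_MAIN, DF_ERR_MAIN = 76.80649, 46   # model with a and b only
SSE_INT, DF_ERR_INT = 63.48602, 45     # model with a, b and aXb


def adjusted_r2(sse, df_error, ss_total=SS_TOTAL, df_total=DF_TOTAL):
    """Adjusted R^2 = 1 - (SSE / df_error) / (SS_total / df_total)."""
    return 1.0 - (sse / df_error) / (ss_total / df_total)


def nested_f(sse_small, sse_big, df_error_big, extra_terms=1):
    """Partial F statistic for the terms added to the smaller model."""
    return ((sse_small - sse_big) / extra_terms) / (sse_big / df_error_big)


adj_main = adjusted_r2(SSE_MAIN, DF_ERR_MAIN)
adj_int = adjusted_r2(SSE_INT, DF_ERR_INT)
f_change = nested_f(SSE_MAIN, SSE_INT, DF_ERR_INT)

print(f"Adj R^2, a and b only:     {adj_main:.4f}")
print(f"Adj R^2, with interaction: {adj_int:.4f}")
print(f"F for adding aXb:          {f_change:.2f}")
```

The two adjusted R^2 values should match the Adj R-Sq lines in the SAS output. Because only one term was added, the partial F is just the square of the t value for aXb, (-3.07)^2, up to rounding, so it carries the same information as that row of the parameter table.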
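Your regression models are predicting Y using a and b. In your second model, the estimated regression equation is Y = -0.06807 + (3.01517 * a) - (0.00994 * b) - (1.13782 * a * b). In other words, plug in a and b, and you get the model's prediction for Y. Because of the interaction term, the coefficient on a is no longer a single overall effect: it is the slope of a when b = 0, and in general a one-unit increase in a changes the prediction by 3.01517 - (1.13782 * b). That is why the coefficient on a can look so different between your two models. I could say a lot more, but I'll leave you there and suggest you pick up a textbook.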
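If it helps to see the equation in action, here is a small Python sketch (again, not SAS) that plugs values into the second model. The coefficients are taken from the output above; the function names and the values of b in the loop are hypothetical, since we don't know what range b actually takes in your data.

```python
# Coefficients from the second model's Parameter Estimates table.
INTERCEPT = -0.06807
COEF_A = 3.01517
COEF_B = -0.00994
COEF_AXB = -1.13782


def predict(a, b):
    """Predicted Y from the fitted equation with the a*b interaction."""
    return INTERCEPT + COEF_A * a + COEF_B * b + COEF_AXB * a * b


def slope_of_a(b):
    """Change in predicted Y per one-unit increase in a, at a fixed b."""
    return COEF_A + COEF_AXB * b


# Hypothetical values of b, chosen only to show how the slope of a moves.
for b in (0, 1, 2, 3):
    print(f"b = {b}: slope of a = {slope_of_a(b):+.5f}")
```

The point of `slope_of_a` is that the effect of a is not one number in this model. It shrinks as b grows and changes sign once b exceeds 3.01517 / 1.13782, which is roughly 2.65.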
I strongly recommend you try plotting your data: Y against a, Y against b, and a against b as well.