Solved – Understanding multiple regression output

anova, regression, sas

I am a first-year psychology student doing some research work with a professor. Unfortunately, the material I need to use right now is only covered in second year, but I need to know it already, so I am burning through any resources I can find to come up to speed quickly. I need help understanding this particular situation, which involves SAS and regression analysis.

When I ran a regression in SAS (proc reg) using two predictor variables, say a and b, I got the output below. I read this as saying that neither variable (a or b) significantly predicts my target variable.
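Roughly, the call looked like this (with mydata and y standing in for my actual dataset and target variable):

    proc reg data=mydata;
       model y = a b;   /* regress the target on the two predictors */
    run;

Here is the SAS output: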

                                     Analysis of Variance

                                            Sum of           Mean
        Source                   DF        Squares         Square    F Value    Pr > F

        Model                     2        3.32392        1.66196       1.00    0.3774
        Error                    46       76.80649        1.66971
        Corrected Total          48       80.13041


                     Root MSE              1.29217    R-Square     0.0415
                     Dependent Mean       -0.23698    Adj R-Sq    -0.0002
                     Coeff Var          -545.26074


                                     Parameter Estimates

                            Parameter       Standard                           Standardized
   Variable         DF       Estimate          Error    t Value    Pr > |t|        Estimate

   Intercept         1       -0.25713        0.18515      -1.39      0.1716               0
   a                 1       -0.35394        0.28797      -1.23      0.2253        -0.19510
   b                 1       -0.04706        0.39586      -0.12      0.9059        -0.01887

Next I tried to include the interaction of a and b in the picture; let's call it aXb. Now the output indicates that a and aXb significantly predict my target variable.
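I built the interaction variable in a data step first, since proc reg does not accept a*b terms directly in the model statement; roughly (again with placeholder dataset and outcome names):

    data mydata2;
       set mydata;            /* copy the original data */
       aXb = a * b;           /* interaction term computed by hand */
    run;

    proc reg data=mydata2;
       model y = a b aXb;
    run;

Here is the output from this second model: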

                                     Analysis of Variance

                                            Sum of           Mean
        Source                   DF        Squares         Square    F Value    Pr > F

        Model                     3       16.64439        5.54813       3.93    0.0142
        Error                    45       63.48602        1.41080
        Corrected Total          48       80.13041


                     Root MSE              1.18777    R-Square     0.2077
                     Dependent Mean       -0.23698    Adj R-Sq     0.1549
                     Coeff Var          -501.20683


                                     Parameter Estimates

                            Parameter       Standard                           Standardized
   Variable         DF       Estimate          Error    t Value    Pr > |t|        Estimate

   Intercept         1       -0.06807        0.18098      -0.38      0.7086               0
   a                 1        3.01517        1.12795       2.67      0.0104         1.66201
   b                 1       -0.00994        0.36407      -0.03      0.9783        -0.00399
   aXb               1       -1.13782        0.37029      -3.07      0.0036        -1.90743

Here are my questions: I am not sure what to make of this situation. Taken together, what does this indicate? Also, while you are answering, could you supplement your answer with some resources, Google keywords, etc., so that I can learn more about these topics?

Thank you so much for your help.

Best Answer

It seems like you need an introduction to regression. People have made book recommendations here, and free book recommendations here.

It's hard to make sure you're doing the analysis right when we don't know what the variables are or what the goal is. But based on the output, I can tell you that your second regression specification looks better than your first: you have two highly significant coefficients, and the adjusted R^2 took a big jump. Note, though, that while I consider these important clues, it is not true that models with more significant coefficients or a higher adjusted R^2 are always better; there are lots of other issues to consider.

Your regression models are predicting Y using a and b. In your second model, the estimated regression equation is Yhat = -0.06807 + (3.01517 * a) - (0.00994 * b) - (1.13782 * a * b). In other words, plug in values of a and b and you get the model's prediction for Y (a quick sketch of this follows). I could say a lot more, but I'll leave you there and suggest you pick up a textbook.
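To make the "plug in a and b" step concrete: you could compute a prediction by hand in a data step, or have proc reg save predicted values with an output statement. A rough sketch (the plugged-in values of a and b are made up, and mydata2 and y are placeholder names):

    * plug made-up values into the fitted equation by hand;
    data byhand;
       a = 0.5;
       b = 1.0;
       yhat = -0.06807 + 3.01517*a - 0.00994*b - 1.13782*(a*b);
       put yhat=;             /* writes the prediction to the log */
    run;

    * or let proc reg compute a prediction for every observation;
    proc reg data=mydata2;
       model y = a b aXb;
       output out=preds p=yhat;
    run;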

I also strongly recommend you try plotting your data: Y with a on the x-axis, Y with b on the x-axis, and a against b as well.
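In SAS, proc sgplot gives quick scatterplots; something along these lines, assuming the dataset with the interaction term is called mydata2 and the outcome is y:

    proc sgplot data=mydata2;
       scatter x=a y=y;       /* Y against a */
    run;

    proc sgplot data=mydata2;
       scatter x=b y=y;       /* Y against b */
    run;

    proc sgplot data=mydata2;
       scatter x=a y=b;       /* a against b */
    run;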
