Solved – What are the software limitations in all possible subsets selection in regression

model selection, multivariable, regression

If I have a dependent variable and $N$ predictor variables and want my stats software to examine all the possible models, there are $2^N$ possible resulting equations.

I am curious to find out what the limitations are with regard to $N$ for popular statistical software, since as $N$ gets large there is a combinatorial explosion.

I've poked around the various web pages for packages but have not been able to find this information. I would suspect a value of 10 to 20 for $N$?
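To get a feel for how quickly the count grows, here is a minimal Python sketch that counts and enumerates predictor subsets. The function names `count_models` and `all_subsets` are made up for illustration and do not come from any statistics package.

```python
from itertools import combinations


def count_models(n_predictors: int) -> int:
    """Number of distinct predictor subsets, including the intercept-only model."""
    return 2 ** n_predictors


def all_subsets(predictors):
    """Yield every subset of the predictors as a tuple, smallest subsets first."""
    for size in range(len(predictors) + 1):
        yield from combinations(predictors, size)


# Print how the model count grows with the number of predictors.
for n in (10, 20, 30, 50):
    print(n, count_models(n))
```

Enumerating with `all_subsets` is only practical for small $N$; the count itself is cheap to compute for any $N$.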

If anyone knows (and has links) I would be grateful for this information.

Aside from R and Minitab, I can think of SAS, SPSS, Stata, Matlab, and Excel(?). Are there any other packages I should consider?

Best Answer

I suspect 30 to 60 is about the best you'll get. The standard approach is the leaps-and-bounds algorithm, which doesn't require fitting every possible model. In R, the leaps package is one implementation.

The documentation for the regsubsets function in the leaps package states that it will handle up to 50 variables without complaining. It can be "forced" to do more than 50 by setting the appropriate boolean flag (really.big = TRUE).

You might do a bit better with some parallelization technique, but the total number of models you can consider will almost certainly scale only linearly with the number of CPU cores available to you, while each additional variable doubles the number of models. So, if 50 variables is the upper limit for a single core and you have 1000 cores (roughly $2^{10}$) at your disposal, you could bump that to about 60 variables.
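The arithmetic behind that estimate can be sketched in a few lines of Python. This assumes perfectly linear speedup and a search cost proportional to $2^N$, which is a simplification since leaps-and-bounds prunes part of the search. The helper name `extra_variables` is made up for illustration.

```python
import math


def extra_variables(n_cores: int) -> float:
    """Extra predictors affordable with n_cores, assuming perfectly linear speedup.

    Each extra predictor doubles the number of subsets, so a k-fold increase
    in throughput buys log2(k) additional predictors.
    """
    return math.log2(n_cores)


single_core_limit = 50  # figure taken from the answer above, not a measured limit

for cores in (1, 8, 1000, 1_000_000):
    print(cores, round(single_core_limit + extra_variables(cores), 1))
```

Under these assumptions even a millionfold increase in hardware adds only about 20 variables.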

Related Solutions

Model Selection – Using All Possible Subsets and Automatic Selection Techniques for Regression

For the second part, you must interpret the output as the steps towards your final model.

For example, in the forward case you begin with:

Start:  AIC=377.95
cars$MidrangePrice ~ 1

                  Df Sum of Sq    RSS    AIC
+ cars$Horsepower  1    4979.3 3054.9 300.66
+ cars$Wheelbase   1    3172.3 4862.0 338.76
+ cars$Length      1    2448.8 5585.4 350.14
+ cars$Width       1    1969.2 6065.0 356.89
+ cars$Uturn       1    1450.2 6584.0 363.63
+ cars$Luggage     1    1079.6 6954.7 368.12
<none>                         8034.2 377.95

At this point your model contains only the intercept: cars$MidrangePrice ~ 1.

Each row in the table shows what you would get if you added that variable (for example, Horsepower) to the current model: Sum of Sq is the resulting reduction in the residual sum of squares, RSS is the residual sum of squares of the enlarged model, and AIC is its Akaike Information Criterion. The <none> row is the current model left unchanged.
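As a rough check on how the columns relate, the Python sketch below recomputes two rows of the table. It assumes the AIC scale that R's step() uses for linear models, n*log(RSS/n) + 2*p, and a sample size of 82. That sample size is not shown in the output; it is inferred here because it reproduces the printed AIC values, so treat it as an assumption.

```python
import math

N_OBS = 82  # assumption: inferred from the printed AIC values, not shown in the output


def step_aic(rss: float, n_obs: int, n_params: int) -> float:
    """AIC on the scale assumed for linear models in step(): n*log(RSS/n) + 2*p."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


rss_null = 8034.2            # <none> row: intercept-only model
sum_sq_horsepower = 4979.3   # reduction in RSS from adding Horsepower
rss_horsepower = rss_null - sum_sq_horsepower  # RSS column of the Horsepower row

print(round(rss_horsepower, 1))
print(round(step_aic(rss_null, N_OBS, 1), 2))        # intercept only: 1 parameter
print(round(step_aic(rss_horsepower, N_OBS, 2), 2))  # intercept + Horsepower
```

The variable whose row has the lowest AIC (here Horsepower) is the one the forward procedure adds first.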

In the other cases you must read the results the same way.

Hope this helps :)

Solved – Is it possible to conduct a regression if all variables are ordinal

Yes, it is possible.

When your dependent variable is ordinal, you want to do ordinal logistic regression (OLR). This can be done in SPSS. UCLA's excellent statistics help website has a guide to OLR in SPSS here (with more here).

Regarding your independent variables, you have several options:

You can represent them with a standard dummy coding scheme (such as reference cell coding; see my answer here for an explanation).
Another approach is to use an ordinal dummy coding scheme (such as difference coding; there is an explanation here).
Lastly, Agresti has argued that you can simply replace the ordinal rankings with continuous values that represent your best guesses about the true values. There will naturally be some measurement error associated with this approach, but if you have some knowledge on which to base your guesses they won't be too bad, and you won't use as many degrees of freedom to estimate the effect.
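To make the first and last options concrete, here is a small Python sketch that codes one ordinal predictor both ways. The level names and the numeric scores are hypothetical, chosen only for illustration. Difference coding is not shown.

```python
LEVELS = ["low", "medium", "high"]  # hypothetical ordered categories


def reference_cell_dummies(value, levels=LEVELS):
    """Reference cell coding: one 0/1 indicator per level except the first."""
    return [1 if value == level else 0 for level in levels[1:]]


# Hypothetical scores: best guesses for the underlying value behind each level.
SCORES = {"low": 1.0, "medium": 2.5, "high": 6.0}


def numeric_score(value, scores=SCORES):
    """Score-based coding: replace each level with an assumed numeric value."""
    return scores[value]


for v in LEVELS:
    print(v, reference_cell_dummies(v), numeric_score(v))
```

With k levels, the dummy coding spends k - 1 degrees of freedom on the predictor, whereas the score-based coding spends one, which is the trade-off described above.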

If you use OLR for your analysis, you can get tests of each variable with standard output. In SPSS these tests are reported in the "Parameter Estimates" table. The assumption you need to check is the proportional odds assumption, which is assessed via the "Test of Parallel Lines". SPSS can output this for you as well. UCLA's guide to OLR in SPSS (linked above) covers both of these issues.
