Firstly, a method that pre-selects variables for a final model based on univariate correlations will tend to do badly, for a number of reasons:

- It ignores model uncertainty by committing to a single selected model.
- It uses statistical significance/strength of correlation as the selection criterion; if the goal is prediction, you should instead assess how much a variable actually helps prediction, which is not necessarily the same thing.
- It can "falsely" identify predictors: another predictor may be even better, but because the variable you look at correlates somewhat with it, it appears to correlate quite well with the outcome.
- It can miss predictors that only show up, or only become clear, once other variables are adjusted for (see the sketch below).
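A minimal simulated sketch of those last two failure modes (all variable names and coefficients here are hypothetical, chosen purely for illustration): a predictor that barely correlates with the outcome on its own turns out to matter a great deal once a correlated predictor is adjusted for, so a univariate screen would throw it away.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.45, size=n)   # x2 is strongly correlated with x1
y = x1 - 0.9 * x2 + rng.normal(scale=0.5, size=n)  # x2 matters, but only jointly with x1

# Univariate screening: x2 looks nearly useless and would be dropped
print(stats.pearsonr(x1, y))  # clearly "significant"
print(stats.pearsonr(x2, y))  # near-zero correlation -> screened out

# Joint model: both predictors carry substantial coefficients
X = np.column_stack([x1, x2])
print(LinearRegression().fit(X, y).coef_)  # roughly [1.0, -0.9]
```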
Additionally, failing to wrap the whole selection procedure in some form of bootstrapping or cross-validation to get a realistic assessment of your model uncertainty is likely to mislead you.
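For example, with scikit-learn you can put the screening step inside a Pipeline so that cross-validation re-runs the entire selection procedure in every fold; the performance estimate then reflects the full procedure, not just the final model. A sketch on simulated data:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 500))      # mostly noise features
y = X[:, 0] + rng.normal(size=200)   # only the first feature matters

pipe = Pipeline([
    ("screen", SelectKBest(f_regression, k=10)),  # univariate screen
    ("fit", LinearRegression()),
])
# Screening is repeated inside each fold, so this estimate is honest
print(cross_val_score(pipe, X, y, cv=5, scoring="r2").mean())
```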
Furthermore, treating continuous predictors as having linear effects can often be improved upon by methods that do not make that assumption, e.g. random forests (RF).
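A quick illustration of that point, assuming a purely quadratic signal (simulated data, illustrative settings): a linear model on the raw feature misses the effect entirely, while a random forest picks it up without being told about the non-linearity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(500, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=500)  # purely non-linear signal

print(cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2").mean())  # ~0
print(cross_val_score(RandomForestRegressor(random_state=0), X, y, cv=5,
                      scoring="r2").mean())  # clearly positive
```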
Using RF as a pre-selection step for a linear model is not such a good idea. Variable importance is really hard to interpret, and it is really hard (arguably meaningless) to set a cut-off on it. You do not know whether a variable's importance reflects the variable itself or its interactions, and on top of that you lose the non-linear transformations of variables that the forest exploited.
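A small sketch of the cut-off problem (simulated data, illustrative settings): even features that are pure noise receive non-zero impurity importance from a random forest, so there is no natural zero to threshold against.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X_signal = rng.normal(size=(300, 2))
X_noise = rng.normal(size=(300, 8))   # 8 completely irrelevant features
X = np.hstack([X_signal, X_noise])
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300)

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
print(np.round(rf.feature_importances_, 3))  # the noise columns are all > 0
```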
It depends in part on what you want to do. If you want good predictions, maybe you should not care too much about whether your method is a traditional statistical model or not.
Of course, there are plenty of methods like the elastic net, the LASSO, or Bayesian models with the horseshoe prior that fit better into a traditional modeling framework and can also accommodate e.g. splines for continuous covariates.
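As a hedged sketch of one such model: an elastic net fit on spline-expanded continuous covariates, using scikit-learn's SplineTransformer and ElasticNetCV (the data and all tuning choices below are illustrative, not recommendations).

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer, StandardScaler
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(4)
X = rng.uniform(-2, 2, size=(400, 5))
y = np.sin(2 * X[:, 0]) + X[:, 1] + rng.normal(scale=0.3, size=400)

model = make_pipeline(
    SplineTransformer(degree=3, n_knots=5),        # spline basis per covariate
    StandardScaler(),
    ElasticNetCV(l1_ratio=[0.2, 0.5, 0.9], cv=5),  # penalty mix chosen by CV
)
model.fit(X, y)
print(model.score(X, y))
```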
A more rigorous way to pursue this question is to apply the Boruta algorithm.
Boruta repeatedly measures feature importance from a random forest (or similar method) and then carries out statistical tests to screen out the features which are irrelevant. The procedure terminates when all features are either decisively relevant or decisively irrelevant.
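For instance, using the BorutaPy implementation (the Python port of the original R package, installable via `pip install Boruta`; the data and parameter choices below are illustrative, and version compatibility with recent NumPy releases may vary):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 20))
y = (X[:, 0] + X[:, 1] ** 2 + rng.normal(size=300) > 0).astype(int)

rf = RandomForestClassifier(n_jobs=-1, max_depth=5)
boruta = BorutaPy(rf, n_estimators="auto", alpha=0.05, random_state=42)
boruta.fit(X, y)  # BorutaPy expects plain numpy arrays, not DataFrames

print("confirmed:", np.where(boruta.support_)[0])       # decisively relevant
print("tentative:", np.where(boruta.support_weak_)[0])  # undecided at stopping
```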
There are several papers on this topic; here is one: Kursa, Miron B., and Witold R. Rudnicki. "The All Relevant Feature Selection using Random Forest."
Boruta and random forest differences
The Boruta algorithm adds a randomization layer on top of the variable-importance results obtained from a random forest in order to determine which variables are truly important and statistically valid. For the details of the difference, please refer to Section 2 of the article below; a bare-bones sketch of the core idea follows the reference.
Kursa, Miron B., and Witold R. Rudnicki. "Feature Selection with the Boruta Package." Journal of Statistical Software 36.11 (2010): 1-13.
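The core of that difference can be sketched in a few lines (this is a bare-bones illustration, not the full Boruta procedure, which iterates this comparison and tests it statistically): add shuffled "shadow" copies of every feature, refit the forest, and compare each real feature's importance against the best importance achieved by pure noise.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 5))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300)

X_shadow = rng.permuted(X, axis=0)   # shuffle each column: breaks any link to y
rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(np.hstack([X, X_shadow]), y)

real_imp = rf.feature_importances_[:5]
shadow_max = rf.feature_importances_[5:].max()  # importance of the best noise feature
print(real_imp > shadow_max)  # Boruta repeats this and decides via statistical tests
```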
Is one method preferred over the other? If so why?
This is a classic case of the "No Free Lunch" theorem: without data and assumptions, it is impossible to decide which one is better. However, please note that Boruta was produced as an improvement over plain random forest variable importance, so it should perform better in more situations than not (I am biased, because I like randomization techniques myself). Nevertheless, the data and the available computation time could make variable importance from a random forest the better choice, since Boruta refits the forest many times.