It seems that you're comparing scikit-learn's Random Forest with R's randomForest package, which handles categorical variables automatically. In scikit-learn, however, you have to preprocess the data yourself. One way to do this is with the DictVectorizer class, which creates a new binary feature for every distinct value of your original categorical feature.
Finding the arg max is easy with scipy.optimize.fmin, by evaluating your model on sampled points, or by rolling your own optimization routine. This is the bread and butter of how many models work, so it's worth learning about in detail. See, for instance, these lecture notes.
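Concretely, the idea is to wrap the fitted model's prediction in an objective and hand it to the optimizer; since fmin minimizes, you negate the prediction to search for the arg max. The data below are made up for illustration, and note that a forest's prediction surface is piecewise constant, so a simplex method can stall on flat regions:

```python
import numpy as np
from scipy.optimize import fmin
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = -(X[:, 0] - 1.0) ** 2 + rng.normal(scale=0.1, size=200)  # peak near x = 1

model = RandomForestRegressor(random_state=0).fit(X, y)

def neg_prediction(x):
    # fmin minimizes, so negate to maximize the model's prediction
    return -model.predict(np.atleast_2d(x))[0]

x_star = fmin(neg_prediction, x0=np.array([0.0]), disp=False)
```

Evaluating the model on a dense grid and taking `np.argmax` is often more robust than fmin for step-like surfaces.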
This probably isn't what you want, though. The $r^2$ you cross-validated describes accuracy over the distribution of the input space. The $\operatorname*{arg\,max}_x f(x)$ that the random forest evaluates to will probably lie in a low-confidence neighborhood, away from the support of the training data.
You want a model that produces confidence intervals. In scikit-learn, GaussianProcessRegressor and GradientBoostingRegressor (with the quantile loss) both do this. Gaussian processes are excellent for this problem if you have no more than about a thousand training observations.
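A sketch of getting interval estimates from a Gaussian process in scikit-learn; the data, kernel, and noise level below are illustrative choices:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(30, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=30)

# alpha adds the (assumed) observation noise to the kernel diagonal
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.1 ** 2)
gp.fit(X, y)

X_grid = np.linspace(0, 10, 100)[:, None]
mean, std = gp.predict(X_grid, return_std=True)
upper = mean + 1.96 * std  # approximate 95% confidence band
lower = mean - 1.96 * std
```

The `return_std=True` flag is what gives you the per-point uncertainty that the bandit-style procedures below rely on.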
If you can collect more data after consulting your model, then evaluate your black box at the $\arg\max$ of the upper confidence bound, add the result as a new data point, refit, and repeat until the answer stops changing.
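That loop can be sketched as follows; `black_box` stands in for your expensive function, and the grid, kernel, iteration count, and `kappa` are all illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def black_box(x):
    # stand-in for the expensive function you actually want to maximize
    return -(x - 2.0) ** 2

X_grid = np.linspace(0, 10, 101)[:, None]
rng = np.random.default_rng(0)
X_obs = list(rng.uniform(0, 10, size=3))      # a few initial evaluations
y_obs = [black_box(x) for x in X_obs]

kappa = 2.0  # larger => more exploration
for _ in range(12):
    gp = GaussianProcessRegressor(kernel=RBF(), alpha=1e-6)
    gp.fit(np.array(X_obs)[:, None], np.array(y_obs))
    mean, std = gp.predict(X_grid, return_std=True)
    # evaluate the black box at the arg max of the upper confidence bound
    x_next = X_grid[np.argmax(mean + kappa * std), 0]
    X_obs.append(x_next)
    y_obs.append(black_box(x_next))

best_x = X_obs[int(np.argmax(y_obs))]
```

In practice you would stop when `x_next` stops moving rather than after a fixed number of iterations.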
This problem is known as the Contextual Bandit. The choice of confidence bound controls the trade-off between exploration and exploitation of intermediate values. Since you don't care how poorly the model performs while training, you'd pick a large upper confidence bound so that the model converges faster.
There are several related toolkits for this sort of problem: Spearmint, Hyperopt, and MOE.
If you cannot collect more data, then you should take the $\arg\max$ of the lower confidence bound of the model. This penalizes predictions the model is uncertain about.
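For that case, a sketch of picking the arg max of the lower confidence bound over a candidate grid (toy data again; the default kernel and the 1.96 multiplier are illustrative):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 5, size=(40, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.05, size=40)

gp = GaussianProcessRegressor(alpha=0.05 ** 2).fit(X, y)

grid = np.linspace(0, 5, 200)[:, None]
mean, std = gp.predict(grid, return_std=True)
# lower confidence bound: penalize points the model is uncertain about
x_conservative = grid[np.argmax(mean - 1.96 * std), 0]
```

Compared with the upper-bound rule above, this picks a point the model is confident is good rather than one that might be good.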
Best Answer
The way Random Forests are built is invariant to monotonic transformations of the independent variables: the splits will be completely analogous. If you are only aiming for accuracy, you will not see any improvement. In fact, since Random Forests can find complex non-linear relations and variable interactions on the fly (why are you calling this linear regression?), transforming your independent variables may smooth out the very information that lets the algorithm do this properly.
Random Forests are not always treated as a black box; sometimes they are used for inference. For example, you can interpret the variable importance measures they provide, or compute some sort of marginal effect of an independent variable on the dependent variable, usually visualized as partial dependence plots. This last one is likely influenced by the scale of the variables, which is a problem when trying to obtain information of a more descriptive nature from Random Forests. In that case it might help to transform (standardize) your variables, which could make partial dependence plots comparable. I'm not completely sure about this and will have to think on it.
Not long ago I tried to predict count data with a Random Forest; regressing on the square root and on the natural log of the dependent variable helped a bit, but not much, and not enough to let me keep the model.
Some packages with which you may use random forests for inference:
https://uc-r.github.io/lime
https://cran.r-project.org/web/packages/randomForestExplainer/index.html
https://pbiecek.github.io/DALEX_docs/2-2-useCaseApartmetns.html