Random Forest is a bagging algorithm rather than a boosting algorithm.
They represent two opposite ways of achieving low error.
We know that error can be decomposed into bias and variance. An overly complex model has low bias but high variance, while an overly simple model has low variance but high bias; both lead to high error, but for two different reasons. As a result, two different ways of attacking the problem come to mind (perhaps from Breiman and others): variance reduction for a complex model, or bias reduction for a simple model, which correspond to random forest and boosting respectively.
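For reference, the standard bias-variance decomposition for squared error (textbook material, stated here only to make "error = bias + variance" concrete) is:

$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\operatorname{Var}\big[\hat{f}(x)\big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible noise}}$$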
Random forest reduces the variance of a large number of "complex" models with low bias. Note that the component models are not "weak" models but overly complex ones: if you read about the algorithm, the underlying trees are grown roughly as large as possible. The trees are independent, parallel models, and additional random variable selection at each split is introduced to make them even more independent, which is what makes random forest perform better than ordinary bagging and gives it the name "random".
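As a rough sketch of that point (using the randomForest package on the same Hitters data used below; the ntree and mtry values are only illustrative), bagging is the special case where all predictors are candidates at every split, while random forest samples only a subset of them:

library(ISLR)
library(randomForest)
Hitters=na.omit(Hitters)
set.seed(1)
# bagging: all 19 predictors are candidates at every split (mtry = p)
bag.fit=randomForest(Salary~.,data=Hitters,mtry=19,ntree=500)
# random forest: only a random subset of predictors at each split (default mtry = p/3 for regression)
rf.fit=randomForest(Salary~.,data=Hitters,ntree=500)
bag.fit   # printing each fit shows its out-of-bag MSE for comparison
rf.fit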
Boosting, on the other hand, reduces the bias of a large number of "small" models with low variance. These are the "weak" models you quoted. The component models form a kind of "chain" or "nested" iterative scheme, each stage correcting the bias left by the previous one, so they are not independent parallel models; each model is built on top of all the earlier small models by reweighting. That is where the name "boosting", building the ensemble up one model at a time, comes from.
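A minimal sketch of that idea with the gbm package (the parameter values are illustrative, not tuned): each tree is kept deliberately shallow, i.e. "weak", and each new tree is added with a small weight (shrinkage) on top of all the previous ones.

library(ISLR)
library(gbm)
Hitters=na.omit(Hitters)
set.seed(1)
# many small trees: interaction.depth = 1 gives stumps, shrinkage scales each tree's contribution
boost.fit=gbm(Salary~.,data=Hitters,distribution="gaussian",
              n.trees=5000,interaction.depth=1,shrinkage=0.01)
summary(boost.fit)   # relative influence of each variable across all the trees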
Breiman's papers and books discuss trees, random forest and boosting quite a lot. They help you understand the principles behind the algorithms.
First, I think it is hard to say that one model "outperforms" another. Each model has different pros and cons and should be applied in different cases. For example, I would not say random forest outperforms linear regression, because linear regression is 1. more "stable", 2. less computationally demanding, and 3. more interpretable; plus, if the ground-truth relationship between features and target really is linear, nothing can beat linear regression.
Now, back to your question about code for trying the two approaches.
You can easily run the experiment both ways and compare the performance. The trick is using model.matrix in R. Here is one example from the ISL book that uses model.matrix to convert factors into a design matrix and then fits ridge or lasso.
# Chapter 6 Lab 2 of ISL book: Ridge Regression and the Lasso
library(ISLR)
library(glmnet)
Hitters=na.omit(Hitters)
# convert the formula/data-frame input into a numeric design matrix (factors become dummy variables)
x=model.matrix(Salary~.,Hitters)[,-1]
y=Hitters$Salary
set.seed(1)
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
y.test=y[test]
grid=10^seq(10,-2,length=100)
# The Lasso
lasso.mod=glmnet(x[train,],y[train],alpha=1,lambda=grid)
plot(lasso.mod)
set.seed(1)
cv.out=cv.glmnet(x[train,],y[train],alpha=1)
plot(cv.out)
# get the best-fit lambda from cross-validation, then refit on all the data
bestlam=cv.out$lambda.min
lasso.pred=predict(lasso.mod,s=bestlam,newx=x[test,])
mean((lasso.pred-y.test)^2)
out=glmnet(x,y,alpha=1,lambda=grid)
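If you also want to see which coefficients survive at that lambda (the next step in the same ISL lab), you can add:

# coefficients of the full-data fit at the cross-validated lambda; many are exactly zero
lasso.coef=predict(out,type="coefficients",s=bestlam)[1:20,]
lasso.coef[lasso.coef!=0]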
On the other hand, you can fit a random forest just as easily:
library(randomForest)
randomForest(Salary~.,data=Hitters)
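If you want the comparison to be apples-to-apples with the lasso test MSE above, a sketch reusing the same train/test split would be (exact numbers will vary with the seed):

library(randomForest)
set.seed(1)
rf.mod=randomForest(Salary~.,data=Hitters,subset=train)
rf.pred=predict(rf.mod,newdata=Hitters[test,])
mean((rf.pred-y.test)^2)   # test MSE, directly comparable to the lasso number above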
Best Answer
This sounds somewhat like gradient tree boosting. The idea of boosting is to find the best linear combination of a class of models. If we fit a tree to the data, we are trying to find the tree that best explains the outcome variable. If we instead use boosting, we are trying to find the best linear combination of trees.
However, boosting is a little more efficient: instead of growing a collection of random trees, we build each new tree to work on the examples we cannot yet predict well.
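To make "work on the examples we cannot predict well yet" concrete, here is a bare-bones sketch of least-squares gradient boosting (rpart and all the parameter values are just illustrative, not the gbm implementation itself): small trees are fit repeatedly to the current residuals.

library(ISLR)
library(rpart)
d=na.omit(Hitters)
yhat=rep(mean(d$Salary),nrow(d))   # start from a constant prediction
shrinkage=0.1                      # learning rate: each tree only contributes a little
for(b in 1:100){
  d$resid=d$Salary-yhat            # residuals: the part of Salary we still predict poorly
  tree.b=rpart(resid~.-Salary,data=d,maxdepth=2)   # a small ("weak") tree fit to the residuals
  yhat=yhat+shrinkage*predict(tree.b)
}
mean((d$Salary-yhat)^2)            # training MSE shrinks as more trees are added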
For more on this, I'd suggest reading chapter 10 of Elements of Statistical Learning: http://statweb.stanford.edu/~tibs/ElemStatLearn/
While this isn't a complete answer to your question, I hope it helps.