First question
When you consider the prediction error through bias-variance decomposition you face the well-know compromise between how well you fit on average (bias) and how stable is you prediction model (variance).
One can improve the error by decreasing the variance or by increasing the bias. Either way you choose, you might loose at the other hand. However, one can hope that overall you improve your prediction error.
Random Forests use trees which usually overfit. The trees used in RF are fully-grown (no pruning). The bias is small since they capture locally the whole details. However fully-grown trees have few prediction power, due to high variance. Variance is high because the trees depends too much on any piece of data. One observation added/removed can give a totally different tree.
Bagging (many averaged trees on bootstrap samples) have the nice result that it creates smoother surfaces (the RF is still a piece-wise model, but the regions produced by averaging are much smaller). The surfaces resulted from averaging are smoother, more stable and thus have less variance. One can link this averaging with Central Limit Theorem where one can note how variance decrease.
A final note on RF is that the averaging works well when the trees resulted are independent. And Breiman et all. realized that by choosing randomly some of the input features will increase the independence between the trees, and have the nice effect that the variance decrease more.
Boosting works the other way, regarding bias-variance decomposition. Boosting starts with a weak-model which have low variance and high bias. The same trees used in boosting are not grown to many nodes, only to a small limit. Sometimes works fine even with decision stumps, which are the simplest form of a tree.
So, boosting starts with a low-variance tree with high bias (learners are weak because they do not capture well the structure of the data) and works gradually to improve bias. MART (I usually prefer the modern name of Gradient Boosting Trees) improve bias by "walking" on the error surface in the direction given by gradients. They usually loose in variance, on this way, but, again, one can hope that the loose in variance is well-compensated by gain in bias term.
As a conclusion these two families (bagging and boosting) are not at all the same. They start their search for the best compromise from opposite ends and works in opposite directions.
Second question
Random Forests uses CART trees, if you want to do it by the book. However were reported very small differences if one uses different trees (like C4.5) with various splitting criteria. As a matter of fact the Weka's implementation of RF uses C4.5 (the name is J4.8 in Weka library). The reason is probably the computation time, caused by smaller trees produced.
Best Answer
It probably depends on what you want to use variable importances for. Is it to be used as a criterion for feature selection for a third classification model? In that case you could try to compute a weighted average the variable importances (maybe after normalizing each individual variable importance vector to unit length) for various values and the averaging weight and then pickup the value that yields the best cross-validated score for the final model.
As for combining the outcome of the logistic regression model and the random forest model (without considering variable importances), the following blog post is very informative and demonstrates that a single averaging of the output is a simple yet very effective ensemble method for regression models.