Something like that would be my starting assumption, and for many practical examples you would be unlucky if it turned out to be very wrong. But...
Noise: The more noise, the more conservative the RF's predictions will be (regression toward the mean). This introduces a bias, generally reducing the amplitude/steepness of a given partial plot. This should be regarded as a feature, not a bug. Thus the flatness at the upper end can also be due to few samples and more noise.
Interactions: Partial plotting of the higher-dimensional topology of the trained RF model is suitable only when there are no dominant interactions with this specific variable. In the extreme case a variable can be highly important but have a near-flat partial function, or you could end up with Simpson's paradox: http://en.wikipedia.org/wiki/Simpson%27s_paradox.
Sample density: Alternatively, you could more crudely say overall that y = a log(x) + b. I would recommend plotting an overlay of the training samples. Otherwise it is hard to assess whether a given local 'blob' is most likely due to few samples and some noise, or whether it is actually a sound trend that deserves to be described in detail.
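A minimal sketch of such an overlay with the randomForest package, using a made-up log-shaped toy data set (the names x1, x2, y and all parameters are assumptions): rug() marks the training sample density along the x-axis under the partial plot.

```r
library(randomForest)

set.seed(1)
# hypothetical toy data: y depends log-like on x1, x2 is irrelevant noise
n     <- 300
train <- data.frame(x1 = runif(n, 1, 10), x2 = rnorm(n))
train$y <- log(train$x1) + rnorm(n, sd = 0.3)

rf <- randomForest(y ~ ., data = train)

# partial dependence of the prediction on x1 ...
partialPlot(rf, train, x.var = "x1")
# ... with the training sample positions overlaid as a rug
rug(train$x1)
```

Where the rug is sparse, a wiggle in the partial function is more likely noise than signal.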
Did the model use the specific variable much?: If the variable importance of this variable is very low, that often means the variable has not been used much in the trees of the forest. Therefore the reproducibility of the partial function can become more unstable, and the partial function itself more crude. This can happen in noisy or sparse settings. It helps a little to lower mtry, so that less dominant variables are used more.
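A sketch of that knob, assuming randomForest and made-up data in which x6 is a weak predictor: with a lower mtry, the stronger variables are not always among the candidates at a split, so weaker ones get picked more often.

```r
library(randomForest)

set.seed(2)
# hypothetical data: x1 and x2 dominate, x6 contributes only weakly
n <- 500
d <- data.frame(matrix(rnorm(n * 6), ncol = 6))
names(d) <- paste0("x", 1:6)
d$y <- 2 * d$x1 + 1.5 * d$x2 + 0.2 * d$x6 + rnorm(n, sd = 0.5)

rf_default <- randomForest(y ~ ., data = d)            # regression default: mtry = floor(6/3) = 2
rf_lowmtry <- randomForest(y ~ ., data = d, mtry = 1)  # strong variables compete less often

# compare how much x6 contributes in each forest
importance(rf_default)
importance(rf_lowmtry)
```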
Lastly, a link to a similar question I answered with some code examples for R randomForest:
R: What do I see in partial dependence plots of gbm and RandomForest?
In addition to @mariodeng's answer, which explains why the random forest trained with default parameters is worse here, here's an explanation of why it may not be better than single trees in your experiment anyway:
Aggregated/ensemble models are not universally better than their "single" counterparts; they are better if and only if the single models suffer from instability.
With 1000 training rows and only 3 columns, you are in a comfortable training-sample-size situation in which even a single decision tree may be reasonably stable.
(For 3d data you can easily check the variation you have in the assignment of input space to the classes when rerunning the experiment.)
If the predictions of the trees are stable, all submodels in the ensemble return the same prediction and then the prediction of the random forest is just the same as the prediction of each single tree.
So then not only will the overall performance be the same, it will be the same cases that are predicted correctly and wrongly, respectively.
This is the case in your example:
table(predict(dtFit, test)[, 2], predict(rfFit, test))
# 0 1
# 0 46 0
# 1 0 54
Why not 100% accurate?
You train on data that is not representative for the test cases: the test cases cover regions of the input space that never appear in the training data. There is no way for a model to know which class (if any - or maybe a 3rd? ...) cases far outside training space should belong to.
Particularly for highly nonlinear partitioning models (such as decision trees), leaving the training space will typically lead to disaster sooner rather than later.
If you plan to train on one class only, you need to look into so-called one-class classifiers, which try to establish independent boundaries for each class. One-class classification of your toy data should give you the result that the out-of-training-space cases do not belong to any of the known classes.
Decision trees are a partitioning method, they cannot do one-class classification.
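As a sketch of the one-class idea (not code from the question, and the data and parameters are made up), assuming the e1071 package: a one-class SVM trained on a single Gaussian blob rejects points far outside the training space instead of forcing them into a known class.

```r
library(e1071)

set.seed(3)
# one known class: a hypothetical 2-d Gaussian blob
train_one <- data.frame(x = rnorm(100), y = rnorm(100))

# one-class SVM: learns a boundary around the training data only
oc <- svm(train_one, type = "one-classification", nu = 0.05)

predict(oc, data.frame(x = 0,  y = 0))   # near the training data
predict(oc, data.frame(x = 10, y = 10))  # far outside -> rejected (FALSE)
```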
Best Answer
There are many packages that implement random forests.
party
is one of them, and it supports plotting the trees. First build a forest:
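The original code is missing here; a minimal reconstruction with party's cforest, using iris as a stand-in data set (the data and the control settings are assumptions):

```r
library(party)

# fit a conditional-inference forest; iris stands in for the real data
cf <- cforest(Species ~ ., data = iris,
              controls = cforest_unbiased(ntree = 50, mtry = 2))
```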
Then extract a tree and build a binary tree that can be plotted:
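This code is also missing; the following sketch relies on party internals (party:::prettytree and the BinaryTree class), which are undocumented and may change between versions:

```r
library(party)

cf <- cforest(Species ~ ., data = iris,
              controls = cforest_unbiased(ntree = 50, mtry = 2))

# pull the first tree out of the ensemble and wrap it as a BinaryTree
pt <- party:::prettytree(cf@ensemble[[1]], names(cf@data@get("input")))
nt <- new("BinaryTree")
nt@tree      <- pt
nt@data      <- cf@data
nt@responses <- cf@responses

plot(nt)  # plots the single extracted tree
```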