MATLAB: Understanding and applying results of bayesopt

I have some difficulty understanding the MATLAB documentation for the bayesopt function.
For example, the bestPoint function offers several criteria for choosing the "best point" of a Bayesian optimization result. Which one should I use to get the best out-of-sample predictive accuracy?
Suppose I let Bayesian optimization find the "best" hyperparameters for a regression tree ensemble (calling fitrensemble with its built-in hyperparameter optimization rather than calling bayesopt directly) and obtain the following result graphs:
What, if anything, do the two graphs tell me about the "best point", convergence, predictive accuracy, etc., both in general and for this example in particular? Are there any sources that explain these concepts, at least at a high level, so that I can make better use of bayesopt?

It looks like no new minima are being found and the model of the objective function is stabilizing, but it is not a good model: its minima are negative. Because the objective is log(1+Loss), a negative value implies 1+Loss<1, that is, Loss<0, which is impossible for an MSE loss.
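The arithmetic behind this argument is easy to check. The sketch below uses Python purely for illustration (these helpers are not part of MATLAB): it maps a loss to the log(1+Loss) objective and back. The helper names and the model value -0.3 are hypothetical.

```python
import math

def objective_from_loss(loss: float) -> float:
    """Objective used when tuning regression fits: log(1 + Loss)."""
    return math.log1p(loss)

def loss_from_objective(objective: float) -> float:
    """Inverse transform: Loss = exp(objective) - 1."""
    return math.expm1(objective)

# Any non-negative loss, such as an MSE, maps to a non-negative objective.
assert all(objective_from_loss(mse) >= 0.0 for mse in (0.0, 0.05, 3.0, 1e6))

# So a negative model value corresponds to an impossible negative loss.
model_value = -0.3  # hypothetical minimum read off an objective-model plot
print(loss_from_objective(model_value))  # about -0.26
```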
I've seen this happen when there is a steep "cliff" in the objective function over hyperparameter space. The Gaussian process model of that function smooths out the cliff and thereby undershoots the true function (and zero) at the base of the cliff. In fact, the objective function for optimizing regression fit functions is defined as log(1+Loss) rather than Loss precisely to shrink such cliffs and so reduce the chance of undershoots like this.
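A minimal sketch of the cliff effect, assuming a one-dimensional hyperparameter, a zero-mean Gaussian process with a fixed squared-exponential kernel (length scale 1), and noise-free observations. These are simplifications chosen for illustration, not the kernel or fitting procedure bayesopt uses. The observed losses are never negative, yet the model's mean dips below zero just before the cliff; applying log(1+Loss) first shrinks the dip but, for a cliff this sharp, does not remove it.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior_mean(x_train, y_train, x_test, length_scale=1.0, jitter=1e-8):
    """Zero-mean, (almost) noise-free GP posterior mean, i.e. kernel interpolation."""
    K = rbf_kernel(x_train, x_train, length_scale) + jitter * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf_kernel(x_test, x_train, length_scale) @ alpha

# Hypothetical 1-D "hyperparameter" with a cliff: loss 0 on the left, 100 on the right.
x = np.arange(10.0)
loss = np.where(x < 5, 0.0, 100.0)

grid = np.linspace(0.0, 9.0, 901)
raw_model = gp_posterior_mean(x, loss, grid)              # model of Loss
log_model = gp_posterior_mean(x, np.log1p(loss), grid)    # model of log(1+Loss)

print("min of model of Loss:        ", raw_model.min())
print("min of model of log(1+Loss): ", log_model.min())
```

Because the posterior mean is linear in the observations, the transformed model here is simply the raw one scaled by log(101)/100, so its dip below zero is smaller by that factor but keeps its sign.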
To diagnose this, look at the objective values that have actually been observed and check whether they differ by orders of magnitude.
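One way to run that check, sketched in Python: the hypothetical helper below takes a list of observed objective values (in MATLAB these would come from the ObjectiveTrace property of the optimization results), undoes the log(1+Loss) transform, and reports how many powers of ten the positive losses span. The trace values are made up for illustration.

```python
import math

def orders_of_magnitude_spread(objective_trace):
    """Span of the underlying losses, in powers of ten.

    Assumes each entry is log(1 + Loss), the objective used when tuning
    regression fits, and that at least one loss is strictly positive.
    """
    losses = [math.expm1(y) for y in objective_trace]
    positive = [loss for loss in losses if loss > 0]
    return math.log10(max(positive)) - math.log10(min(positive))

# Hypothetical trace: most evaluations are good, a few land on top of the "cliff".
trace = [math.log1p(v) for v in (0.8, 1.1, 0.9, 250.0, 1.0, 4000.0)]
spread = orders_of_magnitude_spread(trace)
print(f"losses span about {spread:.1f} orders of magnitude")
```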
Regarding bestPoint: since the model is not giving a reasonable estimate of the minimum of the objective function, it would probably be better to trust the minimum observed point and use the 'min-observed' criterion.
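The reasoning above can be sketched as a small selection rule, assuming the model's predictions are available at the evaluated points. choose_best_point is a hypothetical helper, the labels it returns are descriptive strings only, and the hyperparameter pairs (named after fitrensemble's MinLeafSize and NumLearningCycles) and objective values are invented. In MATLAB itself, the recommended choice corresponds to bestPoint(results,'Criterion','min-observed'), where for fitrensemble results is Mdl.HyperparameterOptimizationResults.

```python
import math

def choose_best_point(points, observed, model_means):
    """Pick a hyperparameter setting from a finished optimization run.

    Hypothetical rule: prefer the model's minimum only when it is plausible
    for a log(1 + Loss) objective (not negative); otherwise fall back to
    the best observed evaluation.
    """
    i_obs = min(range(len(observed)), key=lambda i: observed[i])
    i_est = min(range(len(model_means)), key=lambda i: model_means[i])
    if model_means[i_est] < 0.0:
        # A negative model value would imply Loss < 0, so the model is unreliable.
        return points[i_obs], "min-observed"
    return points[i_est], "model-minimum"

# Hypothetical run: (MinLeafSize, NumLearningCycles) pairs and their objectives.
points = [(1, 50), (8, 120), (3, 300), (20, 40)]
observed = [math.log1p(v) for v in (2.1, 1.4, 1.6, 9.0)]
model_means = [0.7, 0.9, -0.2, 2.1]  # model undershoots below zero at (3, 300)
print(choose_best_point(points, observed, model_means))  # ((8, 120), 'min-observed')
```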