In the frequentist approach, it is asserted that the only sense in which probabilities have meaning is as the limiting value of the proportion of successes in a sequence of trials, i.e. as
$$p = \lim_{n\to\infty} \frac{k}{n}$$
where $k$ is the number of successes and $n$ is the number of trials. In particular, it doesn't make any sense to associate a probability distribution with a parameter.
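The limiting-frequency idea is easy to illustrate with a short simulation (a sketch; the "true" $p=0.3$ and the random seed are arbitrary choices, not from the text):

```python
import random

random.seed(0)
p = 0.3            # "true" success probability (illustrative choice)
k = 0              # running count of successes
checkpoints = {10, 1_000, 100_000}
for n in range(1, 100_001):
    k += random.random() < p   # one Bernoulli(p) trial
    if n in checkpoints:
        print(n, k / n)        # k/n drifts toward p as n grows
```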
For example, consider samples $X_1, \dots, X_n$ from the Bernoulli distribution with parameter $p$ (i.e. they have value 1 with probability $p$ and 0 with probability $1-p$). We can define the sample success rate to be
$$\hat{p} = \frac{X_1+\cdots +X_n}{n}$$
and talk about the distribution of $\hat{p}$ conditional on the value of $p$, but it doesn't make sense to invert the question and start talking about the probability distribution of $p$ conditional on the observed value of $\hat{p}$. In particular, this means that when we compute a confidence interval, we interpret the ends of the confidence interval as random variables, and we talk about "the probability that the interval includes the true parameter", rather than "the probability that the parameter is inside the confidence interval".
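This reading of a confidence interval can be checked by simulation: fix the parameter, regenerate the data many times, and count how often the (random) interval covers it. A sketch, using the textbook Wald interval $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$; the values of $p$, $n$, and the number of replications are arbitrary choices:

```python
import math
import random

random.seed(1)
p_true = 0.4          # fixed parameter; unknown in practice
n, reps = 200, 2_000
covered = 0
for _ in range(reps):
    k = sum(random.random() < p_true for _ in range(n))
    phat = k / n
    half = 1.96 * math.sqrt(phat * (1 - phat) / n)   # Wald half-width
    covered += (phat - half) <= p_true <= (phat + half)
print(covered / reps)   # close to 0.95: the random *endpoints* cover p_true
```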
In the Bayesian approach, we interpret probability distributions as quantifying our uncertainty about the world. In particular, this means that we can now meaningfully talk about probability distributions of parameters, since even though the parameter is fixed, our knowledge of its true value may be limited. In the example above, we can invert the probability distribution $f(\hat{p}\mid p)$ using Bayes' law, to give
$$\overbrace{f(p\mid \hat{p})}^\text{posterior} = \frac{\overbrace{f(\hat{p}\mid p)}^\text{likelihood}}{\underbrace{f(\hat{p})}_\text{evidence}}\, \overbrace{f(p)}^\text{prior}$$
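Numerically, this Bayes update can be sketched with a discrete grid over $p$: multiply the prior by the likelihood of the observed data and renormalise. The data here (7 successes in 10 trials) and the flat prior are illustrative choices:

```python
k, n = 7, 10
grid = [i / 100 for i in range(1, 100)]          # candidate values of p
prior = [1.0] * len(grid)                        # flat prior over the grid
like = [p**k * (1 - p)**(n - k) for p in grid]   # Bernoulli likelihood
unnorm = [pr * li for pr, li in zip(prior, like)]
Z = sum(unnorm)                                  # plays the role of f(p-hat)
post = [u / Z for u in unnorm]                   # posterior over the grid
mode = grid[max(range(len(grid)), key=post.__getitem__)]
print(mode)   # 0.7, the sample success rate (flat prior => mode = MLE)
```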
The snag is that we have to introduce the prior distribution into our analysis - this reflects our belief about the value of $p$ before seeing the actual values of the $X_i$. The role of the prior is often criticised by frequentists, who argue that it introduces subjectivity into the otherwise austere and objective world of probability.
In the Bayesian approach one no longer talks of confidence intervals, but instead of credible intervals, which have a more natural interpretation - given a 95% credible interval, we can assign a 95% probability that the parameter is inside the interval.
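As a sketch of such a credible interval (stdlib only, with a crude grid approximation of the Beta CDF): under a uniform $\text{Beta}(1,1)$ prior, $k$ heads in $n$ tosses give a $\text{Beta}(k+1,\,n-k+1)$ posterior; the data below are the $k=10$, $n=14$ of the coin example that follows.

```python
import math

k, n = 10, 14
a, b = k + 1, n - k + 1            # posterior is Beta(a, b)
c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))

def beta_pdf(x):
    return c * x**(a - 1) * (1 - x)**(b - 1)

# Crude numeric CDF on a grid, then invert for the 2.5% / 97.5% quantiles.
N = 100_000
grid = [i / N for i in range(N + 1)]
cdf, total = [], 0.0
for x in grid:
    total += beta_pdf(x) / N
    cdf.append(total)
lo = next(x for x, cum in zip(grid, cdf) if cum / cdf[-1] >= 0.025)
hi = next(x for x, cum in zip(grid, cdf) if cum / cdf[-1] >= 0.975)
print(lo, hi)   # the posterior puts 95% probability on this interval
```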
This example is taken from here. (I think I originally got the link from SO, but cannot find it anymore.)
A coin has been tossed $n=14$ times, coming up heads $k=10$ times. If it is to be tossed twice more, would you bet on two heads? Assume you do not get to see the result of the first toss before the second (and that the two tosses are independent conditional on $\theta$), so that you cannot update your opinion on $\theta$ between the two throws.
By conditional independence, $$f(y_{f,1}=\text{heads},y_{f,2}=\text{heads}|\theta)=f(y_{f,1}=\text{heads}|\theta)f(y_{f,2}=\text{heads}|\theta)=\theta^2.$$
Then the posterior predictive probability, given a $\text{Beta}(\alpha_0,\beta_0)$ prior, becomes
\begin{eqnarray*}
f(y_{f,1}=\text{heads},y_{f,2}=\text{heads}|y)&=&\int f(y_{f,1}=\text{heads},y_{f,2}=\text{heads}|\theta)\pi(\theta|y)d\theta\notag\\
&=&\frac{\Gamma\left(\alpha _{0}+\beta_{0}+n\right)}{\Gamma\left(\alpha_{0}+k\right)\Gamma\left(\beta_{0}+n-k\right)}\int \theta^2\theta ^{\alpha _{0}+k-1}\left( 1-\theta \right) ^{\beta _{0}+n-k-1}d\theta\notag\\
&=&\frac{\Gamma\left(\alpha_{0}+\beta_{0}+n\right)}{\Gamma\left(\alpha_{0}+k\right)\Gamma\left(\beta_{0}+n-k\right)}\frac{\Gamma\left(\alpha_{0}+k+2\right)\Gamma\left(\beta_{0}+n-k\right)}{\Gamma\left(\alpha_{0}+\beta_{0}+n+2\right)}\notag\\
&=&\frac{(\alpha_{0}+k)\cdot(\alpha_{0}+k+1)}{(\alpha_{0}+\beta_{0}+n)\cdot(\alpha_{0}+\beta_{0}+n+1)}
\end{eqnarray*}
For a uniform prior (a $\text{Beta}(1,1)$ prior), this gives roughly $0.485$, so you would likely not bet. Based on the MLE $10/14$, you would instead calculate a probability of two heads of $(10/14)^2\approx 0.51$, so betting would seem to make sense.
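The closed-form result above is easy to check in code (a sketch; `pred_two_heads` is just a name for the final expression in the derivation):

```python
def pred_two_heads(a0, b0, k, n):
    # (a0+k)(a0+k+1) / ((a0+b0+n)(a0+b0+n+1)), the last line of the derivation
    return (a0 + k) * (a0 + k + 1) / ((a0 + b0 + n) * (a0 + b0 + n + 1))

k, n = 10, 14
bayes = pred_two_heads(1, 1, k, n)   # uniform Beta(1,1) prior
mle = (k / n) ** 2                   # plug-in answer using the MLE
print(round(bayes, 3), round(mle, 3))   # 0.485 0.51
```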
Best Answer
Once you've fitted the model, it will be what it will be, so I think the difference comes before that: the models / parameters are fitted differently under the Bayesian and frequentist approaches. More specifically, the fitted Bayesian parameters incorporate information beyond what is in the data. If you know something about what the parameters are likely to be (and you aren't wrong), that can boost the model's performance. Even if you use an 'uninformative' prior, you will typically find the fitted Bayesian parameters are shrunk to some degree towards $0$ relative to the fitted frequentist parameters.
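The shrinkage point can be illustrated in the simplest conjugate setting (my example, not the answerer's): a normal mean with known variance and a zero-centred normal prior. The posterior mean is a precision-weighted average of the prior mean $0$ and the sample mean, so it always sits between them.

```python
# Assumed setup: x_i ~ N(mu, sigma^2) with sigma known, prior mu ~ N(0, tau^2).
n, sigma, tau = 10, 1.0, 1.0    # illustrative values
xbar = 0.8                      # illustrative sample mean (the MLE for mu)
w = (n / sigma**2) / (n / sigma**2 + 1 / tau**2)   # weight on the data
post_mean = w * xbar            # shrunk toward the prior mean 0
print(post_mean)                # ~0.727, smaller in magnitude than 0.8
```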