In the frequentist approach, it is asserted that the only sense in which probabilities have meaning is as the limiting value of the relative frequency of successes in a sequence of trials, i.e. as
$$p = \lim_{n\to\infty} \frac{k}{n}$$
where $k$ is the number of successes and $n$ is the number of trials. In particular, it doesn't make any sense to associate a probability distribution with a parameter.
For example, consider samples $X_1, \dots, X_n$ from the Bernoulli distribution with parameter $p$ (i.e. they have value 1 with probability $p$ and 0 with probability $1-p$). We can define the sample success rate to be
$$\hat{p} = \frac{X_1+\cdots +X_n}{n}$$
and talk about the distribution of $\hat{p}$ conditional on the value of $p$, but it doesn't make sense to invert the question and start talking about the probability distribution of $p$ conditional on the observed value of $\hat{p}$. In particular, this means that when we compute a confidence interval, we interpret the ends of the confidence interval as random variables, and we talk about "the probability that the interval includes the true parameter", rather than "the probability that the parameter is inside the confidence interval".
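As a quick illustration of this frequentist reading, the sketch below (pure Python, with an assumed true value $p = 0.3$ and $n = 100$, both chosen only for the demonstration) repeatedly draws Bernoulli samples and checks how often a standard Wald interval, itself a random interval, happens to cover the fixed parameter:

```python
import random

random.seed(0)

def wald_ci(k, n, z=1.96):
    """Approximate 95% Wald confidence interval for a Bernoulli parameter."""
    p_hat = k / n
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half

p_true = 0.3        # fixed but unknown parameter (assumed here for the simulation)
n, reps = 100, 10_000
covered = 0
for _ in range(reps):
    k = sum(random.random() < p_true for _ in range(n))
    lo, hi = wald_ci(k, n)
    covered += lo <= p_true <= hi

print(covered / reps)  # empirical coverage, close to the nominal 0.95
```

Note what is random here: the interval endpoints vary from sample to sample, while `p_true` never moves. The 95% refers to the long-run fraction of intervals that capture it.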
In the Bayesian approach, we interpret probability distributions as quantifying our uncertainty about the world. In particular, this means that we can now meaningfully talk about probability distributions of parameters, since even though the parameter is fixed, our knowledge of its true value may be limited. In the example above, we can invert the probability distribution $f(\hat{p}\mid p)$ using Bayes' law, to give
$$\overbrace{f(p\mid \hat{p})}^\text{posterior} = \underbrace{\frac{f(\hat{p}\mid p)}{f(\hat{p})}}_\text{likelihood ratio} \overbrace{f(p)}^\text{prior}$$
The snag is that we have to introduce a prior distribution into our analysis - this reflects our belief about the value of $p$ before seeing the actual values of the $X_i$. The role of the prior is often criticised by frequentists, who argue that it introduces subjectivity into the otherwise austere and objective world of probability.
In the Bayesian approach one no longer talks of confidence intervals, but instead of credible intervals, which have a more natural interpretation - given a 95% credible interval, we can assign a 95% probability that the parameter is inside the interval.
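As a sketch of how such a credible interval is computed, the snippet below uses the Bernoulli/Beta conjugacy (the posterior under a $\text{Beta}(\alpha_0,\beta_0)$ prior is $\text{Beta}(\alpha_0+k,\ \beta_0+n-k)$) and approximates the equal-tailed 95% interval by Monte Carlo; the counts $k=10$, $n=14$ match the coin example discussed next:

```python
import random

random.seed(0)
k, n = 10, 14          # observed successes and trials
alpha0, beta0 = 1, 1   # uniform Beta(1, 1) prior

# Conjugate update: posterior is Beta(alpha0 + k, beta0 + n - k).
# Approximate its 2.5% and 97.5% quantiles by sampling.
samples = sorted(random.betavariate(alpha0 + k, beta0 + n - k)
                 for _ in range(100_000))
lo = samples[int(0.025 * len(samples))]
hi = samples[int(0.975 * len(samples))]
print(f"95% credible interval for p: ({lo:.2f}, {hi:.2f})")
```

Here the interpretation is the natural one: given the data, the parameter lies in $(lo, hi)$ with 95% probability.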
This example is taken from here. (I think I originally found the link on SO, but cannot locate it anymore.)
A coin has been tossed $n=14$ times, coming up heads $k=10$ times. If it is to be tossed twice more, would you bet on two heads? Assume you do not get to see the result of the first toss before the second (and that the tosses are independent conditional on $\theta$), so that you cannot update your opinion on $\theta$ between the two throws.
By conditional independence, $$f(y_{f,1}=\text{heads},y_{f,2}=\text{heads}|\theta)=f(y_{f,1}=\text{heads}|\theta)f(y_{f,2}=\text{heads}|\theta)=\theta^2.$$
Then the predictive distribution, given a $\text{Beta}(\alpha_0,\beta_0)$ prior, becomes
\begin{eqnarray*}
f(y_{f,1}=\text{heads},y_{f,2}=\text{heads}|y)&=&\int f(y_{f,1}=\text{heads},y_{f,2}=\text{heads}|\theta)\pi(\theta|y)d\theta\notag\\
&=&\frac{\Gamma\left(\alpha _{0}+\beta_{0}+n\right)}{\Gamma\left(\alpha_{0}+k\right)\Gamma\left(\beta_{0}+n-k\right)}\int \theta^2\theta ^{\alpha _{0}+k-1}\left( 1-\theta \right) ^{\beta _{0}+n-k-1}d\theta\notag\\
&=&\frac{\Gamma\left(\alpha_{0}+\beta_{0}+n\right)}{\Gamma\left(\alpha_{0}+k\right)\Gamma\left(\beta_{0}+n-k\right)}\frac{\Gamma\left(\alpha_{0}+k+2\right)\Gamma\left(\beta_{0}+n-k\right)}{\Gamma\left(\alpha_{0}+\beta_{0}+n+2\right)}\notag\\
&=&\frac{(\alpha_{0}+k)\cdot(\alpha_{0}+k+1)}{(\alpha_{0}+\beta_{0}+n)\cdot(\alpha_{0}+\beta_{0}+n+1)}
\end{eqnarray*}
For a uniform prior (a $\text{Beta}(1, 1)$ prior), this gives roughly $0.485$. Hence, you would likely not bet. Based on the MLE $10/14$, you would instead calculate a probability of two heads of $(10/14)^2\approx 0.51$, so that betting would make sense.
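A few lines of Python, using the closed form just derived, reproduce both numbers:

```python
def pred_two_heads(alpha0, beta0, k, n):
    """Posterior predictive probability of two further heads,
    from the closed form (a0+k)(a0+k+1) / ((a0+b0+n)(a0+b0+n+1))."""
    a = alpha0 + k
    b = alpha0 + beta0 + n
    return a * (a + 1) / (b * (b + 1))

k, n = 10, 14
bayes = pred_two_heads(1, 1, k, n)     # uniform Beta(1, 1) prior
mle = (k / n) ** 2                     # plug-in estimate using the MLE
print(round(bayes, 3), round(mle, 3))  # → 0.485 0.51
```

The gap between the two answers comes from the predictive distribution averaging $\theta^2$ over the whole posterior rather than plugging in a single point estimate.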
Best Answer
Suppose there is a model for the data $Y$ that depends on a parameter $\theta$ and, for a particular experiment, there is a true value of the parameter, $\theta_0$. You develop an estimator $\hat\theta = \hat\theta(Y)$, i.e. the estimator is a function of the data $Y$. Then the bias is $$ \operatorname{bias}(\hat\theta) = E_{Y|\theta_0}[\hat\theta(Y) - \theta_0] $$ where the expectation is taken with respect to the randomness of the data $Y$ for the given true value of the parameter $\theta_0$ (and the subscript on the expectation attempts to make this explicit). As we are talking about an expectation over possible realizations of data, this is a frequentist concept.
In the description above, I have not mentioned how the estimator arises. This estimator could be a method of moments, maximum likelihood, Bayes, or something else estimator. Thus, the concept of bias of an estimator is frequentist, but the estimator itself could arise from a Bayesian analysis.