First write the statement mathematically: define $\mathcal{F}$ as a function space, $\hat{f}_{n,\mathcal{F}} = \arg\min_{g\in\mathcal{F}}\sum_{i=1}^n (y_i - g(x_i))^2$ as the least-squares fit in $\mathcal{F}$, and, as you defined, $Bias^2(\hat{f}_{n,\mathcal{F}}(x_0)) = [E(\hat{f}_{n,\mathcal{F}}(x_0)) - f(x_0)]^2$ and $Variance(\hat{f}_{n,\mathcal{F}}(x_0)) = Var[\hat{f}_{n,\mathcal{F}}(x_0)]$, where the expectation and variance are taken over the training data.
You asked whether a more complex model must have lower bias but greater variance, which can be written as the statement: if $\mathcal{F}_1 \subset \mathcal{F}_2$, then $Bias^2(\hat{f}_{n, \mathcal{F}_1}(x_0)) \ge Bias^2(\hat{f}_{n, \mathcal{F}_2}(x_0))$ and $Variance(\hat{f}_{n, \mathcal{F}_1}(x_0)) \le Variance(\hat{f}_{n, \mathcal{F}_2}(x_0))$.
I can find a counterexample as follows: assume the true $f(x) = 1$ with $\sigma_\epsilon = 0$, and consider $\mathcal{F}_1 = \{ax\}$, $\mathcal{F}_2 = \{ax+b\}$, with $n = 2$ training points. A direct computation gives $\hat{f}_{2, \mathcal{F}_1}(x_0) = \frac{x_1 + x_2}{x_1^2 + x_2^2}x_0$ and $\hat{f}_{2, \mathcal{F}_2}(x_0) = 1$ (the line through $(x_1, 1)$ and $(x_2, 1)$ is exactly $y = 1$). The second model has zero bias and zero variance, both strictly lower than the first's for random $x_1, x_2$.
This shows that a more complex model may have both lower bias and lower variance.
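The counterexample is easy to check numerically. The sketch below (with hypothetical choices: $x_i \sim \mathrm{Uniform}(1, 2)$ and evaluation point $x_0 = 0.5$) repeatedly draws a training set, fits both models, and estimates the bias and variance of each prediction by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
x0, trials = 0.5, 20_000   # hypothetical evaluation point and replication count
preds1, preds2 = [], []
for _ in range(trials):
    x = rng.uniform(1.0, 2.0, size=2)   # two random training inputs
    y = np.ones(2)                      # true f(x) = 1, no noise
    # Model F1 = {ax}: least squares through the origin
    a = (x @ y) / (x @ x)               # a = (x1 + x2) / (x1^2 + x2^2)
    preds1.append(a * x0)
    # Model F2 = {ax + b}: ordinary least squares with intercept
    a2, b2 = np.polyfit(x, y, 1)        # fits the data exactly: a2 = 0, b2 = 1
    preds2.append(a2 * x0 + b2)

for name, p in [("F1 = {ax}", preds1), ("F2 = {ax+b}", preds2)]:
    p = np.asarray(p)
    print(name, "bias^2 =", (p.mean() - 1.0) ** 2, "variance =", p.var())
```

The richer model $\mathcal{F}_2$ reports bias and variance that are both essentially zero (up to floating-point noise), while the restricted model $\mathcal{F}_1$ shows both a substantial squared bias and a positive variance.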
They are both essentially saying the same thing. Your book takes a more Bayesian approach to the tradeoff, while the other takes a more classical statistical view. There is also the machine-learning perspective, which refers to the two failure modes as overfitting and underfitting.
I'm guessing you're looking for something like this post, which ties the two together.
If you're trying to understand what the tradeoff essentially is, the second link you posted offers a decent explanation. I'd also like to point you to this notebook, which presents a simple graphical example of the tradeoff so that you can see it in practice.
Best Answer
It arises from a decomposition of the error into two terms representing "two opposing forces": to reduce the bias error, you need your model to consider more possibilities when fitting the data, but this in turn increases the variance error. It also works the other way around: if your model fits too closely (it starts to fit noise, which you can think of as non-systematic variation in the individual samples), then you need to constrain the parameters so they cannot vary too wildly, and that constraint introduces bias.
In more intuitive terms: bias error means being systematically wrong, while variance error means learning every tiny, accidental variation in the samples.
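One way to see both forces at once is a small simulation (all settings here are hypothetical, not from the question): repeatedly fit polynomials of increasing degree to noisy samples of $\sin(2\pi x)$, and estimate the squared bias and variance of the prediction at a fixed point. A low degree is "systematically wrong"; a high degree chases the accidental variation in each sample:

```python
import numpy as np

rng = np.random.default_rng(1)
true_f = lambda x: np.sin(2 * np.pi * x)
xs = np.linspace(0.0, 1.0, 20)          # fixed design points
x0 = 0.3                                # hypothetical test point

def bias_variance(degree, trials=2000, sigma=0.3):
    """Monte Carlo estimate of bias^2 and variance at x0 for a polynomial fit."""
    preds = []
    for _ in range(trials):
        y = true_f(xs) + rng.normal(0.0, sigma, xs.size)  # fresh noisy sample
        coef = np.polyfit(xs, y, degree)
        preds.append(np.polyval(coef, x0))
    preds = np.asarray(preds)
    return (preds.mean() - true_f(x0)) ** 2, preds.var()

for d in (1, 3, 9):
    b2, v = bias_variance(d)
    print(f"degree {d}: bias^2 = {b2:.4f}, variance = {v:.4f}")
```

The degree-1 fit shows a large squared bias but a small variance; the degree-9 fit shows the reverse, which is the tradeoff described above in miniature.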
Take a look at this nice article for details: http://scott.fortmann-roe.com/docs/BiasVariance.html