Yes, this is simply to ensure that all of the weak learners in the final solution classify the training data better than chance (which is essential for the theoretical properties of AdaBoost to hold).
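To make the condition concrete: in the standard AdaBoost weight update, a weak hypothesis $h_t$ with weighted training error $\epsilon_t$ receives the coefficient

$$\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t},$$

which is positive only when $\epsilon_t < 1/2$; a hypothesis at exactly chance level would get zero weight, so the "better than chance" requirement is what makes each round contribute anything at all.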
So, boosting is a learning algorithm that can generate high-accuracy predictions by using, as a subroutine, another algorithm that can efficiently generate hypotheses only slightly better (by an inverse polynomial) than random guessing.
Its main advantage is speed.
When Schapire presented it in 1990, it was a breakthrough: it showed that a polynomial-time learner generating hypotheses with error just slightly below 1/2 can be transformed into a polynomial-time learner generating hypotheses with arbitrarily small error.
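As a rough sketch of why a slight edge is enough (this is the standard AdaBoost training-error bound, not Schapire's original 1990 construction): if each weak hypothesis has weighted error $\epsilon_t = 1/2 - \gamma_t$ with edge $\gamma_t > 0$, then after $T$ rounds the training error of the combined classifier is at most

$$\prod_{t=1}^{T}\sqrt{1-4\gamma_t^{2}} \;\le\; \exp\!\left(-2\sum_{t=1}^{T}\gamma_t^{2}\right),$$

so any consistent inverse-polynomial edge over random guessing drives the training error down exponentially fast in the number of rounds.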
So, the theory behind your question is in "The Strength of Weak Learnability" (pdf), where he basically showed that "strong" and "weak" learning are equivalent.
And perhaps the answer to the original question is, "there's no point constructing strong learners when you can construct weak ones more cheaply".
Among more recent papers, there's "On the equivalence of weak learnability and linear separability: new relaxations and efficient boosting algorithms" (pdf), which I don't understand but which seems related and may be of interest to more educated people :)
Best Answer
It is not required to have a "weak classifier" as the base classifier; in theory, you could even choose a neural network as the base classifier. However, a weak classifier is recommended to avoid overfitting and to give better control of the model complexity.
The reason is that boosting increases the model "variance" step by step over the iterations. If the base classifier has high variance (say, a neural network), we may jump too far from one iteration to the next. As a result, it becomes very hard to control the model complexity, and in the end we may have an overfitting problem.
On the other hand, if we choose a simple / trivial base learner, such as a decision stump, we can easily use the number of iterations to control the model complexity, and by increasing the number of iterations we can still reach a sufficiently complex model.
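If it helps, here is a minimal scikit-learn sketch of that contrast (the synthetic dataset and parameter values are arbitrary illustrations; also note that the base-learner argument is called `estimator` in recent scikit-learn releases and `base_estimator` in older ones):

```python
# Contrast a decision-stump base learner, whose complexity is governed by
# the number of boosting rounds, with a deeper, higher-variance tree base.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (1, 8):  # depth 1 = decision stump, depth 8 = high-variance base
    for n_rounds in (10, 100, 500):
        model = AdaBoostClassifier(
            # `base_estimator` instead of `estimator` in older scikit-learn
            estimator=DecisionTreeClassifier(max_depth=depth),
            n_estimators=n_rounds,
            random_state=0,
        ).fit(X_train, y_train)
        print(f"max_depth={depth} rounds={n_rounds} "
              f"train={model.score(X_train, y_train):.3f} "
              f"test={model.score(X_test, y_test):.3f}")
```

With the stump base you would typically see the test score improve gradually as the number of rounds grows, while the depth-8 base fits the training set almost immediately and its test performance is much harder to steer with the number of iterations.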
I would suggest reading my other questions and answers; there are many visualizations in there.
Is a decision stump a linear model?
How does a linear base learner work in boosting? And how does it work in the xgboost library?
Boosting: why is the learning rate called a regularization parameter?