AdaBoost can combine multiple instances of the same classifier, each trained with different parameters. In this way, several linear classifiers can be combined into one nonlinear classifier. Or, as the AdaBoost people like to put it, many weak learners can make one strong learner. A nice picture can be found here, at the bottom.
Basically, it goes as with any other learning algorithm: it works on some datasets and not on others. There are certainly datasets out there where it excels. And maybe you haven't found the right weak learner yet. Have you tried logistic regression? Have you visualized how the decision boundary evolves as learners are added? Perhaps that would tell you what is going wrong.
"Feature selection", as you describe it, boils down to fitting a depth 1 decision tree on a set of data. Given such a set and weights associated with each, the problem of fitting a decision tree (a stump) involves finding the "best" variable $x$ and threshold $s$, where the best variable and split threshold are defined as the pair that minimizes some measure of node impurity, like the Gini index. If you are unfamiliar with the process of fitting a decision tree, there are lots of resources around the internet. Here is one example.
So, given a set of candidate variables to split on and a set of training data, there is a unique solution (a single variable and threshold, up to ties) that is the best depth-1 decision tree for the current boosting stage. In that sense, there is no randomness. However, the set of variables $X=\{x_1, x_2, \dots \}$ from which we pick the split variable may be either the entire set of features we have, or a (random) subset of it. Many decision tree implementations let the fitting algorithm randomly select a subset of variables at each split; in sklearn's DecisionTreeClassifier, for example, this is governed by the max_features parameter.
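The exhaustive search described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation; it uses weighted misclassification error as the impurity measure (the idea is the same with the Gini index), and all names and data are made up for the example.

```python
def fit_stump(X, y, w):
    """Exhaustively try every (variable, threshold) pair and return the one
    with the lowest weighted error -- a deterministic answer, up to ties."""
    best = None
    n_features = len(X[0])
    for j in range(n_features):
        # Candidate thresholds: midpoints between consecutive sorted values.
        values = sorted(set(x[j] for x in X))
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
        for s in thresholds:
            for sign in (+1, -1):  # which side of the split predicts +1
                preds = [sign if x[j] > s else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, s, sign)
    return best  # (weighted error, variable index, threshold, sign)

# Toy data: feature 0 separates the classes perfectly at 0.5.
X = [[0.0, 1.0], [0.2, 0.0], [0.8, 1.0], [1.0, 0.0]]
y = [-1, -1, +1, +1]
w = [0.25, 0.25, 0.25, 0.25]
print(fit_stump(X, y, w))  # -> (0.0, 0, 0.5, 1)
```

Restricting the outer loop to a random subset of feature indices would give the max_features-style behaviour mentioned above.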
It may indeed be the case that a variable appears in more than one stump during the AdaBoost procedure; nothing precludes that from happening.
The reason we weight the misclassified points more heavily is that those are the ones we want to correct. A particular variable may not be the best one to split on when the data are equally weighted, but it may become the best once the weights are unevenly distributed, because it happens to be good at correcting the mistakes of a previous learner.
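Concretely, the standard AdaBoost update multiplies each weight by $e^{\pm\alpha}$ depending on whether the point was misclassified, then renormalizes. A small sketch (the data and names are illustrative):

```python
import math

def reweight(w, y, preds, alpha):
    """Upweight misclassified points, downweight correct ones, renormalize."""
    new_w = [wi * math.exp(alpha if p != yi else -alpha)
             for wi, p, yi in zip(w, preds, y)]
    z = sum(new_w)
    return [wi / z for wi in new_w]

w     = [0.25, 0.25, 0.25, 0.25]
y     = [+1, +1, -1, -1]
preds = [+1, +1, -1, +1]                 # last point misclassified
eps   = 0.25                             # its weighted error
alpha = 0.5 * math.log((1 - eps) / eps)  # learner's vote weight
print(reweight(w, y, preds, alpha))      # misclassified point now carries 0.5
```

After the update, the single misclassified point carries half of the total weight, so the next stump search is pulled strongly toward variables that classify it correctly.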
To answer your final question, it isn't guaranteed that each of your 20 features will appear twice in the final result. It could be that several features are repeated, and some may be ignored altogether. We can make even fewer guarantees if the subset of features we can split on at each step is a random subset of the full feature set.
Finally, I'll point out that AdaBoost is a meta-estimator that can use any kind of weak learner as its base classifier. The decision tree is a popular choice, but not the only one: you could use logistic regression (less common), or even neural networks.
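To make the meta-estimator point concrete, here is a hedged sketch of the boosting loop in plain Python: it only needs a `fit_weak(X, y, w)` callable that returns a prediction function, so any base learner with that interface can be plugged in. Everything here (function names, the toy stump learner, the data) is invented for illustration.

```python
import math

def adaboost(X, y, fit_weak, n_rounds):
    """Generic AdaBoost loop over any base learner with a fit(X, y, w) interface."""
    n = len(y)
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, predict_fn)
    for _ in range(n_rounds):
        predict = fit_weak(X, y, w)
        preds = [predict(x) for x in X]
        eps = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        if eps >= 0.5:           # no better than chance: stop
            break
        eps = max(eps, 1e-12)    # avoid log(0) on a perfect learner
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, predict))
        w = [wi * math.exp(alpha if p != yi else -alpha)
             for wi, p, yi in zip(w, preds, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    def strong(x):
        score = sum(a * pred(x) for a, pred in ensemble)
        return 1 if score >= 0 else -1
    return strong

def stump_learner(X, y, w):
    """One interchangeable base learner: a threshold stump on feature 0."""
    best = None
    for s in sorted(set(x[0] for x in X)):
        for sign in (1, -1):
            err = sum(wi for wi, x, yi in zip(w, X, y)
                      if (sign if x[0] >= s else -sign) != yi)
            if best is None or err < best[0]:
                best = (err, s, sign)
    _, s, sign = best
    return lambda x, s=s, sign=sign: sign if x[0] >= s else -sign

X = [[0.0], [1.0], [2.0], [3.0]]
y = [-1, -1, 1, 1]
clf = adaboost(X, y, stump_learner, n_rounds=3)
print([clf(x) for x in X])  # -> [-1, -1, 1, 1]
```

Swapping `stump_learner` for any other function with the same signature (a weighted logistic regression, say) changes the base classifier without touching the boosting loop, which is exactly what "meta-estimator" means here.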
Best Answer
The answer to your first question is yes. In classical AdaBoost, if a newly added weak learner (e.g. ClassifierB) cannot beat random guessing on the current weighted data (its weighted error is at least $1/2$), the algorithm stops. So the linear combination of weak learners should do at least as well on the training data as the best individual weak learner.
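One way to see why each accepted round helps is the classical bound on AdaBoost's training error (due to Freund and Schapire), where $\epsilon_t$ is the weighted error of the $t$-th weak learner:

$$\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{H(x_i)\ne y_i\} \;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)}.$$

Each factor is strictly less than $1$ exactly when $\epsilon_t < 1/2$, so every weak learner that passes the stopping check tightens the bound on the ensemble's training error.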
Your second question: AdaBoost is an additive (and sequential) model. That is, the choice of ClassifierB depends on the performance of the previously selected weak learners. If ClassifierA performs badly on specific data points, AdaBoost increases their weights so that the next weak learner (ClassifierB) focuses on them more than on the other data points.