I think you misunderstand the way folds are generated in cross-validation. In cross-validation, your data set is partitioned at random into a specific number of folds.* The data is not partitioned as you would slice a pie (e.g. adjacent instances belonging to the same fold).
*: in stratified cross-validation the class balance in the overall data set is maintained across folds.
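To make that concrete, here is a minimal sketch using scikit-learn (my choice of library, nothing in the question requires it): with shuffle=True the folds are random subsets of indices rather than contiguous slices, and the stratified variant keeps the class balance in every fold.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.arange(20).reshape(-1, 1)           # 20 toy instances
y = np.array([0] * 10 + [1] * 10)          # two balanced classes

# Random folds: each test fold is a scattered subset, not a contiguous "pie slice".
for _, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    print("random fold:", test_idx)

# Stratified folds: each test fold additionally preserves the 50/50 class ratio.
for _, test_idx in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    print("stratified fold labels:", y[test_idx])
```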
It doesn't appear to be overfitting. Intuitively, overfitting means fitting the quirks (noise) of the training set and therefore doing worse on a held-out test set that does not share those quirks. If I understand what happened, they did not do unexpectedly poorly on held-out test data, and so that empirically rules out overfitting. (They have another issue, which I'll mention at the end, but it's not overfitting.)
So you are correct that it takes advantage of the available (30%?) test data. The question is: how?
If the available test data has labels associated with it, you could simply lump it into your training data and enlarge your training data, which in general would yield better results in an obvious way. No real accomplishment there.
Note that the labels wouldn't have to be explicitly listed if you have access to an accuracy score. You could simply climb the accuracy gradient by repeatedly submitting predictions and observing the returned score, which is what people have done in the past with poorly designed competitions. A sketch of that kind of probing follows below.
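Here is a hedged sketch of what I mean; `score_submission` is a hypothetical stand-in for the competition's scoring endpoint, and binary 0/1 predictions and an accuracy metric are assumed.

```python
# Hypothetical leaderboard probing: flip one prediction at a time and watch the
# accuracy move up or down.  Each flip changes accuracy by exactly +-1/n, so a
# strict improvement means the flipped value is the true (hidden) label.
def probe_labels(baseline_preds, score_submission):
    recovered = list(baseline_preds)
    base_score = score_submission(recovered)
    for i in range(len(recovered)):
        trial = list(recovered)
        trial[i] = 1 - trial[i]                 # flip a single binary prediction
        trial_score = score_submission(trial)
        if trial_score > base_score:            # score went up -> the flip was correct
            recovered, base_score = trial, trial_score
    return recovered                            # converges to the hidden labels
```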
Given that the available test data does not have labels associated with it -- directly or indirectly -- there are at least two other possibilities:
First, this could be an indirect boosting method, where you focus on the cases in which predictions from a model trained only on the training data disagree with predictions from a model that also includes the pseudo-labeled test data.
Second, it could be straightforward semi-supervised learning. Intuitively: you could be using the density of unlabeled data to help shape the classification boundaries of a supervised method. See the illustration (https://en.wikipedia.org/wiki/Semi-supervised_learning#/media/File:Example_of_unlabeled_data_in_semisupervised_learning.png) in the Wikipedia definition of semi-supervised learning to clarify.
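As a rough sketch of that second possibility (the self-training wrapper here is my choice of tool, not something from the original setup): the model pseudo-labels the unlabeled "test" rows it is confident about and re-trains on them, letting the unlabeled data shape the decision boundary.

```python
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=0)
y_partial = y.copy()
y_partial[700:] = -1                     # mark 30% of the rows as unlabeled (-1)

base = SVC(probability=True)             # base learner must expose predict_proba
model = SelfTrainingClassifier(base, threshold=0.9).fit(X, y_partial)

# Accuracy on the rows that were unlabeled during training (their features,
# but not their labels, were available to the self-training loop).
print(model.score(X[700:], y[700:]))
```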
BUT this doesn't mean that there isn't a trick here. And that trick comes from the definition of training and test data. In principle, training data represents data that you could have in hand when you are ready to deploy your model. And test data represents future data that will come into your system once it's operational.
In that case, training on test data is a leak from the future, where you are taking advantage of data you would not have seen yet. This is a major issue in the real world, where some variables may not exist until after the fact (say after an investigation is done) or may be updated at a later date.
So they are meta-gaming here: what they did is legitimate within the rules of the competition, because they were given access to some of the test data. But it's not legitimate in the real world, where the true test is how well it does in the future, on new data.
This warning means that the iterative routine LIBSVM uses to solve the quadratic optimization problem, in order to find the maximum margin hyperplane (i.e., the parameters $w$ and $b$) separating your data, reached the maximum number of iterations and had to stop, while the current approximation of $w$ could still be improved (i.e., $w$ could be changed further to bring the objective function closer to its optimum). In short, this means that LIBSVM thinks it failed to find the maximum margin hyperplane, which may or may not be true.
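As a rough illustration, using scikit-learn's SVC, which wraps LIBSVM (the iteration caps below are arbitrary): a low cap makes the solver stop before it has converged and emit exactly this kind of early-termination warning.

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    SVC(max_iter=5).fit(X, y)           # stops early: the approximation of w is not final
    print([str(w.message) for w in caught])

SVC(max_iter=-1).fit(X, y)              # no cap (the default): solver runs to convergence
```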
There are many reasons why this may happen; I'd suggest you do the following:
It's a good idea to search for the optimal $C$ on a logarithmic scale, as you do. I think for normalized data the search range for $C$ that you suggested should be OK. A useful check: the accuracy of the classifier should not change much at the borders of that range, nor between two adjacent values of your grid. If it does, extend the range or add intermediate values. (A sketch of such a search appears at the end of this answer.)
Note that the LIBSVM distribution for Windows should contain a Python script called grid.py, which can do the parameter selection for you (based on cross-validation over specified search ranges). It can also produce contour plots of SVM accuracy. This tool may be quite helpful.
The following question on StackOverflow and its related questions might also help: libsvm Shrinking Heuristics
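Following up on the suggestion about searching $C$ on a logarithmic scale, here is a hedged sketch using scikit-learn's SVC (which wraps LIBSVM); the particular range and toy data are assumptions, not taken from your setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

param_grid = {"C": np.logspace(-3, 3, 7)}           # 0.001, 0.01, ..., 1000
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)

# If the best C sits at an edge of the grid, or the cross-validated accuracy
# jumps sharply between neighboring values, widen the range or add values.
print(search.best_params_, search.best_score_)
```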