The purpose of stratified cross validation is to ensure that each fold has a class distribution similar to the data set as a whole. Your proposed approach doesn't do anything to maintain that distribution. If you have a very small class, you might get a fold that has no records with that class as the outcome.
To construct folds that maintain the class distribution, first break up your data set into homogeneous subsets by class level. For example, if you have a binary classification problem with labels zero and one, you'll get two subsets, one with all records labeled zero and one with all records labeled one. Then for each subset, run your partition algorithm (draw a random number from a uniform distribution for each row and assign the row to a fold depending on whether the number is <= .5) to break each subset up into two roughly equal-sized folds. Now you can make the data sets for your cross validation by combining the class-specific folds, so that each cross-validation set contains one fold of each class level.
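A minimal sketch of this recipe in Python (function and variable names are my own; note that the random-halving step gives approximately rather than exactly equal fold sizes):

```python
import numpy as np

def stratified_two_fold(y, seed=None):
    """Assign each record to one of two folds so that each class is
    split roughly in half: partition the data by class label, then
    randomly halve each class-specific subset."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    fold = np.empty(len(y), dtype=int)
    for label in np.unique(y):                 # one homogeneous subset per class
        idx = np.flatnonzero(y == label)
        u = rng.uniform(size=len(idx))         # uniform random number per record
        fold[idx] = np.where(u <= 0.5, 0, 1)   # fold 0 if u <= .5, else fold 1
    return fold

# Example: combine the class-specific folds into the two cross-validation sets
# y = np.array([0, 0, 0, 0, 1, 1])
# fold = stratified_two_fold(y, seed=0)
# cv_set_1, cv_set_2 = np.flatnonzero(fold == 0), np.flatnonzero(fold == 1)
```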
Stratification seeks to ensure that each fold is representative of all strata of the data. Generally this is done in a supervised way for classification and aims to ensure each class is (approximately) equally represented across each test fold (which are of course combined in a complementary way to form training folds).
The intuition behind this relates to the bias of most classification algorithms. They tend to weight each instance equally, which means overrepresented classes get too much weight (e.g. when optimizing F-measure, Accuracy or a complementary form of error). Stratification is not so important for an algorithm that weights each class equally (e.g. optimizing Kappa, Informedness or ROC AUC) or according to a cost matrix (e.g. one that assigns a value to correctly classifying each class and/or a cost to each way of misclassifying it). See, e.g.
D. M. W. Powers (2014), What the F-measure doesn't measure: Features, Flaws, Fallacies and Fixes. http://arxiv.org/pdf/1503.06410
One specific issue that matters even for unbiased or balanced algorithms is that they tend not to be able to learn or test a class that isn't represented at all in a fold; furthermore, even when only a single instance of a class is present in a fold, generalization cannot be performed or, respectively, evaluated. However, even this consideration isn't universal; for example, it doesn't apply so much to one-class learning, which tries to determine what is normal for an individual class and effectively identifies outliers as belonging to a different class, given that cross-validation is about estimating statistics rather than producing a specific classifier.
On the other hand, supervised stratification compromises the technical purity of the evaluation, because the labels of the test data shouldn't affect training, yet in stratification they are used in the selection of the training instances. Unsupervised stratification is also possible, spreading similar data across folds while looking only at the attributes of the data, not the true class. See, e.g.
N. A. Diamantidis, D. Karlis, E. A. Giakoumakis (1997), Unsupervised stratification of cross-validation for accuracy estimation. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.469.8855
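As a rough illustration of the idea (this is not the algorithm from the paper above), one simple unsupervised scheme is to order the cases along an attribute-based one-dimensional projection and deal them out to the folds, so that similar cases are spread across folds without ever looking at the class labels:

```python
import numpy as np

def unsupervised_stratified_folds(X, k):
    """Spread similar cases across k folds using only the attributes:
    order cases along the first principal component of X and assign
    consecutive cases (in that order) to consecutive folds."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # centre the attributes
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    score = Xc @ vt[0]                            # first principal component scores
    fold = np.empty(len(X), dtype=int)
    fold[np.argsort(score)] = np.arange(len(X)) % k
    return fold
```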
Stratification can also be applied to regression rather than classification, in which case, as with unsupervised stratification, similarity rather than identity is used; the supervised version, however, uses the known true function value.
Further complications are rare classes and multilabel classification, where classifications are being done on multiple (independent) dimensions. Here tuples of the true labels across all dimensions can be treated as classes for the purpose of cross-validation. However, not all combinations necessarily occur, and some combinations may be rare. Rare classes and rare combinations are a problem in that a class/combination that occurs at least once but less than K times (in K-CV) cannot be represented in all test folds. In such cases, one could instead consider a form of stratified bootstrapping (sampling with replacement to generate a full-size training fold, with repetitions expected and about 36.8% of instances expected to be left unselected for testing, and with one instance of each class selected initially without replacement for the test fold).
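A rough sketch of one way to realize such a stratified bootstrap, under my own reading of the description (treat y as the class label, or as the tuple of labels across dimensions encoded as a single value):

```python
import numpy as np

def stratified_bootstrap_split(y, seed=None):
    """One stratified-bootstrap resample: reserve one instance of each
    class for the test fold (without replacement), then draw a full-size
    training fold with replacement from the remaining instances; anything
    never drawn (about 36.8% in expectation) also goes to the test fold."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    n = len(y)
    reserved = np.array([rng.choice(np.flatnonzero(y == c))   # one per class
                         for c in np.unique(y)])
    pool = np.setdiff1d(np.arange(n), reserved)
    train = rng.choice(pool, size=n, replace=True)            # full-size, with repetitions
    test = np.setdiff1d(np.arange(n), train)                  # reserved + never-drawn cases
    return train, test
```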
Another approach to multilabel stratification is to try to stratify or bootstrap each class dimension separately without seeking to ensure representative selection of combinations. With $L$ labels, $N$ instances and $K_{kl}$ instances of class $k$ for label $l$, we can randomly choose (without replacement) from the corresponding set of labeled instances $D_{kl}$ approximately $N/LK_{kl}$ instances. This does not ensure optimal balance but rather seeks balance heuristically. This can be improved by barring selection of labels at or over quota unless there is no choice (as some combinations do not occur or are rare). Problems tend to mean either that there is too little data or that the dimensions are not independent.
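A loose, heuristic sketch of the per-dimension idea (my own interpretation of the description, not a published algorithm): assign instances to folds one at a time, preferring folds where none of the instance's labels are at or over their per-fold quota, falling back to an arbitrary fold only when there is no choice:

```python
import numpy as np

def greedy_multilabel_folds(Y, n_folds, seed=None):
    """Heuristic per-label stratification: assign each instance to a fold,
    preferring folds where none of the instance's labels are at or over
    their per-fold quota; if no such fold exists, pick any fold."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y)                       # shape (N, L): one class id per label dimension
    n, n_dims = Y.shape
    # per-fold quota for every (label dimension, class) pair
    quota, filled = {}, {}
    for l in range(n_dims):
        classes, counts = np.unique(Y[:, l], return_counts=True)
        for c, cnt in zip(classes, counts):
            quota[(l, c)] = cnt / n_folds
            filled[(l, c)] = np.zeros(n_folds)
    fold = np.empty(n, dtype=int)
    for i in rng.permutation(n):            # visit instances in random order
        ok = np.ones(n_folds, dtype=bool)
        for l in range(n_dims):
            ok &= filled[(l, Y[i, l])] < quota[(l, Y[i, l])]
        candidates = np.flatnonzero(ok)
        if candidates.size == 0:            # no admissible fold: no choice left
            candidates = np.arange(n_folds)
        f = rng.choice(candidates)
        fold[i] = f
        for l in range(n_dims):
            filled[(l, Y[i, l])][f] += 1
    return fold
```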
Best Answer
I'm not aware of any approaches that have acquired their own name (other than that stratification is not per se restricted to classification). I don't have the paper, but according to the abstract it is an implementation of the strategy I outline first below (extending Kennard-Stone -> Duplex -> cross validation).
That being said, the building blocks are around, so let's design a cross validation experiment:
Venetian Blinds Cross Validation assigns consecutive samples to consecutive folds: $\text{fold} = \text{case number} \bmod k$.
If we sort cases* according to $y$ first, venetian blinds gets us close to stratified folds. This corresponds to assigning $\text{fold} = \operatorname{rank}(y) \bmod k$.
This approach has an inbuilt small but systematic difference between the folds, as the difference in $y$ between any two corresponding cases in two folds will always have the same sign.
We can improve our stratification by formulating the cross validation as a randomized blocked experiment: after sorting according to $y$, each set of $k$ consecutive cases forms a block, and within each block the cases are randomly assigned to the $k$ folds.
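In Python, the sorted venetian blinds and its randomized-block refinement might look like this (a sketch assuming $y$ is a 1-D numeric array):

```python
import numpy as np

def sorted_venetian_blinds(y, k):
    """fold = rank(y) mod k: sort cases by y and deal them out to the k folds."""
    rank = np.argsort(np.argsort(y))     # rank of each case in the sorted order
    return rank % k

def randomized_block_folds(y, k, seed=None):
    """Randomized block design: sort by y, treat each run of k consecutive
    cases as a block, and randomly permute the fold assignment within each
    block, removing the systematic ordering of the plain venetian blinds."""
    rng = np.random.default_rng(seed)
    order = np.argsort(y)
    fold = np.empty(len(y), dtype=int)
    for start in range(0, len(y), k):
        block = order[start:start + k]
        fold[block] = rng.permutation(k)[:len(block)]
    return fold
```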
Somewhat related are techniques that sample cases from $\mathbf X$ in order to get uniform coverage in $\mathbf X$ (so input space rather than output space). This is particularly relevant where $\mathbf X$ is available for a large sample size but obtaining reference $y$ is costly and thus reference cases should be carefully selected*.
* This is a common situation e.g. in chemical analysis when calibrating spectroscopic data: spectra $\mathbf X$ can often be obtained in a (semi)automated fashion, so lots of cases are measured spectroscopically. However, reference analyses $y$ are often expensive, so the task is to select, from the much larger set of measured spectra $\mathbf X$, a subset of $n$ (say, 100) cases that are sent for reference analysis. The regression model is then trained either in a supervised fashion from that subset of $\mathbf X$ and the corresponding $y$, or in a semi-supervised fashion from the whole $\mathbf X$ and the smaller $y$.
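For the uniform-coverage-in-$\mathbf X$ selection mentioned above, a minimal Kennard-Stone-style sketch (my own simplified variant; Duplex extends the same idea to build two sets) could be:

```python
import numpy as np

def kennard_stone_select(X, n_select):
    """Select n_select cases covering the input space roughly uniformly:
    start with the two mutually most distant cases, then repeatedly add
    the case that is farthest from all cases selected so far."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    selected = list(np.unravel_index(np.argmax(d), d.shape))    # two most distant cases
    while len(selected) < n_select:
        remaining = np.setdiff1d(np.arange(len(X)), selected)
        # each remaining case's distance to its nearest selected case
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining[np.argmax(nearest)])
    return np.array(selected)
```

The full pairwise distance matrix keeps the sketch short; for large case sets one would compute distances incrementally instead.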