A few comments:
Option (1) is a very bad idea. Copies of the same point may end up in both the training and test sets. This lets the classifier cheat: when making predictions on the test set, it will already have seen identical points during training. The whole point of having separate train and test sets is that the test set must be independent of the train set.
Option (2) is honest. If you don't have enough data, you could try $k$-fold cross validation. For example, divide your data into 10 folds. Then, for each fold in turn, use that fold as the test set and the remaining 9 folds as the training set, and average the test accuracy over the 10 runs. The advantage of this method is that since only 1/10 of your data is in any one test set, it is unlikely that all your minority-class samples end up in a single test set.
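The procedure is short enough to sketch in plain Python. A minimal, dependency-free illustration (the `fit` and `accuracy` callables are placeholders you would supply for your own classifier):

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, labels, fit, accuracy, k=10):
    """Average test accuracy over k splits: each fold serves once as the
    test set while the remaining k-1 folds form the training set."""
    scores = []
    for test_idx in k_fold_indices(len(data), k):
        test_set = set(test_idx)
        train = [i for i in range(len(data)) if i not in test_set]
        model = fit([data[i] for i in train], [labels[i] for i in train])
        scores.append(accuracy(model,
                               [data[i] for i in test_idx],
                               [labels[i] for i in test_idx]))
    return sum(scores) / len(scores)
```

In practice a library routine (e.g. scikit-learn's `StratifiedKFold`, which additionally preserves the class proportions in every fold) is preferable; the sketch just shows the mechanics.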
Original Answer: There is no benefit to under- or over-sampling either group
New Conclusion, see edit at end: There can be a benefit of sampling, but not random sampling of failures and non-failures.
You are forgetting the point of a Cox analysis. It is not to analyze who dies and who does not die; rather it is to study effects of covariates on the timing of death. A Cox analysis would be valid even if everyone died. If $Y$ is a failure time and $t$ is the time scale of the study, Cox studies the hazard rate, defined as:
$$
\lambda(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \leq Y < t + \Delta t \mid Y \geq t)}{\Delta t}
$$
The fundamental unit is not the individual but the risk set at each failure time: those individuals who have not yet failed prior to that time, i.e. who have $Y \ge t$. Thus at the first failure, the risk set consists of the failure and everyone else in the sample. The Cox partial likelihood compares covariate values for the failure against the others in the risk set. So an imbalance is built into the Cox analysis, but it is not a shortage of non-failures; it is just the reverse: the one (or few) failures at each distinct failure time $t^*$ are compared to all those who have not yet failed by that time.
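The comparison at a single failure time can be made concrete. A minimal sketch with one covariate and hypothetical values (the risk set includes the failure itself):

```python
import math

def partial_likelihood_term(beta, x_failure, x_risk_set):
    """Cox partial-likelihood contribution at one failure time:
    exp(beta * x) for the failure, divided by the sum of
    exp(beta * x) over everyone still at risk (failure included)."""
    denom = sum(math.exp(beta * x) for x in x_risk_set)
    return math.exp(beta * x_failure) / denom
```

With $\beta = 0$ the covariate carries no information, and the contribution reduces to one over the size of the risk set, which shows that each term is a comparison within the risk set rather than a failure/non-failure classification.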
In some, perhaps most, studies there is a shortage of failures, with most individuals still alive at the end of follow-up. In such studies it pays to sample non-failures at each risk set. This is called "risk-set sampling"; see Langholz and Goldstein, 1996. You don't have this problem.
For two reasons there might be a benefit to sampling, but not to under- or over-sampling each group. The first reason is economic: studying all individuals may simply be too costly. This could happen, for example, if you need to examine physical records for each subject. A second reason for sampling arises from your very large number of failures. If you split your sample, you can explore and develop models on one part and validate the models on the second part.
Reference
Langholz, Bryan, and Larry Goldstein. 1996. Risk set sampling in epidemiologic cohort studies. Statistical Science 35-53. Available at: http://projecteuclid.org/euclid.ss/1032209663
Edit December 8:
In the original version, I stated two reasons why sampling might be beneficial: to reduce economic burden and as part of a modeling strategy. I would add computational burden as a third reason, a likely one with a sample as large as yours.
I didn't say how you might do the sampling. I would suggest a two-stage procedure: 1) sample failures; and 2) sample risk sets for those failures. Therefore the Langholz reference may well be relevant to your study.
In Stata, the command sttocc can perform risk-set sampling. Apparently the Epi package in R can also do so.
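For readers without those tools, the second stage is easy to sketch directly. A minimal illustration in Python (hypothetical data; `m` controls drawn per failure; stage 1, subsampling the failures themselves, is omitted here for brevity):

```python
import random

def risk_set_sample(times, events, m=5, seed=0):
    """For each failure, draw m controls at random from its risk set:
    subjects whose observed time is >= the failure's time.
    times:  observed time per subject
    events: 1 = failure, 0 = censored
    Returns a list of (failure_index, control_indices) pairs."""
    rng = random.Random(seed)
    matched = []
    for i, (t, d) in enumerate(zip(times, events)):
        if d != 1:
            continue
        risk_set = [j for j, tj in enumerate(times) if tj >= t and j != i]
        controls = rng.sample(risk_set, min(m, len(risk_set)))
        matched.append((i, controls))
    return matched
```

The matched sets can then be analyzed with conditional logistic regression, which is the standard analysis for risk-set (nested case-control) sampling.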
As part of a modeling strategy, you might stratify the entire sample into natural groups, such as gender. You can fit separate models to each or treat them as strata for the Cox analysis. In either case, the risk sets will be smaller, and the computational burden will be lessened.
Added (off topic): Even with exact dates for entry and exit, you are likely to have heavily tied observation times. In that case, you might consider discrete or grouped data models. See, e.g.
Jenkins, S. P. (1995). Easy estimation methods for discrete-time duration models. Oxford Bulletin of Economics and Statistics 57(1): 129-138.
Prentice, R. and L. Gloeckler. (1978). Regression analysis of grouped survival data with application to breast cancer data. Biometrics 34: 57-67.
There are Stata specific examples at https://www.iser.essex.ac.uk/resources/survival-analysis-with-stata
Best Answer
It doesn't work to simply divide the probabilities. You have to adjust the odds, not the probabilities.
There's a nice description and some sample calculations here: https://yiminwu.wordpress.com/2013/12/03/how-to-undo-oversampling-explained/
(added in edit) There's a different derivation that gives the same results here:
http://blog.data-miners.com/2009/09/adjusting-for-oversampling.html
That blog post is a bit simpler to understand.
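The adjustment those posts describe fits in a few lines. A sketch, assuming you know the true population positive rate (`p_true`) and the positive rate in the oversampled training data (`p_train`); the function names are my own:

```python
def undo_oversampling(p_pred, p_true, p_train):
    """Map a probability predicted on oversampled data back to the
    original scale by rescaling the odds, not the probability itself.

    The correction factor is the ratio of the true prior odds to the
    training-set prior odds."""
    odds = p_pred / (1 - p_pred)
    correction = (p_true / (1 - p_true)) / (p_train / (1 - p_train))
    adjusted_odds = odds * correction
    return adjusted_odds / (1 + adjusted_odds)
```

For example, if you oversampled to 50% positives when the true rate is 5%, a predicted probability of 0.5 (i.e. "no information beyond the prior") maps back to 0.05, as it should; dividing probabilities directly would not give this.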
I'm not a SMOTE user, and can't comment on the particular applicability to SMOTE.