Remove outliers, then standardise. This way all the batches of your "good data" will be scaled consistently.
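A minimal sketch of that order of operations, assuming a pandas DataFrame, an IQR rule for flagging outliers, and scikit-learn's StandardScaler; the file name, column names, and the 1.5×IQR cutoff are illustrative assumptions, not anything from the original post:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def drop_iqr_outliers(df, cols, k=1.5):
    """Keep only rows whose values fall within k*IQR of the quartiles."""
    mask = pd.Series(True, index=df.index)
    for c in cols:
        q1, q3 = df[c].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[c].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

df = pd.read_csv("data.csv")          # hypothetical input file
numeric_cols = ["x1", "x2"]           # hypothetical feature names

clean = drop_iqr_outliers(df, numeric_cols)
scaler = StandardScaler().fit(clean[numeric_cols])   # fit on the "good data" only
scaled = scaler.transform(df[numeric_cols])          # the same scaling applied everywhere
```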
Downsampling, as in simply removing data points, seems rather sketchy. If you really do need to shrink the dataset, you could do stratified sampling instead.
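If you go that route, here is a minimal sketch of stratified downsampling with pandas; the "label" column and the 20% fraction are illustrative assumptions:

```python
import pandas as pd

df = pd.read_csv("data.csv")                     # hypothetical input file
sampled = (
    df.groupby("label", group_keys=False)        # "label" is a hypothetical stratum column
      .sample(frac=0.2, random_state=0)          # draw the same fraction from every stratum
)
```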
@Automated pipelines
Remove too many outliers and you remove an important chunk of the dataset. Or those outliers were actually important, and suddenly the predictions are bad.
I find using automated pre-processing of features about as feasible as flying planes without a pilot. You can set up the pipeline, but if the output really matters, you will always have to check it yourself.
The literature on imputation of missing values is quite large. Among the best writers on the topic are Don Rubin and Paul Allison. Rubin's published work goes back several decades, beginning with Statistical Analysis with Missing Data, co-authored with Rod Little. They develop a theory of missing data that considers how the missing values were generated: are they missing at random (MAR), missing completely at random (MCAR), or produced by some other structure of missingness? Relevant to this discussion is that MAR and MCAR readily lend themselves to imputation while the other structures do not. Paul Allison's Sage monograph Missing Data reviews the key developments in this literature, beginning with Rubin's framework, and is highly lucid in its exposition -- as is all of Allison's published work. Allison also develops approaches to evaluating the success or accuracy of the imputation process, which greatly aids in building confidence in the results. Familiarity with this literature is desirable, as it will inform your decisions well beyond what a few posts on CV can cover.
Rubin makes the very important point that not only is there a source structure to the missing values, there can also be a patterning in missingness across variables, and one should leverage these patterns in the imputation process. For instance, you mention that you have a number of variables with a range of missing proportions. Rubin's recommendation is that you begin by ranking the variables from least to most missing, impute the variables with the least missingness first, and subsequently use those newly imputed variables to impute the "downstream" missing values.
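As it happens, scikit-learn's IterativeImputer exposes this ordering directly; a minimal sketch, assuming all-numeric columns (the file name and column handling are illustrative assumptions):

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

df = pd.read_csv("data.csv")                   # hypothetical input, numeric columns
print(df.isna().mean().sort_values())          # rank variables from least to most missing

# imputation_order="ascending" imputes the least-missing variables first and
# reuses them when imputing the more heavily missing ones downstream
imputer = IterativeImputer(imputation_order="ascending", random_state=0)
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
```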
Then there are considerations as to how -- with what method -- you impute the missing information. There are many approaches, ranging from mean or modal substitution, to so-called sorted "hot deck" imputation, to regression models. Mean substitution simply involves finding an average value at an appropriate level of subsetting and plugging that average in for the missing data. It has the disadvantage of creating large spikes in the distribution when many missing elements fall into the same subset. "Hot deck" imputation involves sorting your data by a wide set of cross-classifying information, then plugging missing values with the closest non-missing, "real" value in the sort order. Of all of these, regression modeling is the least biasing and least "lump"-producing, as it develops a smooth prediction for the missing values.
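A minimal sketch of the two simpler methods, assuming a pandas DataFrame; the "region", "age", and "income" columns are illustrative assumptions, and a regression imputer such as the IterativeImputer above would be the smoother alternative:

```python
import pandas as pd

df = pd.read_csv("data.csv")                          # hypothetical input file

# Mean substitution within an appropriate subset (here: per "region")
df["income_mean_fill"] = df.groupby("region")["income"].transform(
    lambda s: s.fillna(s.mean())
)

# Sorted "hot deck": sort by cross-classifying variables, then carry the
# nearest preceding non-missing value forward into the gaps
df = df.sort_values(["region", "age"])
df["income_hot_deck"] = df["income"].ffill()
```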
Evaluating the accuracy of the process typically involves a comparison of the full set of new, imputed values with the original, incomplete set of information. To the extent that the new "marginals" are close to, or recover, the original "marginals", your imputation process is considered a success and is, hopefully, unbiased.
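A minimal sketch of such a marginal check for a numeric column; the two-sample KS test is one convenient way to compare the observed and completed distributions, and the column names here are illustrative:

```python
import pandas as pd
from scipy.stats import ks_2samp

def compare_marginals(observed, completed):
    """Compare the observed (non-missing) values with the completed column."""
    obs = observed.dropna()
    stat, p = ks_2samp(obs, completed)
    summary = pd.DataFrame({"observed": obs.describe(), "completed": completed.describe()})
    return summary, stat, p

# e.g. summary, stat, p = compare_marginals(df["income"], imputed["income"])
```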
Another consideration is whether the missing values are in the target variable(s) or in the features. As a general rule of thumb, imputing the target variable is not recommended, as it is biasing and problematic, whereas imputing the features is not problematic in the same way.
So far, this discussion assumes that you have made the decision to impute. There are important factors to consider before you make this decision. It is unfortunately true that imputation can generate a false sense of confidence in your new, imputed dataset since the new information remains, at best, little more than spam -- no matter how rigorous the method used. The point here is that even when your new marginals quite precisely recover the original marginals, this does not mean that at the observation level (your smallest unit of analysis) you have made the "correct" assignment or recovered the "true" value for that observation.
Let me illustrate this with a real-world example: in working with a survey of 3,000 respondents in a health care study, about 400 of them were identified as members of our customer database of nearly 1 million. This enabled a one-to-one comparison of these 400 respondents with our in-house customer information, for the small subset of items that matched between the two sources. While geographic region was a perfect match and age was a very good match, income was a horrendous match. We were comparing survey self-response to an income-strata field from a set of purchased geo-demographic information vended by a large, well-known data vendor, where household income had been imputed for our customers based on a model of some type. That model did a great job of recovering the marginals for household income, but at the level of the individual household, 80% of them were not in the same or even a similar quartile bucket! And after collapsing the 4 buckets down to two -- high and low household income -- less than 50% of the values fell on the diagonal.
This should give you an idea of the (in)accuracy of imputation at the level of the unique observation for many items of great importance.
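A minimal sketch of that observation-level check, assuming you can match imputed values to true values for some subset, as in the example above; the column names are illustrative:

```python
import numpy as np
import pandas as pd

def quartile_agreement(true_vals, imputed_vals):
    """Share of observations landing in the same quartile bucket (the diagonal)."""
    true_q = pd.qcut(true_vals, 4, labels=False)
    imp_q = pd.qcut(imputed_vals, 4, labels=False)
    return float(np.mean(np.asarray(true_q) == np.asarray(imp_q)))

# e.g. quartile_agreement(survey["income"], vendor["income_imputed"])
```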
Best Answer
The answer is "it depends". You haven't told us the nature of your data, the nature of those outliers, or how you identify outliers as outliers. In some cases, those so-called outliers are not outliers at all. A better model would attribute them to some cause. An example: Alaskan North Slope climate change just outran one of our tools to measure it. In this case, automated outlier detection masked the climate data for Utqiaġvik, Alaska as missing "for all of 2017 and the last few months of 2016".
In other cases, there is no model other than that the datum in question is bad (bad recording, bad transmission, ...), in which case editing it out may well be the best thing to do. Regardless of how robust a technique is, I've yet to see one that is robust against 60 sigma outliers. Given any reasonable distribution, you'll never, ever see a true 60 sigma outlier. Yet such values show up all the time: a high-order bit can flip from a zero to a one due to noisy transmission, or a manually recorded piece of data can have a misplaced decimal or be expressed in the wrong units.
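A minimal sketch of flagging such gross errors, using a median/MAD rule so the cutoff itself is not dragged around by the bad value; the threshold is an illustrative assumption, and flagged points still deserve a manual look:

```python
import numpy as np

def flag_gross_outliers(x, threshold=10.0):
    """Return True where a value sits wildly far from the bulk of the data."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    scale = 1.4826 * mad if mad > 0 else np.std(x)   # MAD rescaled to ~sigma for normal data
    return np.abs(x - med) > threshold * scale

x = np.array([10.2, 9.8, 10.1, 1.01e7, 10.0])        # one value with a misplaced decimal / flipped bit
print(flag_gross_outliers(x))                        # [False False False  True False]
```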