Feature selection sometimes improves the performance of regularized models, but in my experience it generally makes generalization performance worse. The reason is that the more choices we make about our model (the values of the parameters, the choice of features, the settings of the hyper-parameters, the choice of kernel...), the more data we need to make those choices reliably. Generally we make these choices by minimizing some criterion evaluated over a finite sample of data, which means the criterion inevitably has non-zero variance. As a result, if we minimize the criterion too aggressively, we can over-fit it: we can make choices that minimize the criterion because of peculiarities of the particular sample on which it is evaluated, rather than because they produce a genuine improvement in performance. I call this "over-fitting in model selection" to distinguish it from the more familiar form of over-fitting that results from tuning the model parameters.
Now the SVM is an approximate implementation of a bound on generalization performance that does not depend on the dimensionality, so in principle, we can expect good performance without feature selection, provided the regularization parameters are correctly chosen. Most feature selection methods have no such performance "guarantees".
For L1 methods, I certainly wouldn't bother with feature selection, as the L1 criterion is generally effective in trimming features. The reason that it is effective is that it induces an ordering in which features enter and leave the model, which reduces the number of available choices in selecting features, and hence is less prone to over-fitting.
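This ordering effect is easy to see by tracing the L1 regularization path on synthetic data: as the penalty weakens, features enter the model a few at a time rather than all at once. A minimal sketch (the data and all variable names here are illustrative, not from any particular dataset):

```python
# Sketch: tracing the L1 (lasso) regularisation path on synthetic data.
# As the penalty weakens, features enter the model progressively, which is
# the ordering effect described above. Data is purely illustrative.
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
# Only the first three features carry signal.
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 1.0 * X[:, 2] + 0.5 * rng.standard_normal(100)

# alphas are returned in decreasing order (strongest penalty first).
alphas, coefs, _ = lasso_path(X, y)
n_active_strong = int(np.sum(coefs[:, 0] != 0.0))   # at the strongest penalty
n_active_weak = int(np.sum(coefs[:, -1] != 0.0))    # at the weakest penalty
print(n_active_strong, n_active_weak)  # few (or no) features first, more as alpha shrinks
```

The point is that the penalty itself controls how many features survive, so there is no separate, high-variance search over feature subsets.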
The best reason for feature selection is to find out which features are relevant/important. The worst reason for feature selection is to improve performance: for regularised models it generally makes things worse. However, for some datasets it can make a big difference, so the best thing to do is to try it and use a robust, unbiased performance evaluation scheme (e.g. nested cross-validation) to find out whether yours is one of those datasets.
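A minimal sketch of that nested cross-validation comparison: the outer loop estimates performance, while the inner loop (inside `GridSearchCV`) tunes the hyper-parameters, so any feature selection is repeated within every outer fold and cannot leak into the estimate. The dataset and the parameter grids below are illustrative assumptions, not a recipe:

```python
# Sketch: nested cross-validation to decide whether univariate feature
# selection helps on a given dataset. Outer CV estimates performance;
# inner CV (GridSearchCV) tunes k and C, so selection is redone in every
# outer fold. Dataset and grids are illustrative.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)

with_sel = Pipeline([("select", SelectKBest(f_classif)), ("svm", SVC())])
grid_with = GridSearchCV(with_sel,
                         {"select__k": [5, 10, 30], "svm__C": [0.1, 1, 10]},
                         cv=3)
grid_without = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)

score_with = cross_val_score(grid_with, X, y, cv=5).mean()
score_without = cross_val_score(grid_without, X, y, cv=5).mean()
print(score_with, score_without)  # compare on *your* data before deciding
```

Whichever pipeline scores better in the outer loop is the one supported by the evidence for that particular dataset.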
Almost any approach that does some form of model selection and then carries out further analyses as if no model selection had happened typically has poor properties. Unless there are compelling theoretical arguments, backed up by evidence from e.g. extensive simulation studies at realistic sample sizes and feature-to-sample-size ratios, to show that a given approach is an exception, it is likely to have unsatisfactory properties. I am not aware of any such positive evidence for this approach, but perhaps someone else is. Given that there are reasonable alternatives that achieve all the desired goals (e.g. the elastic net), it is hard to justify using such a suspect ad-hoc approach instead.
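For concreteness, here is a sketch of that elastic-net alternative: the overall penalty strength and the L1/L2 mix are both chosen by cross-validation inside a single fitting procedure, so feature trimming and shrinkage happen jointly rather than as a separate selection step followed by a naive refit. The synthetic data is purely illustrative:

```python
# Sketch: the elastic net tunes the penalty strength (alpha) and the
# L1/L2 mix (l1_ratio) by internal cross-validation, so sparsity and
# shrinkage are decided jointly in one procedure. Illustrative data.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
X = rng.standard_normal((150, 20))
# Only the first four features carry signal.
y = X[:, :4] @ np.array([2.0, -1.5, 1.0, 0.5]) + 0.3 * rng.standard_normal(150)

model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(X, y)
n_kept = int(np.sum(model.coef_ != 0.0))
print(model.l1_ratio_, n_kept)  # chosen mix and number of surviving features
```

There is no subsequent analysis that pretends the selection never happened; the cross-validated penalty accounts for it.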
Univariate feature selection is generally a poor method.
This question is deftly answered by silverfish in the context of correlation, but all his arguments apply to your case as well. In short, there is no reason to believe that univariately checking how each individual variable $x$ is related to your response $y$ reveals anything about the multivariate nature of the relationship between $X$ and $y$. It's quite possible that you end up screening out many of your good predictors.
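A toy demonstration of this point, with made-up data: two strongly correlated features whose difference drives the response. Each feature looks nearly useless on its own, so univariate screening would discard both, yet together they explain the response almost perfectly:

```python
# Toy demonstration: univariate screening misses multivariate structure.
# x1 and x2 are highly correlated; y depends on their difference, so each
# has a near-zero marginal correlation with y, but the joint linear fit
# is nearly perfect. Purely illustrative.
import numpy as np

rng = np.random.default_rng(2)
n = 500
z = rng.standard_normal(n)
x1 = z + 0.05 * rng.standard_normal(n)
x2 = z + 0.05 * rng.standard_normal(n)
y = x1 - x2 + 0.01 * rng.standard_normal(n)

r1 = np.corrcoef(x1, y)[0, 1]   # marginal correlation of x1 with y: tiny
r2 = np.corrcoef(x2, y)[0, 1]   # likewise for x2
Xmat = np.column_stack([x1, x2, np.ones(n)])
beta, res, _, _ = np.linalg.lstsq(Xmat, y, rcond=None)
r2_joint = 1.0 - res[0] / np.sum((y - y.mean()) ** 2)
print(round(r1, 2), round(r2, 2), round(r2_joint, 3))
```

Both marginal correlations are close to zero while the joint R² is close to one, which is exactly the failure mode of screening each variable on its own.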
As you point out, LASSO, ridge, or glmnet are much preferred methods for feature selection in a multiple regression model, as they consider all of the features jointly, shrink coefficients to control over-fitting, and allow the amount of regularization to be tuned by cross-validation.
You should carefully and respectfully start pointing your team towards a more modern and disciplined approach.
(*) You also don't mention whether your team is testing for non-linear relationships when fitting these univariate models. At the very least, these univariate models should be based on some basis expansion of the feature, such as cubic splines. If they are only testing for univariate linear relationships, then there are some issues there as well.