I don't believe there is anything wrong with using LASSO for variable selection and then using OLS. From "Elements of Statistical Learning" (p. 91):
...the lasso shrinkage causes the estimates of the non-zero coefficients to be biased towards zero and in general they are not consistent [Added Note: This means that, as the sample size grows, the coefficient estimates do not converge to the true coefficient values]. One approach for reducing this bias is to run the lasso to identify the set of non-zero coefficients, and then fit an un-restricted linear model to the selected set of features. This is not always feasible, if the selected set is large. Alternatively, one can use the lasso to select the set of non-zero predictors, and then apply the lasso again, but using only the selected predictors from the first step. This is known as the relaxed lasso (Meinshausen, 2007). The idea is to use cross-validation to estimate the initial penalty parameter for the lasso, and then again for a second penalty parameter applied to the selected set of predictors. Since the variables in the second step have less "competition" from noise variables, cross-validation will tend to pick a smaller value for $\lambda$ [the penalty parameter], and hence their coefficients will be shrunken less than those in the initial estimate.
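For concreteness, here is a minimal sketch of that two-step procedure using scikit-learn's `LassoCV`; the synthetic data and all settings below are illustrative assumptions rather than recommendations.

```python
# Minimal sketch of the relaxed-lasso recipe quoted above (illustrative settings).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

# Step 1: cross-validated lasso on all predictors to find the non-zero set.
lasso1 = LassoCV(cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(lasso1.coef_)

# Step 2: cross-validated lasso again, restricted to the selected predictors.
# With less "competition" from noise variables, CV tends to pick a smaller
# penalty, so these coefficients are shrunken less than in step 1.
lasso2 = LassoCV(cv=5, random_state=0).fit(X[:, selected], y)

print("first-pass penalty: ", lasso1.alpha_)
print("second-pass penalty:", lasso2.alpha_)
```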
Another reasonable approach, similar in spirit to the relaxed lasso, would be to use the lasso once (or several times in tandem) to identify a group of candidate predictor variables, and then use best-subsets regression to choose which of those predictors to keep (also see "Elements of Statistical Learning" for this). For this to work, you would need to refine the group of candidate predictors down to around 35, which won't always be feasible. You can use cross-validation or AIC as a criterion to prevent over-fitting.
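As a rough sketch (again assuming scikit-learn), screening with the lasso and then exhaustively comparing subsets of the survivors by AIC might look like the following; the data, the cap on subset size, and the Gaussian AIC formula are assumptions for illustration only.

```python
# Illustrative sketch: lasso screens candidates, best subsets picks among them by AIC.
from itertools import combinations
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, LinearRegression

X, y = make_regression(n_samples=300, n_features=40, n_informative=4,
                       noise=5.0, random_state=1)

# Screening step: keep the lasso's non-zero predictors as the candidate pool.
candidates = np.flatnonzero(LassoCV(cv=5, random_state=1).fit(X, y).coef_)

def aic(subset):
    """AIC for an OLS fit on the given predictor subset (Gaussian likelihood)."""
    cols = list(subset)
    pred = LinearRegression().fit(X[:, cols], y).predict(X[:, cols])
    rss = np.sum((y - pred) ** 2)
    n, k = len(y), len(cols)
    return n * np.log(rss / n) + 2 * (k + 1)

# Best subsets over the screened candidates; exhaustive search is only
# feasible for a small pool, so cap the subset size here.
max_size = min(len(candidates), 8)
best = min((s for r in range(1, max_size + 1)
            for s in combinations(candidates, r)), key=aic)
print("selected predictors:", best)
```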
In subset selection, the estimates of the nonzero parameters will be unbiased only if you have chosen a superset of the correct model, i.e., if you have removed only predictors whose true coefficient values are zero. If your selection procedure led you to exclude a predictor with a true nonzero coefficient, all coefficient estimates will be biased. This defeats your argument if you agree that selection is typically not perfect.
Thus, to make "sure" of an unbiased model estimate, you should err on the side of including more, or even all, potentially relevant predictors. That is, you should not select at all.
Why is this a bad idea? Because of the bias-variance tradeoff. Yes, your large model will be unbiased, but it will have a large variance, and the variance will dominate the prediction (or other) error.
Therefore, it is better to accept that parameter estimates will be biased but have lower variance (regularization), rather than hope that our subset selection has removed only parameters whose true values are zero, so that we have an unbiased model with larger variance.
Since you write that you assess both approaches using cross-validation, this mitigates some of the concerns above. One issue for Best Subset remains: it constrains some parameters to be exactly zero and lets the others float freely. So there is a discontinuity in the estimate that the lasso does not have: the lasso estimates change continuously as we move $\lambda$ across the point $\lambda_0$ at which a predictor $p$ enters or leaves the model. Suppose that cross-validation outputs an "optimal" $\lambda$ that is close to $\lambda_0$, so we are essentially unsure whether $p$ should be included or not. In this case, I would argue that it makes more sense to constrain the parameter estimate $\hat{\beta}_p$ via the lasso to a small (absolute) value, rather than either completely exclude it, $\hat{\beta}_p=0$, or let it float freely, $\hat{\beta}_p=\hat{\beta}_p^{\text{OLS}}$, as Best Subset does.
This may be helpful: Why does shrinkage work?
Best Answer
LASSO differs from best-subset selection in terms of penalization and path dependence.
In best-subset selection, presumably CV was used to identify that 2 predictors gave the best performance. During CV, full-magnitude regression coefficients without penalization would have been used for evaluating how many variables to include. Once the decision was made to use 2 predictors, then all combinations of 2 predictors would be compared on the full data set, in parallel, to find the 2 for the final model. Those 2 final predictors would be given their full-magnitude regression coefficients, without penalization, as if they had been the only choices all along.
You can think of LASSO as starting with a large penalty on the sum of the magnitudes of the regression coefficients, with the penalty gradually relaxed. The result is that variables enter one at a time, with a decision made at each point during the relaxation whether it's more valuable to increase the coefficients of the variables already in the model, or to add another variable. But when you get, say, to a 2-variable model, the regression coefficients allowed by LASSO will be lower in magnitude than those same variables would have in the standard non-penalized regressions used to compare 2-variable and 3-variable models in best-subset selection.
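To see that gradual-relaxation picture concretely, here is a small sketch using scikit-learn's `lasso_path` (the synthetic data are an illustrative assumption): as the penalty shrinks, the number of non-zero coefficients grows as variables enter one at a time.

```python
# Sketch: trace the lasso path and count active variables as the penalty relaxes.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path

X, y = make_regression(n_samples=150, n_features=10, n_informative=3,
                       noise=5.0, random_state=2)

# alphas are returned in decreasing order; coefs has shape (n_features, n_alphas).
alphas, coefs, _ = lasso_path(X, y)

# Print every 10th point on the path: the count of non-zero coefficients
# increases as the penalty (alpha) shrinks.
for a, c in zip(alphas[::10], coefs[:, ::10].T):
    print(f"alpha={a:10.3f}  non-zero coefficients: {np.count_nonzero(c)}")
```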
This can be thought of as making it easier for new variables to enter in LASSO than in best-subset selection. Heuristically, LASSO trades off potentially lower-than-actual regression coefficients against the uncertainty in how many variables should be included. This would tend to include more variables in a LASSO model, and potentially lead to worse performance for LASSO if you knew for sure that only 2 variables needed to be included. But if you already knew how many predictor variables should be included in the correct model, you probably wouldn't be using LASSO.
Nothing so far has depended on collinearity, which leads to different types of arbitrariness in variable selection in best-subset versus LASSO. In this example, best-subset examined all possible combinations of 2 predictors and chose the best among those combinations. So the best 2 for that particular data sample win.
LASSO's path dependence in adding one variable at a time means that an early choice of one variable may influence when other variables correlated with it enter later in the relaxation process. It's also possible for a variable to enter early and then for its LASSO coefficient to drop as other correlated variables enter.
In practice, the choice among correlated predictors in final models with either method is highly sample dependent, as can be checked by repeating these model-building processes on bootstrap samples of the same data. If there aren't too many predictors, and your primary interest is in prediction on new data sets, ridge regression, which tends to keep all predictors, may be a better choice.
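If you want to run that bootstrap check yourself, a short sketch along these lines (assuming scikit-learn; the data, collinearity settings, and number of resamples are illustrative assumptions) makes the sample dependence visible:

```python
# Sketch: how often does a cross-validated lasso select each predictor across bootstrap samples?
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.utils import resample

# effective_rank induces correlated predictors, so selection becomes less stable.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       effective_rank=8, noise=10.0, random_state=3)

n_boot = 50
selection_counts = np.zeros(X.shape[1])
for b in range(n_boot):
    Xb, yb = resample(X, y, random_state=b)                # bootstrap resample
    coef = LassoCV(cv=5, random_state=b).fit(Xb, yb).coef_
    selection_counts += (coef != 0)

# Predictors selected in (almost) every resample are stable choices; those
# selected only sometimes reflect sample-dependent picks among correlated predictors.
for j, frac in enumerate(selection_counts / n_boot):
    print(f"x{j}: selected in {frac:.0%} of bootstrap samples")
```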