The two formulations are equivalent in the sense that for every value of $t$ in the first formulation, there exists a value of $\lambda$ for the second formulation such that the two formulations have the same minimizer $\beta$, and conversely.
Here's the justification:
Consider the lasso formulation:
$$f(\beta)=\frac{1}{2}||Y - X\beta||_2^2 + \lambda ||\beta||_1$$
Let the minimizer be $\beta^*$ and let $b=||\beta^*||_1$. My claim is that if you set $t=b$ in the first formulation, then the solution of the first formulation will also be $\beta^*$. Here's the proof:
Consider the first formulation
$$\min_\beta \frac{1}{2}||Y - X\beta||_2^2 \quad \text{s.t. } ||\beta||_1\leq b$$
Suppose, for contradiction, that this constrained formulation had a solution $\hat{\beta}$ with $\frac{1}{2}||Y - X\hat{\beta}||_2^2 < \frac{1}{2}||Y - X\beta^*||_2^2$ (note the strict inequality). Since $\hat{\beta}$ is feasible, $||\hat{\beta}||_1 \leq b = ||\beta^*||_1$, so $f(\hat{\beta})<f(\beta^*)$, contradicting the fact that $\beta^*$ minimizes the lasso objective. Hence no feasible point achieves a smaller squared error than $\beta^*$, and since $\beta^*$ is itself feasible, the solution to the first formulation is also $\beta^*$.
Since $t=b$, the complementary slackness condition is satisfied at the solution point $\beta^*$.
So, given a lasso formulation with $\lambda$, you construct a constrained formulation using a $t$ equal to the value of the $l_1$ norm of the lasso solution. Conversely, given a constrained formulation with $t$, you find a $\lambda$ such that the solution to the lasso will be equal to the solution of the constrained formulation.
(If you know about subgradients, you can find this $\lambda$ by solving the equation $X^T(Y-X\beta^*)=\lambda z^*$, where $z^* \in \partial ||\beta^*||_1$.)
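Here is a small numerical sketch of the whole correspondence (my own illustration, not part of the original argument), assuming scikit-learn and synthetic data. Note that scikit-learn's `Lasso` minimizes $\frac{1}{2n}||Y - X\beta||_2^2 + \alpha ||\beta||_1$, so $\alpha = \lambda / n$ in the notation above; the particular $\lambda$ and data are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 1.5] + [0.0] * 5)
Y = X @ beta_true + rng.normal(size=n)

# scikit-learn's Lasso minimizes (1/(2n))||Y - Xb||^2 + alpha*||b||_1,
# so alpha = lambda / n in the notation of this answer.
lam = 20.0
beta_star = Lasso(alpha=lam / n, fit_intercept=False).fit(X, Y).coef_

# The matching constraint level t for the constrained formulation:
t = np.abs(beta_star).sum()
print("t =", t)

# Check the subgradient equation X^T(Y - X beta*) = lambda * z*:
# on the active set z*_j = sign(beta*_j), so these ratios should be ~lambda.
grad = X.T @ (Y - X @ beta_star)
active = beta_star != 0
print("lambda recovered from active coordinates:",
      grad[active] / np.sign(beta_star[active]))
```

The recovered values match $\lambda$ only approximately, up to the solver's tolerance.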
This has been written about on the site in detail. The lasso is meant to be a complete solution, and it is completely inappropriate to use it merely to select features that are then fed into a naive method that does not penalize for the fact that the data were tortured to find those features. You are also making an implicit assumption that the lasso finds the "right" predictors. If you bootstrap the entire process, you may be sorely disappointed to learn that, in your case, the selected features have a great deal of randomness in them. This is especially true in the presence of collinearity.
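As a quick illustration of that instability (a sketch I am adding, using scikit-learn and made-up collinear data, not anything from the original discussion), you can bootstrap the whole selection pipeline and tally how often each feature survives:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, p = 80, 30
X = rng.normal(size=(n, p))
# make columns 5-9 nearly collinear with columns 0-4
X[:, 5:10] = X[:, 0:5] + 0.1 * rng.normal(size=(n, 5))
y = X[:, 0] - X[:, 1] + rng.normal(size=n)

B = 100
selected = np.zeros(p)
for _ in range(B):
    idx = rng.integers(0, n, size=n)            # bootstrap resample
    fit = LassoCV(cv=5).fit(X[idx], y[idx])     # rerun the entire selection process
    selected += fit.coef_ != 0

print("selection frequency per feature:", selected / B)
```

If the selection were stable, the frequencies would sit near 0 or 1; with collinear columns they typically do not.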
If you want to emphasize parsimony over predictive accuracy, then model approximation (also called pre-conditioning) may be useful. Here one uses the best available prediction method, e.g. penalized maximum likelihood estimation with a quadratic penalty, i.e., ridge logistic regression. Then $X\hat{\beta}$ from that model is approximated using stepwise regression or recursive partitioning, etc. The approximate model inherits the proper amount of shrinkage from the full model. The approximate model may be chosen to yield $R^{2} = 0.95$ against the gold standard linear predictor.
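A minimal sketch of that workflow, assuming scikit-learn and a simulated binary outcome (my illustration; a shallow regression tree stands in for the stepwise-regression or recursive-partitioning approximation step):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
n, p = 500, 20
X = rng.normal(size=(n, p))
beta = np.concatenate([[2.0, -1.5, 1.0], np.zeros(p - 3)])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta))))

# Step 1: the best available prediction model -- here L2-penalized (ridge)
# logistic regression.
full = LogisticRegression(penalty="l2", C=1.0, max_iter=5000).fit(X, y)
linpred = X @ full.coef_.ravel() + full.intercept_   # the "gold standard" X*beta_hat

# Step 2: approximate the linear predictor with a simpler, more parsimonious
# model; the approximation inherits the full model's shrinkage.
approx = DecisionTreeRegressor(max_depth=3).fit(X, linpred)

# Step 3: judge the approximation by R^2 against the full model's linear predictor.
print("R^2 vs. gold-standard linear predictor:", r2_score(linpred, approx.predict(X)))
```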
Best Answer
PCA can be used as a dimensionality reduction technique if you drop Principal Components based on a heuristic, but it offers no feature selection, as the Principal Components are retained instead of the original features. However, tuning the number of Principal Components retained should work better than using heuristics, unless there are many low variance components and you are simply interested in filtering them.
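For instance, here is a sketch (assuming scikit-learn and synthetic data, not from the answer itself) of tuning the number of retained components by cross-validation inside a pipeline rather than by a variance heuristic:

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)

# The regression sees principal components, not the original features;
# cross-validation tunes how many components are retained.
pipe = Pipeline([("pca", PCA()), ("ols", LinearRegression())])
search = GridSearchCV(pipe, {"pca__n_components": list(range(1, 21))}, cv=5)
search.fit(X, y)
print("best number of components:", search.best_params_["pca__n_components"])
```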
LASSO ($\ell_1$ regularization), on the other hand, can intrinsically perform feature selection, as the coefficients of predictors are shrunk towards zero and some of them become exactly zero. It still requires hyperparameter tuning, because the regularization coefficient controls how strongly the penalty weighs against the loss function.
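A matching sketch for the lasso side (same caveats: scikit-learn, synthetic data): `LassoCV` tunes the regularization strength by cross-validation, and the coefficients driven exactly to zero are the dropped features.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)

# Cross-validation picks the regularization strength; coefficients that end
# up exactly zero correspond to features dropped from the model.
fit = LassoCV(cv=5).fit(X, y)
print("chosen alpha:", fit.alpha_)
print("original features kept:", np.flatnonzero(fit.coef_))
```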
As @MatthewDrury commented, ordinary PCA is agnostic to the target variable, while LASSO regression isn't, as it's part of a regression model. This is actually the most important difference.