With which matrix form formula can I compute the lasso regression solutions?
As @Matthew Drury points out, there is no closed-form solution to the multivariate lasso problem. To understand why this is the case, you must first familiarise yourself with the closed-form solution to the univariate lasso problem, which is derived here:
Univariate Lasso problem
Computing the subdifferential of the Lasso cost function and equating to zero to find the minimum:
\begin{aligned}
\partial_{\theta_j} RSS^{lasso}(\theta) &= \partial_{\theta_j} RSS^{OLS}(\theta) + \lambda \, \partial_{\theta_j} || \theta ||_1
\\
0 & \in -\rho_j + \theta_j z_j + \lambda \, \partial_{\theta_j} | \theta_j |
\\
0 & \in \begin{cases}
-\rho_j + \theta_j z_j - \lambda & \text{if}\ \theta_j < 0 \\
[-\rho_j - \lambda ,-\rho_j + \lambda ] & \text{if}\ \theta_j = 0 \\
-\rho_j + \theta_j z_j + \lambda & \text{if}\ \theta_j > 0
\end{cases}
\end{aligned}
For the second case, we must ensure that the closed interval contains zero so that $\theta_j = 0$ is a global minimum:
\begin{aligned}
0 \in [-\rho_j - \lambda ,-\rho_j + \lambda ]
\end{aligned}
Solving for $\theta_j$ gives:
\begin{aligned}
\begin{cases}
\theta_j = \frac{\rho_j + \lambda}{z_j} & \text{for} \ \rho_j < - \lambda \\
\theta_j = 0 & \text{for} \ - \lambda \leq \rho_j \leq \lambda \\
\theta_j = \frac{\rho_j - \lambda}{z_j} & \text{for} \ \rho_j > \lambda
\end{cases}
\end{aligned}
We recognize this as the soft thresholding function with a normalizing constant.
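As an illustration, here is a minimal Python sketch of that univariate update (the names `soft_threshold`, `rho_j` and `z_j` are mine; $\rho_j$ and $z_j$ are the quantities from the linked derivation, the feature–partial-residual inner product and the squared column norm, respectively):

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator from the case analysis above."""
    if rho < -lam:
        return rho + lam
    elif rho > lam:
        return rho - lam
    return 0.0

def univariate_lasso_update(rho_j, z_j, lam):
    """Closed-form coordinate solution: theta_j = S(rho_j, lambda) / z_j."""
    return soft_threshold(rho_j, lam) / z_j
```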
Multivariate Lasso problem
$$ \min_{\theta} \ \frac{1}{2} \ || \mathbf{y - X \theta}||_2^2 + \lambda || \theta||_1$$
Taking the subdifferential and ensuring that zero is contained in it (i.e., the stationarity condition for a global minimum) gives:
$$ 0 \in -\mathbf{X}^T(\mathbf{y} - \mathbf{X \theta}) + \lambda \, \partial || \theta||_1$$
Rearranging, we get
$$ \mathbf{X}^T(\mathbf{y} - \mathbf{X\theta}) \in \lambda \, \partial || \theta ||_1$$
Note that $\theta$ appears on both sides of this relation. You can only solve it explicitly if $\mathbf{X}^T\mathbf{X} = I$, i.e. if the columns of $\mathbf{X}$ are orthonormal, since you could then use the soft-thresholding function to write an explicit solution for each component of $\theta$.
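In the general (non-orthonormal) case the problem is typically solved iteratively; a common choice is cyclic coordinate descent, which applies the univariate soft-thresholding update to one coordinate at a time. A minimal sketch for the objective $\frac{1}{2}||\mathbf{y}-\mathbf{X\theta}||_2^2 + \lambda||\theta||_1$ (not the actual implementation used by packages such as glmnet):

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2)||y - X theta||^2 + lam * ||theta||_1."""
    n, p = X.shape
    theta = np.zeros(p)
    z = (X ** 2).sum(axis=0)                      # z_j = sum_i x_ij^2
    for _ in range(n_iter):
        for j in range(p):
            # partial residual that excludes the contribution of feature j
            r_j = y - X @ theta + X[:, j] * theta[j]
            rho_j = X[:, j] @ r_j
            # soft-thresholding update from the univariate problem
            theta[j] = np.sign(rho_j) * max(abs(rho_j) - lam, 0.0) / z[j]
    return theta
```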
You'll be disappointed to find that the consistency that matters the most with the lasso is the consistency of which predictors are chosen. If you simulate two moderately large datasets, perform the lasso independently on each, and compare the results, the low degree of overlap will reveal how difficult the feature-selection task really is. This is even more true when collinearities are present. The lasso spends too much of its energy on feature selection instead of estimation, and the L1 norm results in too much shrinkage of truly important predictors (hence the popularity of the horseshoe prior in Bayesian high-dimensional modeling). I wouldn't be too interested in the type of consistency you described above until these more fundamental issues are addressed. I discuss these issues in general, and show how the bootstrap can help uncover them, here in the chapter on challenges of high-dimensional data analysis.
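To see the instability of the selected predictor set for yourself, here is a rough simulation sketch using scikit-learn; the sample sizes, coefficient values, and helper names are arbitrary choices of mine:

```python
import numpy as np
from sklearn.linear_model import LassoCV

n, p, k = 200, 50, 5                  # samples, candidate predictors, truly nonzero coefficients
beta = np.zeros(p)
beta[:k] = 1.0

def selected_predictors(seed):
    """Simulate a dataset, fit the lasso with cross-validated lambda, return selected features."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(size=n)
    fit = LassoCV(cv=5).fit(X, y)
    return set(np.flatnonzero(fit.coef_))

s1, s2 = selected_predictors(1), selected_predictors(2)
print("selected in both runs:", len(s1 & s2), "out of", len(s1 | s2), "selected overall")
```

With correlated columns in $X$, the overlap typically degrades further, echoing the point about collinearity above.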
Best Answer
You have three different questions. Let's tackle them in a somewhat different order. I'll refer to ISLR 2nd edition.
First off, here are the two different formulations of the lasso:
$$ \text{minimize } ||\mathbf{y}-\mathbf{X}\mathbf{\beta}||_2^2+\lambda||\mathbf{\beta}||_1\text{ with respect to }\mathbf{\beta}\text{ for a given }\lambda\geq 0 \quad (6.7)$$ and $$ \text{minimize } ||\mathbf{y}-\mathbf{X}\mathbf{\beta}||_2^2\text{ with respect to }\mathbf{\beta}\text{ subject to }||\mathbf{\beta}||_1\leq s\text{ for a given }s\geq0\quad (6.8) $$
The squared 2-norm $||\cdot||_2^2$ of a vector is the sum of its squared entries, and the 1-norm $||\cdot||_1$ is the sum of the absolute values of its entries.
On to your questions:
How do $\lambda$ and $s$ hang together?
There is a correspondence between the two, in the following sense: for every $\lambda$, there is an $s$ such that the minimizer $\mathbf{\beta}$ of (6.7) for that $\lambda$ is also the minimizer of (6.8) for that $s$, and vice versa. (Note that the mapping is not strictly one-to-one; see the boundary case below.)
One interesting boundary case is $\lambda=0$ (no lasso penalty): it leads to the OLS estimate of $\mathbf{\beta}$ (assuming your design matrix $\mathbf{X}$ has full rank) and corresponds to $s=\infty$ (no constraint on the parameter estimates). If you are uncomfortable with $s=\infty$, just take $s$ to be any number at least as large as the 1-norm of the OLS estimate of $\mathbf{\beta}$.
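One way to see the correspondence numerically is to trace out the lasso path over a grid of $\lambda$ values and record the 1-norm of each fitted coefficient vector, which is the $s$ that each $\lambda$ maps to. A sketch with scikit-learn (note that scikit-learn calls $\lambda$ `alpha` and scales it by the sample size, so the values are not directly comparable to ISLR's):

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + rng.normal(size=100)

# lasso_path returns the lambdas (alphas) it used and the coefficients at each one
alphas, coefs, _ = lasso_path(X, y)
s_values = np.abs(coefs).sum(axis=0)      # s = ||beta_hat(lambda)||_1 for each lambda

for lam, s in list(zip(alphas, s_values))[:5]:
    print(f"lambda = {lam:.4f}  ->  s = {s:.4f}")
```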
Why are the two formulations equivalent?
This derivation is usually not given in standard statistics/data science textbooks. Most people just accept this fact. I don't think it is formally proven even in the original lasso paper by Tibshirani (1996), but it does not seem to be very hard. Starting with (6.7), pick a $\lambda$, find the optimal $\mathbf{\beta}$ in (6.7), use the 1-norm of this estimate for $s$ in (6.8) and argue that you won't find a better solution to (6.8) with smaller 1-norm. And vice versa. A rigorous proof would probably be a nice exercise for an undergraduate math student.
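To make the forward direction of that argument concrete, here is a short sketch (my own notation, not from ISLR): let $\hat{\mathbf{\beta}}_\lambda$ be the minimizer of (6.7) for a given $\lambda$, and set $s=||\hat{\mathbf{\beta}}_\lambda||_1$. If some $\mathbf{\beta}$ with $||\mathbf{\beta}||_1\leq s$ achieved a strictly smaller residual sum of squares, then
$$ ||\mathbf{y}-\mathbf{X}\mathbf{\beta}||_2^2+\lambda||\mathbf{\beta}||_1 < ||\mathbf{y}-\mathbf{X}\hat{\mathbf{\beta}}_\lambda||_2^2+\lambda||\hat{\mathbf{\beta}}_\lambda||_1, $$
contradicting the optimality of $\hat{\mathbf{\beta}}_\lambda$ in (6.7). Hence $\hat{\mathbf{\beta}}_\lambda$ also solves (6.8) with this $s$.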
How do we choose $\lambda$ (or equivalently $s$)?
The algorithm usually used to fit a lasso model will give you an entire sequence of coefficient vectors, one for each of multiple values of $\lambda$ (e.g., Figure 6.6 in ISLR). You can then choose one of them. This is often done via cross-validation using some accuracy measure, and in fact, your software may already perform this cross-validation internally and just report the optimal $\lambda$.
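In practice this often amounts to a single call; a minimal sketch with scikit-learn's LassoCV (again, scikit-learn's `alpha` plays the role of $\lambda$):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)

# Fits the whole regularization path and picks lambda (alpha) by 10-fold cross-validation
fit = LassoCV(cv=10).fit(X, y)
print("chosen lambda (alpha):", fit.alpha_)
print("number of nonzero coefficients:", int((fit.coef_ != 0).sum()))
```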