Note that, multiplying and dividing by $n$ and using the fact that the plim of a product is the product of the plims (when these exist),
$$ \text{plim} \Big[(X'X + \lambda I_k)^{-1} X'X\Big]
=\text{plim}(n^{-1}X'X + n^{-1}\lambda I_k)^{-1}\,\text{plim}(n^{-1}X'X)$$
The second plim converges by assumption. For the first we have
$$\text{plim}(n^{-1}X'X + n^{-1}\lambda I_k)^{-1}=\left(\text{plim}n^{-1}X'X + \text{plim}n^{-1}\lambda I_k\right)^{-1} $$
and
$$\text{plim}n^{-1}\lambda I_k = \text{lim}n^{-1}\lambda I_k = 0$$
leading to the desired consistency result. Intuitively, the purpose of adding a term like $\lambda I_k$ is to handle a "bad sample"; it is a finite-sample device whose effect is eliminated asymptotically.
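To see this numerically, here is a minimal simulation sketch in Python/NumPy (the data-generating process and the fixed $\lambda$ are arbitrary illustrative choices, not part of the argument above): the estimator $(X'X + \lambda I_k)^{-1}X'Y$ approaches the true coefficients as $n$ grows, because the fixed penalty washes out asymptotically.

```python
import numpy as np

rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])   # hypothetical true coefficients
lam = 10.0                               # fixed penalty, does NOT grow with n

for n in [10**2, 10**3, 10**4, 10**5]:
    X = rng.normal(size=(n, 3))
    y = X @ beta_true + rng.normal(size=n)
    # penalized estimator (X'X + lambda I_k)^{-1} X'y
    beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
    print(f"n = {n:>6}: {np.round(beta_hat, 3)}")
```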
Your derivation is not quite precise: you are not taking the derivative but the subderivative, because the function $|x|$ is not differentiable at $x = 0$. The subderivative $s$ of the absolute value at $x = 0$ is any $s \in [-1, 1]$.
Thus, the conditions you derived are for the case $\hat{\beta}^{lasso}_j \neq 0$, where indeed the subdifferential of the absolute value equals the sign. But now consider the case $\hat{\beta}^{lasso}_j = 0$. By the KKT conditions, this happens when $-\hat{\beta}_j^{ols} + s\frac{\lambda}{2} = 0$, which implies $|\hat{\beta}_j^{ols}| \leq \frac{\lambda}{2}$, since $s\in [-1, 1]$ when $\hat{\beta}^{lasso}_j = 0$.
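Putting the two cases together gives the familiar soft-thresholding form of the solution (in the orthonormal setting assumed here):
$$\hat{\beta}^{lasso}_j = \operatorname{sign}\big(\hat{\beta}^{ols}_j\big)\left(\big|\hat{\beta}^{ols}_j\big| - \frac{\lambda}{2}\right)_+$$
so each coefficient is shrunk toward zero by $\lambda/2$ and set exactly to zero whenever $|\hat{\beta}^{ols}_j| \leq \frac{\lambda}{2}$.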
The LASSO problem
For the sake of completeness I will write down the lasso problem here. Our goal is to solve
$$\min_{\beta} || Y - X\beta||_2^2 + \lambda||\beta||_1$$
where $||\cdot||_1$ is the $l_1$ norm. This is a convex optimization problem, and the optimum is characterized by the KKT conditions:
$$
-2X'(Y - X\beta) + \lambda s = 0
$$
where $s$ is a subgradient of the $l_1$ norm, that is, $s_j = \operatorname{sign}(\beta_j)$ if $\beta_j \neq 0$ and $s_j \in [-1, 1]$ if $\beta_j = 0$.
In the orthonormal case, $X'Y = \hat{\beta}^{OLS}$ and $X'X = I$, simplifying this to:
$$
-2\hat{\beta}^{OLS} +2\beta + \lambda s = 0
$$
Thus, consider the case where the solution would be $\beta_j = 0$. For this to be true we must have $-2\hat{\beta}_j^{OLS} + \lambda s_j = 0$, which implies $|\hat{\beta}_j^{OLS}| \leq \frac{\lambda}{2}$, since $s_j \in [-1, 1]$. Since this is a convex program, the KKT conditions are sufficient, and the condition works both ways, that is, $|\hat{\beta}_j^{OLS}| \leq \frac{\lambda}{2} \implies \beta_j = 0$.
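As a sanity check, here is a small numerical sketch (Python with NumPy/SciPy; the design, true coefficients, and $\lambda$ are arbitrary illustrative choices). Because $X'X = I$, the objective separates across coordinates, so each coordinate can be minimized on its own and compared with the soft-thresholding/KKT characterization above:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)

# Orthonormal design via QR, so that X'X = I and the problem decouples per coordinate.
n, p = 50, 5
X, _ = np.linalg.qr(rng.normal(size=(n, p)))
beta_true = np.array([2.0, -1.0, 0.3, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

lam = 1.0
beta_ols = X.T @ y  # OLS estimate when X'X = I

# Per-coordinate lasso objective (up to constants): (b - b_ols)^2 + lambda*|b|
def coord_objective(b, b_ols):
    return (b - b_ols) ** 2 + lam * abs(b)

beta_numeric = np.array([
    minimize_scalar(coord_objective, args=(b,), bounds=(-10, 10), method="bounded").x
    for b in beta_ols
])

# Closed-form soft-thresholding at lambda/2, as implied by the KKT conditions above.
beta_soft = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam / 2, 0.0)

print("OLS:           ", np.round(beta_ols, 3))
print("numerical min: ", np.round(beta_numeric, 3))
print("soft-threshold:", np.round(beta_soft, 3))
print("zero iff |OLS| <= lambda/2:", np.abs(beta_ols) <= lam / 2)
```

Coordinates with $|\hat{\beta}^{OLS}_j| \leq \frac{\lambda}{2}$ come out (numerically) zero, matching the condition derived above.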
Best Answer
You'll be disappointed to find that the consistency that matters the most with lasso is consistency in which predictors are chosen. If you simulate two moderately large datasets, run lasso on each independently, and compare the results, the low degree of overlap will reveal the difficulty of the feature-selection task. This is even more true when collinearities are present. The lasso spends too much of its energy on feature selection instead of estimation, and the L1 norm results in too much shrinkage of truly important predictors (hence the popularity of the horseshoe prior in Bayesian high-dimensional modeling). I wouldn't be too interested in the type of consistency you described above until these more fundamental issues are addressed. I discuss these issues in general, and show how the bootstrap can help uncover them, here in the chapter on challenges of high-dimensional data analysis.
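To make the instability concrete, here is a minimal simulation sketch (Python with scikit-learn); the data-generating process, dimensions, and the use of `LassoCV` are my own illustrative choices, not the bootstrap analysis referenced above. Two independent samples from the same model typically yield noticeably different selected sets:

```python
import numpy as np
from sklearn.linear_model import LassoCV

n, p, k = 200, 50, 5            # observations, predictors, truly nonzero coefficients
beta = np.zeros(p)
beta[:k] = 1.0

def selected_features(seed):
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 1))              # shared factor -> correlated predictors
    X = 0.5 * z + rng.normal(size=(n, p))
    y = X @ beta + rng.normal(size=n)
    fit = LassoCV(cv=5).fit(X, y)            # penalty chosen by cross-validation
    return set(np.flatnonzero(fit.coef_))

s1, s2 = selected_features(1), selected_features(2)
print("selected in sample 1:", sorted(s1))
print("selected in sample 2:", sorted(s2))
print("selected in both    :", sorted(s1 & s2))
```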