Assume we are given a set of $n$ $p$-dimensional observations, $x_i \in \mathbb{R}^p$, $i = 1, \dotsc, n$, and a model of the form:
\begin{align}
Y_i = \langle \beta, x_i\rangle + \epsilon_i
\end{align}
where the $\epsilon_i \sim N(0, \sigma^2)$ are i.i.d., $\beta \in \mathbb{R}^p$, and $\langle \cdot, \cdot \rangle$ denotes the inner product. Let $\hat{\beta} = \delta(\{Y_i\}_{i=1}^n)$ be an estimate of $\beta$ obtained by fitting method $\delta$ (either OLS or the LASSO for our purposes). The formula for degrees of freedom given in the article (equation 1.2) is:
\begin{align}
\text{df}(\hat{\beta}) = \sum_{i=1}^n \frac{\text{Cov}(\langle\hat{\beta}, x_i\rangle, Y_i)}{\sigma^2}.
\end{align}
By inspecting this formula we can surmise that, in accordance with your intuition, the true DOF of the LASSO will indeed be less than the true DOF of OLS: the coefficient shrinkage effected by the LASSO tends to decrease the covariances.
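If you want to see this concretely, the covariance in that formula can be estimated by brute force: fix a design, simulate many response vectors from the model, and estimate each $\operatorname{Cov}(\langle\hat{\beta}, x_i\rangle, Y_i)$ across replications. Here is a minimal Python sketch; the design, true coefficients, and penalty are arbitrary choices of mine, and note that sklearn's `Lasso` minimizes $\frac{1}{2n}\|y - X\beta\|^2 + \alpha\|\beta\|_1$:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p, sigma, reps = 50, 5, 1.0, 2000
X = rng.normal(size=(n, p))                      # fixed design across replications
beta = np.array([2.0, -1.0, 0.5, 0.0, 0.0])      # true coefficients (illustrative)

ys = np.empty((reps, n))
fit_ols = np.empty((reps, n))
fit_lasso = np.empty((reps, n))
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)
    ys[r] = y
    fit_ols[r] = LinearRegression(fit_intercept=False).fit(X, y).predict(X)
    fit_lasso[r] = Lasso(alpha=0.1, fit_intercept=False).fit(X, y).predict(X)

def df_hat(fits):
    # df = sum_i Cov(<beta_hat, x_i>, Y_i) / sigma^2,
    # with each covariance estimated across the simulated datasets
    return sum(np.cov(fits[:, i], ys[:, i])[0, 1] for i in range(n)) / sigma**2

print(df_hat(fit_ols))    # close to p = 5 for OLS
print(df_hat(fit_lasso))  # smaller: shrinkage reduces the covariances
```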
Now, to answer your question: the reason the DOF for the LASSO comes out the same as the DOF for OLS in your example is that you are dealing with estimates of the true DOF values (albeit unbiased ones), computed from one particular dataset sampled from the model. For any particular dataset, such an estimate will generally not equal the true value (especially since the estimate is constrained to be an integer while the true value is, in general, a real number).
However, when such estimates are averaged over many datasets sampled from the model, unbiasedness and the law of large numbers imply that the average converges to the true DOF. In the case of the LASSO, some of those datasets will produce a fit in which the coefficient is exactly 0 (though such datasets may be rare if $\lambda$ is small). In the case of OLS, the DOF estimate is always the number of coefficients, not the number of nonzero coefficients, so the OLS average contains no such zeros. This shows how the estimators differ, and how the averaged LASSO DOF estimate can converge to something smaller than the averaged OLS DOF estimate.
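You can check the averaging argument by simulation as well: the per-dataset (unbiased, integer-valued) LASSO DOF estimate is the number of nonzero coefficients, and its average over many simulated datasets settles below $p$, whereas the OLS estimate is always $p$. A rough sketch, again with an arbitrary design and penalty:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, sigma = 50, 1.0
X = rng.normal(size=(n, 5))
beta = np.array([2.0, -1.0, 0.5, 0.0, 0.0])

# Per-dataset DOF estimate for the LASSO: the number of nonzero coefficients.
# Each estimate is an integer, but the average over datasets converges to the
# true (generally non-integer) DOF; the OLS estimate would always be p = 5.
counts = [np.count_nonzero(Lasso(alpha=0.1, fit_intercept=False)
                           .fit(X, X @ beta + sigma * rng.normal(size=n))
                           .coef_)
          for _ in range(2000)]
print(np.mean(counts))   # typically below 5
```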
Firstly, I think it's worth noting that the description of what ridge does assumes that the data matrix is orthonormal.
Secondly, the answer to your question is yes under those circumstances. The details may be found in "The Elements of Statistical Learning", Section 3.4.3 (p. 69 ff.). The short story is that, for an orthonormal design, the lasso estimate is the soft-thresholded least-squares estimate:
\begin{align}
\hat{\beta}_j \to \operatorname{sign}(\hat{\beta}_j)\,\max\bigl(|\hat{\beta}_j| - \lambda,\, 0\bigr).
\end{align}
Please see the book for the complete discussion and details.
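As a sanity check, the formula can be verified numerically against a lasso solver on an orthonormal design. In this hypothetical sketch I scale the design so that $X^\top X = nI$, which matches sklearn's $\frac{1}{2n}$ loss scaling; the data are arbitrary:

```python
import numpy as np
from sklearn.linear_model import Lasso

def soft_threshold(b, lam):
    # sign(b) * max(|b| - lam, 0), applied elementwise
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

rng = np.random.default_rng(2)
n, lam = 100, 0.3
Q, _ = np.linalg.qr(rng.normal(size=(n, 4)))   # Q has orthonormal columns
X = np.sqrt(n) * Q                             # so X^T X = n I, matching sklearn's scaling
y = X @ np.array([1.5, -0.7, 0.2, 0.0]) + rng.normal(size=n)

beta_ols = X.T @ y / n                         # OLS solution when X^T X = n I
beta_lasso = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
print(soft_threshold(beta_ols, lam))           # matches beta_lasso up to solver tolerance
print(beta_lasso)
```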
Yes, but it depends on what your goal is. It's a little complicated.
What is adaptive Lasso?
Adaptive Lasso was introduced in Zou (2006). It is a modification of the Lasso in which each coefficient, $\beta_j$, is given its own weight, $w_j$. The coefficients are estimated by minimizing the objective function,
$$ \underset{\beta}{\arg \min }\left\|\mathbf{y}-\sum_{j=1}^{p} \mathbf{x}_{j} \beta_{j}\right\|^{2}+\lambda \sum_{j=1}^{p} w_{j}\left|\beta_{j}\right|. $$
The weights control the rate at which each coefficient is shrunk towards 0. The general idea is that smaller coefficients should leave the model before larger ones.
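Computationally, the weighted penalty reduces to a plain Lasso by rescaling: substitute $\tilde{\mathbf{x}}_j = \mathbf{x}_j / w_j$ and $\theta_j = w_j \beta_j$, solve the ordinary Lasso in $\theta$, and map back. A minimal sketch (the function name and sklearn's penalty scaling are my own choices, not from Zou (2006)):

```python
import numpy as np
from sklearn.linear_model import Lasso

def adaptive_lasso(X, y, weights, alpha):
    """Minimize ||y - X b||^2 / (2n) + alpha * sum_j weights[j] * |b_j|
    by rescaling the columns and solving an ordinary lasso."""
    X_tilde = X / weights                # column j becomes x_j / w_j
    theta = Lasso(alpha=alpha, fit_intercept=False).fit(X_tilde, y).coef_
    return theta / weights               # map back: b_j = theta_j / w_j
```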
How do you choose the weights?
The adaptive Lasso is very general: you can set the weights however you'd like, and you'll get something out. But you might want to consider what the "best" set of weights is. Zou (2006) says you should choose the weights so that the adaptive Lasso estimates have the Oracle Property: asymptotically, the procedure selects exactly the true subset of nonzero coefficients, and the estimates of those coefficients have the same asymptotic distribution as if the correct model had been known in advance.
To ensure the adaptive Lasso has these properties, you need to choose the weights as $w_j = 1/|\hat{\beta}_j|^{\gamma}$, where $\gamma > 0$ and $\hat{\beta}_j$ is an unbiased estimate of the true parameter, $\beta_j$. Generally, people choose the Ordinary Least Squares (OLS) estimate of $\beta$ because it is unbiased. Ridge regression produces coefficient estimates that are biased, so you cannot guarantee the Oracle Property holds.
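Putting the pieces together, here is a hypothetical end-to-end sketch with OLS-based weights, using the rescaling trick from above; $\gamma$, $\alpha$, and the data are arbitrary:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(3)
n, gamma = 200, 1.0
X = rng.normal(size=(n, 6))
y = X @ np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0]) + rng.normal(size=n)

beta_ols = LinearRegression(fit_intercept=False).fit(X, y).coef_
w = 1.0 / np.abs(beta_ols) ** gamma      # w_j = 1 / |beta_hat_j|^gamma: coefficients
                                         # with small OLS estimates are penalized heavily
theta = Lasso(alpha=0.1, fit_intercept=False).fit(X / w, y).coef_
print(theta / w)                         # the truly-zero coefficients tend to be exactly 0
```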
What if I use something else for the weights?
What happens if you ignore the requirement of using unbiased estimates for the weights and use Ridge regression instead? You can't guarantee you'll get the right subset of coefficients, nor that they'll have the correct distribution. In practice, this probably doesn't matter much: the Oracle Property is an asymptotic guarantee (as $n \to \infty$), so it doesn't necessarily apply to your data with a finite number of observations. There may be scenarios where using Ridge estimates for the weights performs really well; indeed, Zou (2006) recommends Ridge regression over OLS for the initial estimates when your variables are highly correlated.
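In that case the only change to the sketch above is the source of the initial estimate, e.g. (with an arbitrary ridge penalty):

```python
from sklearn.linear_model import Ridge

# Swap the initial estimate: ridge-based weights are more stable when the
# columns of X are highly correlated (ridge penalty chosen for illustration)
beta_ridge = Ridge(alpha=1.0, fit_intercept=False).fit(X, y).coef_
w = 1.0 / np.abs(beta_ridge) ** gamma
```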