In the $p < n$ case ($p$ the number of coefficients, $n$ the number of samples; judging by the number of coefficients in your plots, I assume that is the case here), the only real "problem" with the lasso model is that when multiple features are correlated, it tends to select one of them somewhat arbitrarily.
If the original features are not strongly correlated, it is reasonable for the lasso to perform similarly to the elastic net in terms of the coefficient paths. Looking at the documentation for the glmnet package, I also can't see any error in your code.
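To see the selection effect on a toy example: below is a small glmnet sketch (synthetic data, illustrative variable names only) with two nearly identical predictors; the lasso tends to put almost all the weight on one of them, while the elastic net tends to share it.

```r
# Toy illustration (synthetic data): two nearly identical predictors
library(glmnet)
set.seed(42)
n <- 100
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n, sd = 0.01),  # x1 and x2 are almost the same feature
           x2 = z + rnorm(n, sd = 0.01),
           matrix(rnorm(n * 3), n, 3))
y <- z + rnorm(n)

coef(glmnet(X, y, alpha = 1),   s = 0.1)  # lasso: typically one of x1/x2 takes ~all the weight
coef(glmnet(X, y, alpha = 0.5), s = 0.1)  # elastic net: the weight tends to be split
```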
I think you should use a range of $0$ to
$$\lambda_\text{max}^\prime = \frac{1}{1-\alpha}\lambda_\text{max}$$
My reasoning comes from extending the lasso case, and a full derivation is below. The one caveat is that it doesn't capture the $\text{dof}$ constraint contributed by the $\ell_2$ regularization. If I work out how to fix that (and decide whether it actually needs fixing), I'll come back and edit it in.
Define the objective
$$f(b) = \frac{1}{2} \|y - Xb\|^2 + \frac{1}{2} \gamma \|b\|^2 + \delta \|b\|_1$$
This is the objective you described, but with some parameters substituted to improve clarity.
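For reference, here is that objective transcribed directly into R, just to pin down the parameterization used below (the name `f_enet` is mine, purely for illustration):

```r
# Elastic-net objective as defined above: squared error + ridge term + lasso term
f_enet <- function(b, X, y, gamma, delta) {
  0.5 * sum((y - X %*% b)^2) + 0.5 * gamma * sum(b^2) + delta * sum(abs(b))
}
```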
For a smooth objective, $b = 0$ can only be a solution of $\min_b f(b)$ if the gradient at $b = 0$ is zero. The term $\|b\|_1$ is non-smooth, though, so the condition is instead that $0$ lies in the subgradient of $f$ at $b = 0$.
The subgradient of $f$ is
$$\partial f = -X^T(y - Xb) + \gamma b + \delta \partial \|b\|_1$$
where $\partial$ denotes the subgradient with respect to $b$. At $b=0$, this becomes
$$\partial f|_{b=0} = -X^Ty + \delta[-1, 1]^d$$
where $d$ is the dimension of $b$ and $[-1, 1]^d$ is the $d$-dimensional cube. So for $b = 0$ to be a solution of the optimization problem, it must be that
$$(X^Ty)_i \in \delta [-1, 1]$$
for each component $i$. This is equivalent to
$$\delta \geq \max_i \left|(X^T y)_i\right| = \max_i \left|\sum_j X_{ji} y_j \right|$$
which is the definition you gave for $\lambda_\text{max}$. Substituting $\delta = (1-\alpha)\lambda$ now recovers the formula at the top of this post.
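As a numerical sanity check of this threshold, here is a short proximal-gradient (ISTA) sketch that minimizes $f$ exactly as written above on synthetic data; the data and the names (`soft_threshold`, `fit_enet`) are made up for illustration.

```r
# Proximal gradient (ISTA) for f(b) = 0.5||y - Xb||^2 + 0.5*gamma*||b||^2 + delta*||b||_1
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

fit_enet <- function(X, y, gamma, delta, iters = 5000) {
  step <- 1 / (max(svd(X)$d)^2 + gamma)  # 1/L, L = Lipschitz constant of the smooth part
  b <- rep(0, ncol(X))
  for (i in seq_len(iters)) {
    grad <- -t(X) %*% (y - X %*% b) + gamma * b          # gradient of the smooth terms
    b <- soft_threshold(b - step * grad, step * delta)   # prox step for the l1 term
  }
  b
}

set.seed(1)
X <- matrix(rnorm(50 * 4), 50, 4)
y <- rnorm(50)
lambda_max <- max(abs(t(X) %*% y))
max(abs(fit_enet(X, y, gamma = 1, delta = 1.01 * lambda_max)))  # 0: fully shrunk
max(abs(fit_enet(X, y, gamma = 1, delta = 0.90 * lambda_max)))  # > 0: a coefficient survives
```

Any $\delta$ at or above $\lambda_\text{max}$ sends every coefficient to zero, exactly matching the subgradient condition.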
Best Answer
You're conflating two very different parameters: $\alpha$ and $\lambda$ do entirely different jobs.
$\alpha$ sets the degree of mixing between ridge regression and lasso: when $\alpha = 0$, the elastic net does the former, and when $\alpha = 1$, it does the latter. Values of $\alpha$ between those extremes will give a result that is a blend of the two.
Meanwhile, $\lambda$ is the shrinkage parameter: when $\lambda = 0$, no shrinkage is performed, and as $\lambda$ increases, the coefficients are shrunk ever more strongly. This happens regardless of the value of $\alpha$.
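A quick way to see this in glmnet (a sketch on synthetic data): for each of several $\alpha$ values, every coefficient path heads to zero as $\lambda$ grows.

```r
# Shrinkage with lambda happens for every alpha; alpha only changes the mixing
library(glmnet)
set.seed(1)
X <- matrix(rnorm(100 * 5), 100, 5)
y <- drop(X %*% rnorm(5)) + rnorm(100)
for (a in c(0, 0.5, 1)) {
  fit <- glmnet(X, y, alpha = a)
  plot(fit, xvar = "lambda")  # all paths collapse to 0 as log(lambda) increases
}
```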