I think you should use a range of $0$ to
$$\lambda_\text{max}^\prime = \frac{1}{1-\alpha}\lambda_\text{max}$$
My reasoning comes from extending the lasso case, and a full derivation is below. The qualifier is it doesn't capture the $\text{dof}$ constraint contributed by the $\ell_2$ regularization. If I work out how to fix that (and decide whether it actually needs fixing), I'll come back and edit it in.
Define the objective
$$f(b) = \frac{1}{2} \|y - Xb\|^2 + \frac{1}{2} \gamma \|b\|^2 + \delta \|b\|_1$$
This is the objective you described, but with some parameters substituted to improve clarity.
Conventionally, $b=0$ can only be a solution to the optimization problem $\min f(b)$ if the gradient at $b = 0$ is zero. The term $\|b\|_1$ is non-smooth though, so the condition is actually that $0$ lies in the subgradient at $b = 0$.
The subgradient of $f$ is
$$\partial f = -X^T(y - Xb) + \gamma b + \delta \partial \|b\|_1$$
where $\partial$ denotes the subgradient with respect to $b$. At $b=0$, this becomes
$$\partial f|_{b=0} = -X^Ty + \delta[-1, 1]^d$$
where $d$ is the dimension of $b$, and a $[-1,1]^d$ is a $d$-dimensional cube. So for the optimization problem to have a solution of $b = 0$, it must be that
$$(X^Ty)_i \in \delta [-1, 1]$$
for each component $i$. This is equivalent to
$$\delta > \max_i \left|\sum_j y_j X_{ij} \right|$$
which is the defintion you gave for $\lambda_\text{max}$. If $\delta = (1-\alpha)\lambda$ is now swapped in, the formula from the top of the post falls out.
Can I say something about the propensity to overfit in (A) versus (B)?
Provided that both grids cover a sufficient range, grid fineness doesn't really have anything to do with overfitting in this problem (though a coarse grid might underfit if it skips over a profitable interval). It's not as if testing too many values will somehow change what out-of-sample looks like.* In the case of these penalized regressions, we definitely want to optimize our penalized likelihood function for values $\lambda$, and it doesn't matter how many values of $\lambda$ we test, because out-of-sample performance for a fixed data set and fixed partitioning is entirely deterministic. More to the point, the out-of-sample metric is not at all altered by how many values $\lambda$ you test. A coarser grid might mean that you skip over the absolute minimum in your out-of-sample metric, but finding the absolute minimum probably isn't desirable in the first place because hyperparameters tend to be poorly estimated, and finite sample properties mean that data limitations will be a source noise in that estimate that will overwhelm slight changes in the distance between adjacent grid points: the standard error of your estimate will tend to swamp differences in grid fineness.
If you're truely concerned that out-of-sample performance metric might be overly optimistic, you could adopt the 1 standard error rule, which picks the most regularized model within 1 standard error of the minimum. That way, you're being slightly more conservative and picking a less complex model.
Can I determine the optimal grid fineness? How?
The LARS algorithm does not a priori define which values of $\lambda$ to check; rather, $\lambda$ is changed continuously and the algorithm checks for values of $\lambda$ for which a coefficient goes from 0 to a nonzero value. Those values of $\lambda$ where a new coefficient is nonzero are retained, with the observation that coefficient paths are piecewise linear in the case of the lasso, so there's no loss of information by just storing off the knots in that case. LARS only works when coefficient paths are piecewise linear, though. The ridge penalty never shrinks a coefficient to precisely zero, so all of your coefficient paths are smooth and always nonzero; likewise elastic net regressions (excluding the case of elastic net regressions which are also lasso regressions).
But most people use GLMNET because it's often faster. In terms of determining what grid of $\lambda$ to search over, I recommend reading the GLMNET article "Regularization Paths for Generalized Linear Models via Coordinate Descent" by Jerome Friedman, Trevor Hastie, and Rob Tibshirani. In it, they develop a very efficient algorithm for estimating ridge, lasso and elastic net regressions. The algorithm checks for a value of $\lambda_\text{max}$ for which $\beta$ is the zero vector, and then identifies a minimum value $\lambda_\text{min}$ relative to $\lambda_\text{max}$. Finally, they generate a sequence of values between the two uniformly on the log scale. This grid is sufficient for most purposes, though it does omit the property that you will know precisely when a coefficient is estimated at a nonzero value. Warm starts are used to provide solutions much more quickly, and it supports many common GLMs.
*You might be thinking about this from the perspective of an artificial neural network, where early stopping is sometimes used to accomplish regularization, but that's an entirely unrelated problem (namely, that the optimization algorithm is prevented from reaching an optimum, so the model is forced to be less complex).
Best Answer
It is true that the out-of-sample performance on a fixed data set is deterministic, the question is whether the performance is generalizable. And that is not the case if you optimize your model to perform best on that fixed test data set.
Whenenver you optimize hyperparameters (here lambda) by checking the out-of-sample performance on a specific data set, you can no longer use that data set to get unbiased performance estimates. This does not mean that you don't want to optimize the parameter, it just means you can't use the performance estimates.
But don't believe me blindly, better to run a simulation yourself.
In the case of the grid size, you won't make overfitting much worse by making it finer. The important thing is whether you tune $\lambda$.