The only method I am aware of is the Sherman–Morrison formula for performing a rank-1 update of an inverse, or its generalization, the Woodbury matrix identity, for rank-$k$ updates. In both cases, the advantage hinges on $L$ *not* having full rank. If $L$ has full rank, these methods won't provide enough of a speed-up to be worth it over just performing an LU decomposition of $A^{\mathrm{T}}A + L$ directly.
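As a minimal NumPy sketch of the Woodbury update (small random matrices, names like `M` and `U` are my own for illustration): given an already-known $M^{-1}$, a rank-$k$ correction $UV$ only requires solving a $k \times k$ system instead of refactoring the full $n \times n$ matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2                         # k << n: the update has low rank

A = rng.standard_normal((n, n))
M = A.T @ A + np.eye(n)             # base matrix, assumed already inverted/factored
M_inv = np.linalg.inv(M)

# Rank-k update L = U @ V with U (n x k), V (k x n)
U = rng.standard_normal((n, k))
V = rng.standard_normal((k, n))

# Woodbury identity: (M + U V)^{-1} = M^{-1} - M^{-1} U (I_k + V M^{-1} U)^{-1} V M^{-1}
S = np.eye(k) + V @ M_inv @ U                  # small k x k "capacitance" matrix
updated_inv = M_inv - M_inv @ U @ np.linalg.solve(S, V @ M_inv)

# Check against the direct inverse
direct = np.linalg.inv(M + U @ V)
print(np.allclose(updated_inv, direct))        # True
```

Sherman–Morrison is exactly the $k = 1$ case, with $U$ and $V^{\mathrm{T}}$ column vectors and $S$ a scalar.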
I ran into a similar problem where I potentially had several hundred to several thousand matrices to invert, and the portion of the matrix sum that was changing had full rank; LU decomposition was the best I was able to come up with.
Edit: Hunting through my stack of papers, I came across a good review of these techniques.
Edit (2nd question): Your second approach doesn't work in general. The error is in the second equation; it should be
$$
\begin{align}
A^\mathrm{T}A + L &= Q \Lambda Q^{-1} + L \\
&= Q \Lambda Q^{-1} + Q Q^{-1} L Q Q^{-1} \\
&= Q( \Lambda + Q^{-1} L Q )Q^{-1}
\end{align}
$$
In general, $Q^{-1} L Q$ is not diagonal unless $L$ commutes with $Q$, which usually occurs only when $L$ is proportional to $I$.
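A quick NumPy check makes this concrete (small random matrices, purely illustrative): conjugating a generic diagonal $L$ by the eigenvector matrix $Q$ of $A^{\mathrm{T}}A$ produces off-diagonal terms, while a multiple of $I$ passes through unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
M = A.T @ A                               # symmetric, so it diagonalizes

lam, Q = np.linalg.eigh(M)                # M = Q diag(lam) Q^{-1}
L = np.diag(rng.standard_normal(4))       # a generic diagonal L

conj = np.linalg.inv(Q) @ L @ Q
off_diag = conj - np.diag(np.diag(conj))
print(np.allclose(off_diag, 0))           # False: Q^{-1} L Q is not diagonal

# But L proportional to I commutes with any invertible Q:
conj_id = np.linalg.inv(Q) @ (2.0 * np.eye(4)) @ Q
print(np.allclose(conj_id, 2.0 * np.eye(4)))   # True
```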
Edit (idle speculation): Upon reviewing the second question, I thought of a way that may allow you to use that method. As stated previously, the only way for the second method to work is for $L$ to commute with $Q$. Since $L$ is diagonal, we can rewrite it as $\alpha I + N$, where $\alpha$ is the most common diagonal element of $L$. Then the rank of $N$ will be less than the rank of $A^\mathrm{T}A$. Thus, we can calculate $(A^\mathrm{T}A + L)^{-1}$ in two steps: calculate
$$B^{-1} = Q(\Lambda + \alpha I)^{-1}Q^{-1}$$
and then perform a $\text{rank}(N)$ update of $B^{-1}$ using the Woodbury matrix identity. If all the diagonal elements of $L$ are equally common, $\text{rank}(N) = \text{rank}(A^\mathrm{T}A) - 1$. So this method is likely only worth it if you have many different $L$ matrices and the cost of diagonalizing $A^\mathrm{T}A$ can be amortized. You will only see a net benefit when $\text{rank}(N)$ is appreciably smaller than $\text{rank}(A^\mathrm{T}A)$, so it will require some thought as to whether it is worth it overall.
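A sketch of the two-step scheme in NumPy, under the stated assumptions (the particular $L$ and variable names here are my own, chosen so that $N$ has rank 2):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)
n = 6

A = rng.standard_normal((n, n))
M = A.T @ A
lam, Q = np.linalg.eigh(M)          # one-time diagonalization, amortized over many L

# Diagonal L whose most common entry is alpha
diag_L = np.array([3.0, 3.0, 3.0, 3.0, 1.0, 2.0])
alpha = Counter(diag_L.tolist()).most_common(1)[0][0]   # 3.0
N = np.diag(diag_L - alpha)         # nonzero only where L differs from alpha

# Step 1: B^{-1} = Q (Lambda + alpha I)^{-1} Q^{-1}  (Q orthogonal, so Q^{-1} = Q^T)
B_inv = Q @ np.diag(1.0 / (lam + alpha)) @ Q.T

# Step 2: rank(N) Woodbury update. Factor N = U C V over the perturbed entries.
idx = np.flatnonzero(diag_L != alpha)
U = np.eye(n)[:, idx]               # selector for the perturbed coordinates
C = np.diag(diag_L[idx] - alpha)    # small r x r core, r = rank(N)
V = U.T
S = np.linalg.inv(C) + V @ B_inv @ U
result = B_inv - B_inv @ U @ np.linalg.solve(S, V @ B_inv)

print(np.allclose(result, np.linalg.inv(M + np.diag(diag_L))))   # True
```

The per-$L$ cost is dominated by the rank-$r$ update rather than a fresh $O(n^3)$ factorization, which is where the amortization argument above comes from.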
Best Answer
Note: this solution does not work for the updated question.
$$ D = \text{diag}(a_{11}, \ldots, a_{nn}) = \sum_{i=1}^n P_{(i)} A P_{(i)} $$
where $P_{(i)}$ is the projection onto the $i$-th coordinate,
$$ (P_{(i)})_{jk} = \delta_{ij} \delta_{jk} \quad (i,j,k \in \{1,\ldots,n\}), $$
and $\delta$ is the Kronecker delta ($1$ when the indices agree, otherwise $0$).
Transforming the diagonal matrix $D$ into a row vector can be done by $$ d = u^T D $$ where each of the $n$ components of $u$ is $1$: $$ u = (1,1,\ldots,1)^T. $$ Combining both gives $$ d = \sum_i u^T P_{(i)} A P_{(i)} = \sum_i e_i^T A P_{(i)} $$ where $e_i$ is the $i$-th canonical basis vector (using $u^T P_{(i)} = e_i^T$).
Example:
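As one possible numerical illustration of the identities above (a small random $A$; the variable names are my own), we can verify both the projection sum and the row-vector form:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))

I = np.eye(n)
# P_(i) = e_i e_i^T projects onto the i-th coordinate
P = [np.outer(I[i], I[i]) for i in range(n)]

# D = sum_i P_(i) A P_(i) keeps exactly the diagonal of A
D = sum(P[i] @ A @ P[i] for i in range(n))
print(np.allclose(D, np.diag(np.diag(A))))     # True

# d = u^T D = sum_i e_i^T A P_(i) flattens D into a (row) vector
u = np.ones(n)
d = sum(I[i] @ A @ P[i] for i in range(n))
print(np.allclose(d, u @ D))                   # True
print(np.allclose(d, np.diag(A)))              # True
```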