To reduce clutter, define the symmetric matrix $$X = \sum_i x_i x_i^T$$
It is also handy to know that
$$\eqalign{
\log\det L &= {\rm tr}\log L \cr
}$$
Now use the Frobenius inner product (denoted by a colon, $A:B = {\rm tr}(A^{\mathsf{T}}B)$) to write the function and its differential as
$$\eqalign{
f &= N\,{\rm tr}\log L - X:LL^T \cr\cr
df &= NL^{-T}:dL - X:2\,{\rm sym}(dL\,L^T) \cr
&= NL^{-T}:dL - 2\,{\rm sym}(X):dL\,L^T \cr
&= (NL^{-T} - 2\,XL):dL \cr
}$$
Since $df = \frac{\partial f}{\partial L}:dL$, the gradient must be
$$\eqalign{
\frac{\partial f}{\partial L} &= NL^{-T} - 2\,XL \cr
}$$
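As a quick sanity check, here is a minimal NumPy sketch (made-up random data; the names `f`, `grad`, etc. are mine, not from any library) that compares this gradient against a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 50, 4
xs = rng.standard_normal((N, n))           # made-up data points x_i
X = xs.T @ xs                              # X = sum_i x_i x_i^T (symmetric)
L = np.linalg.cholesky(X / N + np.eye(n))  # some lower-triangular L with positive diagonal

def f(L):
    # f = N * tr(log L) - X : L L^T, using tr(log L) = log(det L)
    return N * np.log(np.linalg.det(L)) - np.sum(X * (L @ L.T))

grad = N * np.linalg.inv(L).T - 2 * X @ L  # claimed gradient: N L^{-T} - 2 X L

eps = 1e-6
E = np.zeros((n, n)); E[2, 1] = eps        # perturb a single entry of L
fd = (f(L + E) - f(L - E)) / (2 * eps)     # central finite difference
print(grad[2, 1], fd)                      # the two values should agree
```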
The version that I found has a different formula for $f$. Equation (2.36) in Section 2.6 (Smoothing, Weight Functions and Equivalent Kernels) states the following:
\begin{equation}
f = \sum_{i=1}^{n}\frac{\gamma_i\lambda_i}{\lambda_i + \sigma_n^2}u_i.
\end{equation}
This makes sense, as $K(K + \sigma_n^2I)^{-1}$ is a matrix and $y$ is a vector, so $f$ should be a vector. I will show how to get this formula.
Since $K$ is a real covariance matrix, it is symmetric (and positive semi-definite), and the spectral theorem guarantees that any real symmetric matrix is orthogonally diagonalizable. This means that the $u_i$ in the expression
\begin{equation}
K = \sum_{i=1}^{n}\lambda_i u_i u_i^{\mathsf{T}}
\end{equation}
form an orthonormal basis of $\mathbb{R}^n$. In particular, we can expand $y = \sum_{\ell=1}^{n}\gamma_{\ell} u_{\ell}$, where $\gamma_{\ell} = u_{\ell}^{\mathsf{T}}y$.
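For concreteness, here is a small numerical sketch (an arbitrary made-up symmetric PSD $K$; nothing here is specific to GP regression) confirming the decomposition and the orthonormality of the $u_i$:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
K = B @ B.T                              # made-up symmetric PSD matrix

lam, U = np.linalg.eigh(K)               # columns of U are the eigenvectors u_i
print(np.allclose((U * lam) @ U.T, K))   # K = sum_i lambda_i u_i u_i^T  -> True
print(np.allclose(U.T @ U, np.eye(5)))   # the u_i are orthonormal       -> True
```

With this decomposition in hand, expand $f$: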
\begin{eqnarray}
f &=& K(K + \sigma_n^2I)^{-1}y\\
&=& \left(\sum_{i=1}^{n}\lambda_i u_i u_i^{\mathsf{T}}\right)\left(\sum_{j=1}^{n}\lambda_j u_j u_j^{\mathsf{T}} + \sigma_n^2 I\right)^{-1}\left(\sum_{\ell=1}^{n}\gamma_{\ell} u_{\ell}\right)\\
&=& \left(\sum_{i=1}^{n}\lambda_i u_i u_i^{\mathsf{T}}\right)\sigma_n^{-2}\left( \sigma_n^{-2}\sum_{j=1}^{n}\lambda_j u_j u_j^{\mathsf{T}} +I\right)^{-1}\left(\sum_{\ell=1}^{n}\gamma_{\ell} u_{\ell}\right)
\end{eqnarray}
Recall the Maclaurin series for $\frac{1}{1-x}$:
\begin{equation}
\frac{1}{1-x} = \sum_{n=0}^{\infty}x^n~~\textrm{for}~~|x| < 1.
\end{equation}
Then
\begin{equation}
\frac{1}{1+x} = \sum_{n=0}^{\infty}(-x)^n~~\textrm{for}~~|x| < 1.
\end{equation}
An analogous formula holds for square matrices:
\begin{equation}
\left(I + A\right)^{-1} = \sum_{m=0}^{\infty}(-A)^m~~\textrm{for}~~\lVert A\rVert < 1
\end{equation}
or if $A^N = O$, the zero matrix, for some $N$. This latter condition cannot hold for a positive-definite matrix.
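A quick numerical illustration of the matrix series (with a made-up $A$ scaled small enough that $\lVert A\rVert < 1$, and the series truncated at 200 terms):

```python
import numpy as np

rng = np.random.default_rng(2)
A = 0.1 * rng.standard_normal((4, 4))    # scaled so that ||A|| < 1

S, P = np.zeros((4, 4)), np.eye(4)
for m in range(200):                     # partial sums of sum_m (-A)^m
    S += P
    P = P @ (-A)

print(np.allclose(S, np.linalg.inv(np.eye(4) + A)))  # True
```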
Let's assume that this will work for $\left(I + \sigma_n^{-2}K\right)^{-1}$. Strictly speaking, the series converges only when $\sigma_n^{-2}\lambda_{\max} < 1$, i.e. $\lambda_{\max} < \sigma_n^2$; the second approach below confirms the final formula without that restriction.
\begin{equation}
\left(I + \sigma_n^{-2}K\right)^{-1} = \sum_{m=0}^{\infty}(-\sigma_n^{-2})^mK^m.
\end{equation}
A valuable thing to notice about orthogonally diagonalizable matrices such as $K$ is that
\begin{equation}
K^m = \left(\sum_{i=1}^{n}\lambda_i u_iu_i^{\mathsf{T}}\right)^m = \sum_{i=1}^{n}\lambda_i^m u_iu_i^{\mathsf{T}}.
\end{equation}
I'll let you verify that. Use the fact that the $u_i$ form an orthonormal basis.
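For example, the case $m = 2$ works out as follows; induction then handles larger $m$:
\begin{equation}
\left(\sum_{i=1}^{n}\lambda_i u_iu_i^{\mathsf{T}}\right)^2 = \sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j u_i\underbrace{u_i^{\mathsf{T}}u_j}_{\delta_{i,j}}u_j^{\mathsf{T}} = \sum_{i=1}^{n}\lambda_i^2 u_iu_i^{\mathsf{T}}.
\end{equation}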
Now we can write $f$ as
\begin{eqnarray}
f &=& \sigma_n^{-2} \color{red}{K}\left[\sum_{m=0}^{\infty}(-\sigma_n^{-2})^m\color{red}{K^m}\right]y\\
&=& \sigma_n^{-2}\left[\sum_{m=0}^{\infty}(-\sigma_n^{-2})^m\color{red}{K^{m+1}}\right]y\\
&=& \sigma_n^{-2}\left[\sum_{m=0}^{\infty}(-\sigma_n^{-2})^m\color{red}{\sum_{i=1}^{n}\lambda_i^{m+1}u_i u_i^{\mathsf{T}}}\right]\sum_{\ell=1}^{n}\gamma_{\ell} u_{\ell}\\
&=& \sigma_n^{-2}\sum_{m=0}^{\infty}(-\sigma_n^{-2})^m\left[\color{red}{\sum_{i=1}^{n}\lambda_i^{m+1}u_i}\sum_{\ell=1}^{n}\gamma_{\ell}\color{red}{u_i^{\mathsf{T}}}u_{\ell}\right]
\end{eqnarray}
Since the $u_i$ are orthonormal,
\begin{equation}
u_i^{\mathsf{T}}u_{\ell} = \delta_{i,\ell},
\end{equation}
the Kronecker delta function. When we sum over $\ell$, the only term that "survives" will be the term for which $\ell = i$.
\begin{eqnarray}
f &=& \sigma_n^{-2}\sum_{m=0}^{\infty}(-\sigma_n^{-2})^m\left[\sum_{i=1}^{n}\lambda_i^{m+1}u_i\sum_{\ell=1}^{n}\gamma_{\ell}\delta_{i,\ell}\right]\\
&=& \sigma_n^{-2}\sum_{m=0}^{\infty}(-\sigma_n^{-2})^m\left[\sum_{i=1}^{n}\gamma_i\lambda_i^{m+1}u_i\right]\\
&=& \sigma_n^{-2}\sum_{i=1}^{n}\left[\sum_{m=0}^{\infty}(-\sigma_n^{-2})^m\lambda_i^{m}\right]\lambda_i\gamma_i u_i\\
&=& \sigma_n^{-2}\sum_{i=1}^{n}\left[\sum_{m=0}^{\infty}(-\sigma_n^{-2}\lambda_i)^m\right]\lambda_i\gamma_i u_i\\
&=& \sigma_n^{-2}\sum_{i=1}^{n}\left[(1 + \sigma_n^{-2}\lambda_i)^{-1}\right]\lambda_i\gamma_i u_i\\
&=& \sigma_n^{-2}\sum_{i=1}^{n}\frac{\gamma_i\lambda_i}{1 + \sigma_n^{-2}\lambda_i}u_i\\
&=& \sum_{i=1}^{n}\frac{\gamma_i\lambda_i}{\sigma_n^{2} + \lambda_i}u_i
\end{eqnarray}
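Here's a minimal NumPy check that this eigen-expansion matches the direct computation $K(K + \sigma_n^2 I)^{-1}y$ (made-up $K$, $y$, and noise level; note the agreement holds even when $\lambda_{\max} \geq \sigma_n^2$, i.e. when the Neumann series itself would not converge):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
B = rng.standard_normal((n, n))
K = B @ B.T                                 # made-up covariance matrix
sigma2 = 0.5                                # sigma_n^2
y = rng.standard_normal(n)

lam, U = np.linalg.eigh(K)
gamma = U.T @ y                             # gamma_i = u_i^T y

direct = K @ np.linalg.solve(K + sigma2 * np.eye(n), y)
eigen = U @ (gamma * lam / (lam + sigma2))  # sum_i gamma_i lambda_i / (lambda_i + sigma_n^2) u_i
print(np.allclose(direct, eigen))           # True
```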
Upon questioning from @yiming-che, I realized an easier approach, which works precisely because $K$ is orthogonally diagonalizable. First, note that since the $u_i$ form an orthonormal basis for $\mathbb{R}^n$, $I = \sum_{i=1}^{n}u_iu_i^{\mathsf{T}}$. This gives us a simpler expression for $I + \sigma_n^{-2}K$.
\begin{eqnarray}
\left(I + \sigma_n^{-2}K\right)^{-1} &=& \left(\sum_{i=1}^{n}u_iu_i^{\mathsf{T}} + \sum_{i=1}^{n}\sigma_n^{-2}\lambda_i u_iu_i^{\mathsf{T}}\right)^{-1}\\
&=& \left(\sum_{i=1}^{n}(1 + \sigma_n^{-2}\lambda_i)u_iu_i^{\mathsf{T}}\right)^{-1}
\end{eqnarray}
I claim that the last line can be re-written as
\begin{equation}
\left(\sum_{i=1}^{n}\left(1 + \sigma_n^{-2}\lambda_i\right)u_iu_i^{\mathsf{T}}\right)^{-1} = \sum_{i=1}^{n}\left(1 + \sigma_n^{-2}\lambda_i\right)^{-1}u_iu_i^{\mathsf{T}}.
\end{equation}
Let's prove it by multiplying the original matrix and the suspected inverse.
\begin{eqnarray}
&&\left[\sum_{j=1}^{n}\left(1 + \sigma_n^{-2}\lambda_j\right)u_ju_j^{\mathsf{T}}\right]\left[\sum_{i=1}^{n}\left(1 + \sigma_n^{-2}\lambda_i\right)^{-1}u_iu_i^{\mathsf{T}}\right]\\
&=& \sum_{i=1}^{n}\sum_{j=1}^{n}\left(1 + \sigma_n^{-2}\lambda_j\right)\left(1 + \sigma_n^{-2}\lambda_i\right)^{-1}u_j~\underbrace{u_j^{\mathsf{T}}u_i}_{\delta_{i,j}}~u_i^{\mathsf{T}}\\
&=& \sum_{i=1}^{n}\sum_{j=1}^{n}\left(1 + \sigma_n^{-2}\lambda_j\right)\left(1 + \sigma_n^{-2}\lambda_i\right)^{-1}u_j\delta_{i,j}u_i^{\mathsf{T}}\\
&=& \sum_{i=1}^{n}\left(1 + \sigma_n^{-2}\lambda_i\right)\left(1 + \sigma_n^{-2}\lambda_i\right)^{-1}u_iu_i^{\mathsf{T}}\\
&=& \sum_{i=1}^{n}u_iu_i^{\mathsf{T}}\\
&=& I.
\end{eqnarray}
This works as long as $1 + \sigma_n^{-2}\lambda_i \neq 0$ for each $i$.
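And a corresponding numerical check of the claimed inverse (same kind of made-up setup as before):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
B = rng.standard_normal((n, n))
K = B @ B.T
sigma2 = 0.25

lam, U = np.linalg.eigh(K)
M = np.eye(n) + K / sigma2                # I + sigma_n^{-2} K
M_inv = (U / (1 + lam / sigma2)) @ U.T    # sum_i (1 + sigma_n^{-2} lambda_i)^{-1} u_i u_i^T
print(np.allclose(M @ M_inv, np.eye(n)))  # True
```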
Best Answer
Depending on the size of your matrix $\Sigma$, you probably shouldn't have been able to compute the determinant directly in any reasonable amount of time: naive cofactor (Laplace) expansion has time complexity $\mathcal{O}(n!)$, whereas with an LU (or Cholesky) factorization it is $\mathcal{O}(n^{3})$.
It works because of the following property of determinants.
$$ \Sigma = LL^{T} \implies \det(\Sigma) = \det(LL^{T}) = \det(L)\det(L^{T}) $$
For a triangular matrix, the determinant is the product of the diagonal.
$$ \det(L) = \prod_{i=1}^{n} L_{ii} $$
which means if you take the $\log$ then you get a sum.
$$ \log(\prod_{i=1}^{n} L_{ii}) = \sum_{i=1}^{n} \log(L_{ii}) $$
This follows from a basic property of logarithms. The $2$ comes from another property of determinants: the determinant of a matrix's transpose equals the determinant of the matrix itself,
$$ \det(A^{T}) = \det(A) $$
That means that when we take the log, the power of $2$ comes down:
$$ \det(\Sigma) = \det(LL^{T}) = \det(L)\det(L^{T}) = \det(L)^{2} \\ \implies \log\det(\Sigma) = \log\!\left(\left(\prod_{i=1}^{n} L_{ii}\right)^{2}\right) = 2 \log\left(\prod_{i=1}^{n} L_{ii}\right) = 2 \sum_{i=1}^{n} \log(L_{ii}) $$
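A minimal NumPy sketch of the trick (assuming $\Sigma$ is symmetric positive-definite so the Cholesky factorization exists):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)            # made-up symmetric positive-definite matrix

L = np.linalg.cholesky(Sigma)              # Sigma = L L^T, costs O(n^3)
logdet = 2.0 * np.sum(np.log(np.diag(L)))  # 2 * sum_i log(L_ii)

sign, ref = np.linalg.slogdet(Sigma)       # reference computation
print(np.isclose(logdet, ref))             # True
```

Working in log space this way also avoids the overflow you would hit computing $\det(\Sigma)$ itself for large $n$.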