I'm not an expert in this area, but this may be a start.
Rather than $\prod_{p\lt n}$, you can use $\prod_{p\le \sqrt n}$.
$\log\log \sqrt n + \gamma \lt \log\log \sqrt n +\log 2 = \log\log n $
That gets you a little closer, since now you are off by $\log 2 - \gamma \approx 0.116$.
The heuristic probability that $n$ is prime is not
$$\prod_{p\lt n} (1-Pr(p|n))$$
It is the product of probabilities
$$\prod_{p\lt n} (1-Pr(p|n \text{ given no smaller prime divides } n))$$
For $p$ small, the term you get may be close to $(1-1/p)$, but I that's not the case for $p$ large.
For $\sqrt n \lt p$, the term corresponding to $p$ in the product is just $1$.
For $\sqrt[3]n \lt p \le \sqrt n$, if $p$ is the smallest prime dividing $n$, then $n/p$ must be prime, too. Perhaps that means that by strong induction, we should discount these terms by the probability $n/p$ is prime, about $1/\log \frac np$, so that those terms in the product are $(1-1/(p \log \frac np))$.
It looks like you get some sums/integrals if you try to extend this to more terms, and I don't know whether you can expect to get the desired accuracy at the end.
The complex-analytic proof of the prime number theorem can help inform the elementary one.
The von Mangoldt function $\Lambda$ is related to the Riemann zeta function $\zeta$ by the formula
$$ \sum_n \frac{\Lambda(n)}{n^s} = -\frac{\zeta'(s)}{\zeta(s)},$$
at least in the region $\mathrm{Re}(s) > 1$. The right-hand side has a pole of residue 1 at the simple pole $s=1$ of $\zeta$, and a pole of residue -1 at each zero $s=\rho$ of $\zeta$. Applying Perron's formula carefully, one eventually arrives at the von Mangoldt explicit formula
$$ \sum_{n \leq x} \Lambda(n) = x - \sum_\rho \frac{x^\rho}{\rho} + \dots$$
where the remaining terms $\dots$ (as well as the way in which the infinite sum over zeroes is interpreted) will be ignored for this discussion. This formula strongly suggests that zeroes $\rho$ with real part strictly less than one will eventually only make a lower order contribution to $\sum_{n \leq x} \Lambda(n)$, whereas zeroes $\rho$ with real part equal to one would likely destroy the prime number theorem. In fact one can pursue this line of reasoning more carefully and show that the prime number theorem is equivalent to the absence of zeroes of zeta on the line $\mathrm{Re}(s)=1$.
But what if there was a way to make the contribution of the zeroes of zeta smaller than those of the pole, even when said zeroes lie on the line $\mathrm{Re}(s)=1$? One way to do this is to replace the log-derivative $-\frac{\zeta'(s)}{\zeta(s)}$ by the variant $\frac{\zeta''(s)}{\zeta(s)}$. Assuming for simplicity that all zeroes are simple, this function still has a simple pole at every zero $\rho$ (with residue $\frac{\zeta''(\rho)}{\zeta'(\rho)}$), but now has a double pole at $s=1$ (with residue $2$). As it turns out, this variant also has a Dirichlet series representation
$$ \sum_n \frac{ \Lambda_2(n) }{n^s} = \frac{\zeta''(s)}{\zeta(s)}$$
which by Perron's formula suggests an asymptotic of the form
$$ \sum_{n \leq x} \Lambda_2(n) = 2 x \log x + \sum_\rho \frac{\zeta''(\rho)}{\zeta'(\rho)} \frac{x^\rho}{\rho} + \dots$$
Now, even if there are zeroes on $\mathrm{Re}(s)=1$, their contribution should still be smaller than the main term, and one may now be led to predict an asymptotic of the form
$$ \sum_{n \leq x} \Lambda_2(n) = 2 x \log x + O(x) \quad (4)$$
and furthermore this asymptotic should be easier to prove than the prime number theorem, as it is not reliant on preventing zeroes at the edge of the critical strip.
The functions $\Lambda_2$ and $\Lambda$ are related via the identity
$$ \frac{\zeta''(s)}{\zeta(s)} = -\frac{d}{ds} (-\frac{\zeta'(s)}{\zeta(s)}) +
(-\frac{\zeta'(s)}{\zeta(s)})^2$$
which after inverting the Dirichlet series leads to
$$ \Lambda_2(n) = \Lambda(n) \log n + \sum_{d|n} \Lambda(d) \Lambda(\frac{n}{d}) \quad (5)$$
and from this and (4) we soon obtain (2) and hence (1).
One can also think about this from a "music of the primes" standpoint. The explicit formula is morally resolving $\Lambda$ into "harmonics" as
$$ \Lambda(n) \approx 1 - \sum_\rho n^{\rho-1} + \dots \quad (6)$$
where I am deliberately being vague as to how to interpret the symbols $\approx$, $\sum$, and $\dots$. The Dirichlet convolution of $1$ with itself resembles $\log n$, while the Dirichlet convolution of $-n^{\rho-1}$ with itself resembles $n^{\rho-1} \log n$; also, cross-terms such as the convolution of $1$ with $-n^{\rho-1}$ experience significant cancellation and should be lower order. Thus the right-hand side of (5) amplifies the "1" harmonic of $\Lambda$ by a factor of about $2 \log n$, while damping out the contributions of each of the zeroes $\rho$ due to cancellation between the two terms, leading one to (1) or (2) as a smoothed out version of the prime number theorem. Conversely, (2) combined with (5) can be viewed as an elementary way to encode (the largest terms of) the explicit formula (4), without explicit mention of zeroes.
The most difficult part of the Erdos-Selberg elementary proof of the PNT is the passage from (1) to the prime number theorem, as here we must actually eliminate the scenario in which there is a zero on $\mathrm{Re}(s)=1$. The key to doing so is to exploit the non-negativity of $\Lambda$; morally, the expansion (6) prevents $\rho$ (and hence also $\overline{\rho}$) from lying on $\mathrm{Re}(s)=1$ as this would make the right-hand side RHS negative "on average" for certain sets of $n$. There are several ways to make this intuition precise; the standard complex-analytic proofs often proceed by testing $\Lambda$ against $n^{\pm it}$ and $n^{\pm 2it}$, where $1+it$ is a hypothetical zero of $\zeta$. Very roughly speaking, the idea of the elementary proof is to exploit not only the non-negativity of $\Lambda$, but also upper bounds coming from sieve-theoretic bounds such as the Brun-Titchmarsh inequality, which roughly speaking tell us that $\Lambda$ behaves "on average" as if it lies between $0$ and $2$. (Actually, the Selberg formula (2) already gives an adequate substitute for this inequality for the purposes of this argument.) On the other hand, from (6) we see that if there was a zero $\rho = 1+it$ on the line $\mathrm{Re}(s)=1$, then $\Lambda-1$ should have an extremely large correlation with $\mathrm{Re} n^{\rho-1} = \cos(t \log n)$, and these two facts can be placed in contradiction with each other, basically because the average value of $|\cos(t \log n)|$ is strictly less than one. However, because the zeroes $\rho$ are not explicitly present in the Selberg formula (2), one has to work quite a bit in the elementary proof to make this argument rigorous. In this blog post of mine I use some Banach algebra theory to tease out the zeroes $\rho=1+it$ from the Selberg formula (they can be interpreted as the "Gelfand spectrum" of a certain Banach algebra norm associated to that formula) to make this connection more explicit.
Best Answer
There are several ideas here, some mentioned in the other answers:
One: When Gauss was a boy (by the dates found on his notes he was approximately 16) he noticed that the primes appear with density $ \frac{1}{\log x}$ around $x$. Then, instead of counting primes and looking at the function $\pi (x)$, lets weight by the natural density and look at $\sum_{p\leq x} \log p$. Since we are weighting by what we think is the density, we expect it to be asymptotic to be $x$.
Two: Differentiation of Dirichlet series. If $$ A(s)=\sum_{n=1}^{\infty} a_{n} n^{-s} $$ then $$ (A(s))'=-\sum_{n=1}^{\infty} a_{n} log(n) n^{-s}$$ The $\log$ term appears naturally in the differentiation of Dirichlet series. Taking the convolution of $\log n$ with the Mobius function (that is multiplying by $\frac{1}{\zeta(s)}$) then gives the $\Lambda(n)$ mentioned above. The $\mu$ function is really the special thing here, not the logarithm.
Expanding on this, there are other weightings besides $\log p$ which arise naturally from taking derivatives. Instead we can look at $\zeta^{''}(s)=\sum_{n=1}^\infty (\log n)^2 n^{-s}$, and then multiply by $\frac{1}{\zeta(s)}$ as before. This leads us to examine the sum $$\sum_{n\leq x} (\mu*\log^2 )(n)$$ (The $*$ is Dirichlet convolution) By looking at the above sum, Selberg was able to prove his famous identity which was at the center of the first elementary proof of the prime number theorem:
$$\sum_{p \leq x} log^2 p +\sum_{pq\leq x}(\log p)(\log q) =2x\log x +O(x).$$
Three: The primes are intimately connected to the zeros of $\zeta(s)$, and contour integrals of $\frac{1}{\zeta(s)}$. (Notice it was featured everywhere here so far) We can actually prove that
$$\sum_{p^k \leq x} \log p= x - \sum_{\rho :\ \zeta(\rho)=0} \frac{x^{\rho}}{\rho} +\frac{\zeta'(0)}{\zeta(0)} $$
Notice that the above is an equality, which is remarkable since the left hand side is a step function. (Somehow, at prime powers all of the zeros of zeta conspire and make the function jump.)