Here is a heuristic that I plan to make into a blog post some day. Suppose that there were only finitely many primes with first digit $9$. Is your estimate good enough to see that?
To be more precise, suppose that there were no primes between $9 \times 10^k$ and $10^{k+1}$ for all sufficiently large $k$. And suppose that the number of primes between $a$ and $b$, for $10^k \leq a < b \leq 9 \times 10^k$ is $\approx \frac{\log 10}{\log 9} \int_{a}^b dt/\log t$ (when $a$ is not too close to $b$). We'll see later where the fraction $\log 10/\log 9$ comes from.
The first thing to note is that this would violate the prime number theorem. In this scenario, we would have $\pi(9 \times 10^k) = \pi(10 \times 10^k)$ for $k$ large. But the prime number theorem says that
$$\pi(10\times 10^k) - \pi(9 \times 10^k) \sim \frac{10 \times 10^k}{k \log 10 + \log {10}} - \frac{9 \times 10^k}{k \log 10 + \log {9}} \sim \frac{10^k}{k \log 10} \to \infty.$$
So proving the prime number theorem involves disproving this ridiculous scenario.
Now, let's see that the scenario is consistent with $\sum_{p \leq N} 1/p = \log \log N + M + O(1/\log N)$. The sum over the primes between $10^k$ and $10^{k+1}$ would be roughly
$$\frac{\log 10}{\log 9} \int_{10^k}^{9 \times 10^k} \frac{dt}{t \log t} = \frac{\log 10}{\log 9} \left( \log \log (9 \times 10^k) - \log \log 10^k \right)$$
$$=\frac{\log 10}{\log 9} \left( \log( k \log 10+\log 9) - \log (k \log 10) \right) = \frac{\log 10}{\log 9} \left( \log \left( 1+\frac{\log 9}{k \log 10} \right) \right) $$
$$ = \frac{\log 10}{\log 9} \frac{\log 9}{k \log 10} + O(1/k^2)= \frac{1}{k} + O(1/k^2)$$
So
$$\sum_{p \leq 10^{n+1}} \frac{1}{p} = \sum_{j=1}^n \left( \frac{1}{j} + O(1/j^2) \right)=$$
$$\log n + B + O(1/n) = \log \log 10^{n+1} + C + O(1/\log 10^{n+1}).$$
Very important exercise left for you: If you redo this computation for $\sum_{p \leq 9 \times 10^k} 1/p$, you get $\log \log (9 \times 10^k) + C + O(1/\log(9 \times 10^k))$ for the same constant $C$. The point is that $\log \log 10^{k+1} - \log \log (9 \times 10^k) = O(1/\log 10^k)$, so this estitmate is consistent with $\sum_{p \leq N} 1/p$ not growing at all between $9 \times 10^k$ and $10 \times 10^k$.
This trick is useful for refuting other simple approaches to the PNT. For example, the "primes hate to start with $9$ scenario" is also consistent with $\sum \log p/p^s = 1/(s-1) + O(1)$ as $s \to 1^{+}$, so that is also not enough to prove PNT.
Well, one can always say that the PNT is equivalent to
$$\sum_{p \leq x}\frac{1}{p} = \log \log x + M + o\left(\frac{1}{\log x}\right),\tag{$\ast$}$$
because both results are true (with better error terms). This is of course not what is meant by the Wikipedia page. Instead, the idea is that the equivalence PNT$\,\Leftrightarrow(\ast)$ can be established in a simpler way than either PNT or $(\ast)$. On the other hand, "simpler" is a subjective word, e.g. I usually find Tauberian arguments tricky.
At any rate, the first three exercises in Section 8.1.1 of Montgomery-Vaughan: Multiplicative number theory I address this question. For example, the PNT easily implies the relation $\psi(x)\sim x$ (logarithmic weights!), which then implies rather nontrivially (using Theorem 8.1 = Axer's theorem) that
$$\sum_{p\leq x}\frac{\log p}{p}=\log x+C+o(1)$$
for some constant $C$. From here, $(\ast)$ follows easily by partial summation.
Best Answer
This lies beyond Mertens, in the sense that this variant actually implies the Prime Number Theorem, as will be explained below, while Mertens' theorem is weaker than the PNT.
I sketch below a complex analytic proof of the variant, due to Landau.
Let $\Lambda$ be the von Mangoldt function, defined as $\Lambda(p^k)=\log p$ if $p$ is a prime and $k \ge 1$, and $\Lambda(n)=0$ otherwise. Note that your sum $$S_1(n):=\sum_{p \le n} \frac{\log p}{p-1}$$ is very close to $$S_2(n):= \sum_{m \le n} \frac{\Lambda(m)}{m}.$$ Indeed, $$S_2(n)-S_1(n) = \sum_{p \le n} \log p \sum_{1 \le k \le \log_p n} p^{-k}- \sum_{p \le n} \log p\sum_{k \ge 1} p^{-k} = \sum_{p \le n} \log p \sum_{k >\log _p n} p^{-k}.$$ The inner $k$-sum is always $\ll \min\{1/n, 1/p^2\}$, so that this difference is
$$\ll \sum_{p \le n} \log p \min\{1/n, 1/p^2\}\ll \sum_{m \le \sqrt{n}} \frac{\log m}{n} + \sum_{\sqrt{n}<m \le n} \frac{\log m}{m^2} \ll \frac{\log n}{\sqrt{n}}=o(1).$$ We see it suffices to estimate $S_2(n)$.
The reason that I introduced $\Lambda$ is that the PNT is usually proved with $\Lambda$. The Dirichlert series of $\Lambda$ is $-\zeta'(s)/\zeta(s)$, so Perron's formula gives $$S_2(n) = \frac{1}{2\pi i}\int_{(1)}-\frac{\zeta'}{\zeta}(s+1) n^s \frac{ds}{s}+O((\log n)/n).$$ The main term $\log n - \gamma$ arises as the residue of the double pole of the integrand at $s=0$. Indeed, near $s=0$ we have the Laurent expansion $$\begin{align}-\frac{\zeta'}{\zeta}(s+1)n^s/s&= s^{-2} (1-\gamma s + O(s^2)) (1+s \log n + O(s^2 \log^2 n)) \\ &= s^{-2}(1+(\log n-\gamma)s+O(s^2)). \end{align}$$ Here we made use of $$(\star)\, -\frac{\zeta'}{\zeta}(s) = \frac{1}{s-1} -\gamma + O(s-1)$$ near $s=1$. To justify the above heuristic one needs, as in the classical proof of the PNT, to 1) apply a truncated version of Perron's formula and 2) shift the contour to the left using a zero-free region. This is why I believe Hadamard and de la Vallée Poussin had possibly known this variant.
Landau, in his book "Handbuch der Lehre von der Verteilung der Primzahlen" (1911) used Perron's formula to work out an explicit formula for $\sum_{m \le n} \Lambda(m)/m^s$ for general $s$. For $s=1$ it corresponds to $S_2(n)$. A reference with proof is Lemma 4.1 in this preprint (specialize to $s=1$). In particular, letting $$\Delta(n):=S_2(n)-( \log n- \gamma)$$ then, assuming $n$ is a positive integer, $$\Delta(n)=\frac{\Lambda(n)}{2n}- \sum_{\rho} \frac{n^{\rho-1}}{\rho-1} \ll \exp(-c(\log n)^{3/5}(\log \log n)^{-1/5})$$ for some $c>0$. Here the sum is over the zeros of $\zeta$, and the inequality follows from the Vinogradov--Korobov zero-free region. Under RH, $\Delta(n)\ll n^{-1/2}\log^2 n$.
The Laurent expansion $(\star)$ is not mysterious; it's equivalent to the Laurent expansion $$(\star \star)\, \zeta(s) = \frac{1}{s-1} + \gamma +O(s-1)$$ at the same point, which in turn is equivalent to the Harmonic sum $\sum_{m \le n} 1/n$ being equal to $\log n + \gamma + O(1/n)$. See Corollary 1.16 in Montgomery and Vaughan's book, "Multiplicative Number Theory I" for a rigorous derivation of $(\star \star)$.
Here is a more accessible proof. Given an arithmetic function $f=\alpha*\beta$ we have $$\sum_{n \le x} f(n)=\sum_{n \le x} \alpha(n) \sum_{m\le x/n} \beta(m).$$ Applying this with $\log = \mathbf{1}*\Lambda$ where $\log$ is the natural logarithm and $\mathbf{1}$ is the constant function taking the value $1$, we find $$\sum_{n \le x} \log n = \sum_{n \le x} \Lambda(n)\lfloor x/n \rfloor.$$ This identity goes back to Chebyshev. Dividing both sides by $x$ we see $$\sum_{n \le x} \frac{\Lambda(n)}{n} =x^{-1}\sum_{n \le x} \log n +x^{-1} \sum_{n \le x} \Lambda(n) \{x/n\}$$ where $\{t\} \in [0,1)$ is the fractional part of $t$. A very weak version of Stirling's approximation tells us $$x^{-1}\sum_{n \le x} \log n=\log x - 1 + O\left( \frac{\log x}{x}\right).$$ To establish $S_2(x) =\log x- \gamma+o(1)$ it remains to show $$x^{-1} \sum_{n \le x} \Lambda(n) \{x/n\} = 1-\gamma + o(1).$$ By the Prime Number Theorem, $\Lambda =1$ 'on average', so we would expect the sum to be $$=x^{-1} \sum_{n \le x} \{x/n\}+o(1)$$ which indeed tends to $1-\gamma$ (see Exercise 2.1.1.1, page 39 in Montgomery--Vaughan, which is attributed to de la Vallée Poussin; this is a restatement of the asymptotic $\sum_{n \le x} d(n) = x\log x + (2\gamma-1)x+o(x)$ which was known to Dirichlet). To make this formal we just need to show that the difference $$\sum_{n \le x} (\Lambda(n)-1) \{x/n\}$$ is $o(x)$ which can be done via the Prime Number Theorem, see Exercise 8.1.1.1, page 248, in Montgomery--Vaughan. In the same exercise they use Axer's theorem to show that $\sum_{n\le x}\Lambda(n) \sim x$ (PNT) is in fact equivalent to $S_2(n) =\log n- \gamma+o(1)$.