There are many properties that are equivalent to uniqueness of factorization in $\,\Bbb Z.\:$ Below is a sample off the top of my head (by no means complete). Each provides a slightly different perspective on why uniqueness holds - perspectives that becomes clearer when one sees how these equivalent properties bifurcate in more general integral domains. Below we use the notation $\rm\:(a,b)=1\:$ to mean that $\rm\:a,b\:$ are coprime, i.e. $\rm\:c\mid a,b\:\Rightarrow\:c\mid 1.$
$\rm(1)\ \ \ gcd(a,b)\:$ exists for all $\rm\:a,b\ne 0\ \ $ [GCD domain]
$\rm(2)\ \ \ a\mid BC\:\Rightarrow a=bc,\ b\mid B,\ c\mid C\ \ \, $ [Schreier refinement, Euler's four number theorem]
$\rm(3)\ \ \ a\,\Bbb Z + b\, \Bbb Z\, =\, c\,\Bbb Z,\:$ for some $\rm\,c\quad\ $ [Bezout domain]
$\rm(4)\ \ \ (a,b)=1,\ a\mid bc\:\Rightarrow\: a\mid c\qquad\ \ $ [Euclid's Lemma]
$\rm(5)\ \ \ (a,b)=1,\ \dfrac{a}{b} = \dfrac{c}{d}\:\Rightarrow\: b\mid d\quad\ \ $ [Unique Fractionization]
$\rm(6)\ \ \ (a,b)=1,\ a,b\mid c\:\Rightarrow\: ab\mid c$
$\rm(7)\ \ \ (a,b)=1\:\Rightarrow\: a\,\Bbb Z\cap b\,\Bbb Z\, =\, ab\,\Bbb Z $
$\rm(8)\ \ \ gcd(a,b)\ \ exists\:\Rightarrow\: lcm(a,b)\ \ exists$
$\rm(9)\ \ \ (a,b)=1=(a,c)\:\Rightarrow\: (a,bc)= 1$
$\rm(10)\ $ atoms $\rm\, p\,$ are prime: $\rm\ p\mid ab\:\Rightarrow\: p\mid a\ \ or\ \ p\mid b$
Which of these properties sheds the most intuitive light on why uniqueness of factorization entails? If I had to choose one, I would choose $(2),$ Schreier refinement. If you extend this by induction it implies that any two factorizations of an integer have a common refinement. For example if we have two factorizations $\rm\: a_1 a_2 = n = b_1 b_2 b_3\:$ then Schreier refinement implies that we can build the following refinement matrix, where the column labels are the product of the elements in the column, and the row labels are the products of the elements in the row
$$\begin{array}{c|ccc} &\rm b_1 &\rm b_2 &\rm b_3 \\
\hline
\rm a_1 &\rm c_{1 1} &\rm c_{1 2} &\rm c_{1 3}\\
\rm a_2 &\rm c_{2 1} &\rm c_{2 2} &\rm c_{2 3}\\
\end{array}$$
This implies the following common refinement of the two factorizations
$$\rm a_1 a_2 = (c_{1 1} c_{1 2} c_{1 3}) (c_{2 1} c_{2 2} c_{2 3}) = (c_{1 1} c_{2 1}) (c_{1 2} c_{2 2}) (c_{1 3} c_{2 3}) = b_1 b_2 b_3.$$
This immediately yields the uniqueness of factorizations into primes (atoms). It also works more generally for factorizations into coprime elements, and for factorizations of certain types of algebraic structures (abelian groups, etc).
It seems that you are looking for mathematics that is not too technical. In this regard, I do recommend the textbook 'Comprehensive Mathematics for Computer Scientists' because it contains the type of theory that forms the foundations of mathematics - i.e. sets and logic - without going into numerical analysis. Logic and set theory can be intuitive and fun and is definitely useful for application in other subjects such as computer science and philosophy.
On the other hand, theoretical mathematics that is abstracted from numbers, such as linear algebra (chapters 20 -25), can be difficult to master if you are not dedicated, as there are many complicated concepts to understand. For a student with no math foundation, many of the chapters in this textbook will be fast-paced.
If you are looking to "reintroduce basic maths", then this book is not suitable. A better textbook would be one that is aimed at high-school students or a first-year introductory course. The level of math that you will be at once you have mastered the book will be about the end of a second-year pure maths course at university. You can expect to require 5 hours per week for a year $\approx$ 250 hours in total.
An alternate solution, which is a "self-contained" book and requiring "minimal to no previous mathematical background" is to find a descriptive coffee-table style book on mathematics, such as 50 mathematical ideas you really need to know by Tony Crilly (London: Quercus). In my opinion, the more description that the book gives and the fewer formulae, the easier it is to understand what is going on. This book can easily be read by dipping into and out of different chapters, which each take about half an hour to read and understand (there are 50 chapters).
Best Answer
Thanks to Eu Yu's link, I found this paper by Birkhoff and Vandiver. It was a bit confusing to read so I think I'd better clean up their proof. I simplified their proof of Theorem 5 using cyclotomic polynomials, too. Therefore I needed Theorem 7 in this paper by Yimin Ge and a generalisation of Theorem 4 in the earlier cited presentation of Lola Thompson.
Their Theorem 1 isn't such a big shot, and Theorem 2 and 3 are nowadays known as special cases of the Lifting the Exponent Lemma. Hence not much to say about these. Theorem 4 seemed to be the main matter, and it's the only part of their paper I kept when constructing my own proof. I'll give a much shorter alternative for Theorem 5.
Let's fix two coprime positive integers $a$ and $b$ with $a>b$.
It suffices to prove there is a prime divisor of $a^n-b^n$ that does not divide $a^k-b^k$ for all positive divisors $k\mid n$ with $k<n$: if $p\mid a^n-b^n$ and $c$ is an inverse of $b$ modulo $p$, then the smallest $k>0$ for which $(ac)^k\equiv1$ satisfies $k\mid n$.
Now we define $z_n=a^n-b^n$ and
$$\Psi_n=\prod_{d\mid n}{z_{\frac nd}}^{\mu(d)}.\qquad(1)$$
Anyone who has been introduced to cyclotomic polynomials may notice that
$$\Psi_n=b^{\varphi(n)}\Phi\left(\frac ab\right).\qquad(2)$$
It follows that $\Psi$ inherits many properties of cyclotomic polynomials. The ones I will use can be found in Yimin Ge's paper for example. I won't use properties for $\Psi$ if it's not clear how they are inherited. (Cyclotomic polynomials are never explicitly mentioned in the paper, I guess they weren't so well known at the time.)
The following lemma will be useful more then once:
Lemma. If $p$ is prime, $n=p^\alpha q$ an integer such that $p\nmid q$ and $p\mid\Phi_n(a)$ for some integer $a$, then $O_p(a)=q$. (The $O$ is just a lazy notation for the multiplicative order.)
Proof. If $O_p(a)=k$ then $k\mid q$ because from $p\mid\Phi_n(a)$ we certainly have $p\mid a^n-1\equiv a^q-1$. If $k<q$ then there is a divisor $d\mid k$ for which $p\mid\Phi_d(a)$. As $d\mid q$ and $d<q$ this means the polynomial $x^q-1$ has a double root, $a$, modulo $p$, hence $p\mid q$ (see Lemma 6 from Yimin Ge) which is impossible. $\square$
The lemma can be modified to fit for $\Psi$ too, but I won't do that.
By a multiplicative variant on Möbius-inversion, we have $$z_n=\prod_{d\mid n}\Psi_d.\qquad(3)$$
If $z_n={p_1}^{a_1}\cdots{p_r}^{a_r}$ where $p_{s_1},\ldots,p_{s_t}$ are the primitive prime divisors of $z_n$, we set
$$P_n={p_{s_1}}^{a_{s_1}}\cdots{p_{s_t}}^{a_{s_t}}.$$
From $(2)$ we have $\Psi_n\in\mathbb Z$ and from $(1)$ it follows that $P_n\mid\Psi_n$. Set $\Psi_n=\lambda_nP_n$. We shall prove that $P_n>1$ in all cases Zsigmondy's theorem does not exclude.
From $(3)$ it follows that $\Psi_n\mid\frac{z_n}{z_d}$ for every divisor $d\mid n$ with $d<n$.
Let $p$ be a prime divisor of $\Psi_n$. If $p\nmid P_n$ then $p$ is not primitive, suppose $p\mid z_d$. If $p\nmid n$ then by the Lifting The Exponent Lemma, $v_p(z_n)=v_p(z_d)$ so $p\nmid\frac{z_n}{z_{d}}$, contradiction. Hence $\text{rad}(\lambda_n)\mid n$.
Furthermore, note that $\gcd(\lambda_n,P_n)=1$. Otherwise there would be a prime $p$ for which $p\mid n=pr$ and $p\mid a^{pr}-b^{pr}\equiv a^r-b^r$ contradicting the primitivity of $p$.
Suppose $\lambda_n>1$. If $p$ is a prime divisor of $\lambda_n$ with $p^\alpha\|n$ and $n=p^\alpha q$ then
$$p\mid\Psi_n\mid\Psi_q(a^{p^\alpha},b^{p^\alpha})\equiv\Psi_q\pmod p,$$
(see corollary 4 from Yimin Ge) where more generally we denote
$$\Psi_n(x,y)=y^{\varphi(n)}\Phi_n\left(\frac xy\right).$$
If $p$ is a prime divisor of $\lambda_n$, so $p\mid\Psi_q$, then either $p\mid q$, which is impossible, or $p\equiv1\pmod q$. (see Theorem 6 from Yimin Ge) So $p>q=\frac n{p^\alpha}$. If $r$ is another prime divisor of $n$, then $r\mid q$, so $r<q<p$. This means $p$ is uniquely determined as the largest prime divisor of $n$.
Therefore, set $\lambda_n=p^\beta$. We'll prove that $\beta=1$.
Let $d\mid n$ such that $p\mid z_d$. Let $c$ be an inverse of $b$ modulo $p$, then $p\mid\Psi_n$ and thus $p\mid\Phi(ac)$, so by the lemma, $O_p(ac)=q$. So certainly we should have $q\mid d$.
Now from $(1)$ we have $\beta=v_p(\Psi_n)=v_p(z_n)-v_p(z_{\frac np})=1$, as desired.
To conclude $P_n>1$ there are three cases:
If $\lambda_n=1$, then $P_n=\Psi_n\geq(a-b)^{\varphi(n)}\geq1$. The inequality is strict unless $n=2$ and $a-b=1$, but then Zsigmondy's theorem is trivially true.
If $\lambda_n=p$ and $a-b>1$, then $P_n=\frac1p\Psi_n\geq\frac1p(a-b)^{\varphi(n)}\geq\frac{2^{p-1}}p\geq1$. Again the inequality is strict unless $a-b=2$ and $n=2$. In that case, $a+b$ contains a primitive prime divisor unless $a+b$ is a power of $2$, which was predicted by the theorem.
The case $\lambda_n=p$ and $a-b=1$ is a little more complicated. (Here I basically prove a generalisation of L. Thompson's "Key Lemma".)
Suppose the inequality is not strict, so $\Psi_n=p$. This will eventually give us the only counterexample that's left, being $n=6$, $a=2$. From $p\mid z_n$ it follows that $p$ is odd. Let, once more, $n=p^\alpha q$ and $c$ an inverse of $b$ modulo $p$. As $p\mid\Psi_n$ we have $p\mid\Phi_n(ac)$, so $O_p(ac)=q$ by the lemma.
If $\alpha>1$, then $p=\Psi_n=\Psi_{pq}(a^{p^{\alpha-1}},b^{p^{\alpha-1}})$, but
$$\Psi_{pq}(a^{p^{\alpha-1}},b^{p^{\alpha-1}})\geq(a^{p^{\alpha-1}}-b^{p^{\alpha-1}})^{\varphi(pq)}\geq a^p-b^p=\sum_{k=0}^{p-1}{p\choose k}b^k>p,$$
because $p>2$, contradiction. Hence $n=pq$. Now we have
$$p=\Psi_n=\frac{\Psi_q(a^p,b^p)}{\Psi_q}\geq\frac{(a^p-b^p)^{\varphi(q)}}{(a+b)^{\varphi(q)}}\geq\frac{a^p-b^p}{a+b}=\frac1{2b+1}\sum_{k=0}^{p-1}{p\choose k}b^k>\frac b{2b+1}\sum_{k=1}^{p-1}{p\choose k}.$$
Since $\frac b{2b+1}\geq\frac13$ we obtain $3p>2^p-2$.
This is impossible when $p>3$, so $p=3$ and $q\mid 2$ because $q=O_p(ac)\mid p-1$.
Hence $n=3$ or $n=6$. If $n=3$ the theorem is obviously true because $3$ is prime and $a-b=1$.
The case $n=6$ remains, and indeed Zsigmondy fails here. From $a=b+1$ and $3=\Psi_6=a^2-ab+b^2$ we easily deduce that $a=2$ and $b=1$ is the only counterexample. $\square$
The proof for the case $a^n+b^n$ can be deduced from the case $a^n-b^n$.
For any positive integer $n>1$ for which $2n$ does not give an exception on Zsigmondy's theorem, $a^{2n}-b^{2n}$ has a primitive prime divisor $p$, dividing either $a^n-b^n$ or $a^n+b^n$. However, $p$ can't divide $a^n-b^n$ since then $p$ wouldn't be primitive. Thus we have $p\mid a^n+b^n$ and $p\nmid a^{2k}-b^{2k}$ for all $k<n$. This implies $p\nmid a^k+b^k$ for all $k<n$, hence the theorem. $\square$
Note that the exception $2^6-1^6$ is reflected in $2^3+1^3$. The case $n=2$ and $a+b$ a power of $2$ disappears because we only consider $n>1$ here.