How can you prove or disprove that half of all natural numbers can be expressed as the sum of three squares minus the product of their roots

conjecturesdiophantine equationselementary-number-theorynumber theory

I do not have much formal mathematical education beyond high school, but I do like watching math YouTube videos. I recently read a comment on a video asking if anyone could prove that there are no positive integer solutions to the equation a2 + b2 + c2 – abc = 1. It took me a while, but I found a proof, and I then used a similar method to prove a more general claim.

For any given positive integer n, there exists a set of three positive integers a, b, and c satisfying the equation a2 + b2 + c2 – abc = n if and only if there exists such a set where a2 + b2 ≤ n. For example, if there are no positive integer solutions to the equation a2 + b2 + c2 – abc = 6 where a2 + b2 ≤ 6, then there are no positive integer solutions to that equation at all.

My proof is too long to paste into this post, so here is a link to a PDF of it:
https://www.dropbox.com/scl/fi/9wsm5b4zsz9xpc0r7p0rk/naturalnumberproblem.pdf?rlkey=0gwqktvfu9215fskihlt2fvu3&dl=0

Since there are only finitely many pairs of positive integers who squares sum to a value less than or equal to a given number, you only need to check finitely many cases to determine whether there is a solution for a given n. For example, for n = 6, you only need to check {a = 1, b = 1} and {a = 1, b = 2}. Solving for c shows there are no positive integer solutions.

After discovering this, I wrote a program to that checked whether there were solutions for each n from 1 to 100. I was hoping to find a pattern in the numbers that have or don't have solutions, but I didn't find much. However, I did notice that about half of the numbers had solutions. That intrigued me, so I increased the range of my search to see if this percentage held.

Here are the results: From n = 100 to 999, the percentage of numbers with solutions stays between 49.3% and 52.5%. From n = 100 to 9999, the percentage of numbers with solutions stays between 49.8% and 50.4%. From n = 10000 to 99999, the percentage of numbers with solution stays between 50.1% and 50.7%

My conjecture is that this percentage stays around 50% and that the deviations get smaller and smaller over time. I assume there is some way to state this more formally using limits. Does anyone know if what I found has already been discovered and if my conjecture has already proven or disproven? What methods would you even use to do this?

As I said, I don't have much formal mathematical education. The math used in the proof linked above is fairly simple. Nothing more 'advanced' than modular arithmetic and proofs by contradiction. My intuition is that proving my conjecture would rely on math I've never even heard of, but maybe there is some (relatively) simple proof out there.

I can prove that ~41.6% of all numbers will not have solutions. (I am speaking loosely. I know this needs to be stated more formally with limits.) This is because there are no solutions to the equation when n is congruent to 3 mod 4, 3 mod 9, or 6 mod 9. This was verified by a computer trying every possible solution working from mod 2 to mod 800.

Let f(n) count the numbers less than or equal to n that are congruent to 3 mod 4, 3 mod 6, or 3 mod 9. There are no numbers that are congruent to both 3 mod 9 and 6 mod 9. The only numbers that are congruent to both 3 mod 4 and 3 mod 9 are those that are congruent to 3 mod 36, and the only one congruent to both 3 mod 4 and 6 mod 9 are those that congruent to 15 mod 36. Therefore, the limit of f(n)/n as n approaches infinity is equal to 1/4 + 1/9 + 1/9 – 1/36 – 1/36 = 5/12 = ~41.6%

One more thing I know (proof in that PDF) is that for all n except n = 2, there are either no positive integer solutions to the equation a2 + b2 + c2 – abc = n or there are infinitely many such solutions. For n = 2, there is only one solution {a = b = c = 1}. I doubt that is relevant to the conjecture, but it does not hurt to mention it.

One last thing: I'm tagging this as a number theory question and as a question about Diophantine equations. Let me know if that is incorrect.

EDIT #1:
I want to add that I looked up the first dozen n with solutions in the OEIS and didn't find anything.

Here's a list of all n with solutions up to 100 in case it helps: 2, 4, 5, 8, 10, 13, 14, 17, 18, 20, 22, 25, 26, 28, 29, 32, 34, 37, 38, 40, 41, 44, 45, 49, 50, 52, 53, 54, 58, 61, 62, 64, 65, 68, 70, 72, 73, 74, 76, 77, 80, 82, 85, 88, 89, 90, 92, 94, 97, 98, 100

EDIT #2:
My program just finished checking up to n = 1,000,000. From n = 100,000 to 1,000,000 the percentage of numbers with solutions stays between 50.7% and 51.6%. It's getting further from 50%, so I'm starting to doubt my conjecture is true.

If anyone wants to run the program themselves, here's a link to txt version of it:

https://www.dropbox.com/scl/fi/kb2p277sa6d29ieeedigk/mathprogram.txt?rlkey=sr5su9zny13y75qg0b5k1v0g4&dl=0

It's written in Python. I don't have any formal programming education either. Just started teaching myself Python a few weeks ago, so I apologize if it's confusing or inefficient code. Also, the program will not make much sense unless you've read the section in the linked proof called "A Note on Searching For Satisfactory Triplets."

EDIT 3: There was a typo in my post. I said there are no solutions when N is 3 mod 6. I meant to write 6 mod 9. Sorry!

EDIT 4: There were also a couple typos in my PDF where I used < when I meant to write ≤, but I have corrected them. (Unfortunately, this somehow created some new typos. It must be an issue with the conversion from .doc to .pdf but I don't think it effects the clarity much.)

EDIT 5: Some errors in the proof were fixed, and a new link has been created. This one should be viewable by anyone.

EDIT 6: Refined the approximation of how many numbers (less than N for sufficiently large N) don't have solutions from 40% to 41.6%.

Best Answer

This is not a complete answer.

Let $C$ be the set of $n$ you are interested in, i.e. those expressible as $x^2 + y^2 + z^2 - xyz$ for positive integers $x,y,z$.

Main Observation: $C$ is A351723 on OEIS, minus possibly some square numbers(for example $9$).

A351723 is defined as integers of the form $x^2 + y^2 + z^2 + xyz$, where $x, y, z \geq 0$.

Claim: Any positive integers $n \in C$ lies in A351723.

Proof: Suppose $n = x^2 + y^2 + z^2 - xyz$ where $x,y,z$ are positive integers. You have correctly shown that there is such a solution with $y^2 + z^2 \leq n$. Therefore, we can write $$n = x_1^2 + y^2 + z^2 + x_1 yz$$ with $x_1 = x - yz \geq 0$.

Similarly, one can prove that

Claim: Any non-square positive integer $n$ in A351723 lies in $C$.

Proof: If $n = x^2 + y^2 + z^2 + xyz$ and $n$ is not a square, then at least two of $x,y,z$ are non-zero. Say $y, z \neq 0$. Then we can write $$n = x_2^2 + y^2 + z^2 - x_2 yz$$ where $x_2 = x + yz$. Observe that $x_2, y, z$ are all positive integers.

In the comment section of A351723, Zhiwei Sun has conjectured that

Conjecture: If $a_n$ is the $n$-th term of A351723, then $a_n < 2n$.

which, if true, would imply your conjecture that the density of $C$ is at least $50\%$. So in a sense, you have recovered a conjecture made by a mathematician less than a year ago.