Your example makes me think of graphs.
Imagine some nice, helpful fellow came along and made a big graph of every math concept ever, where each concept is one node and related concepts are connected by edges. Now you can take a copy of this graph and color every node green based on whether you "know" that concept (unknowns can be grey).
How to define "know"? In this case: when somebody mentions the concept while talking about something, do you immediately feel confused and get the urge to look it up? If not, then you know it. (Funnily enough, you may be deluding yourself into thinking you understand something you completely misunderstand, and it would still be classed as "knowing" under this rule - but that's fine, and I'll explain why in a bit.) For the purposes of deciding whether you "know" it, assume that the particular thing the person is talking about isn't some intricate argument that hinges on obscure details of the concept or bizarre interpretations - it's just mentioned matter-of-factly, as a tangential remark.
When you are studying a topic, you are basically picking one grey node and trying to color it green. But you may discover that to do this, you must color some adjacent grey nodes first. So the moment you discover a prerequisite node, you go to color it right away, and put your original topic on hold. But this node also has prerequisites, so you put it on hold, and... What you are doing is known as a depth-first search. It's natural for it to feel like a rabbit hole - you are trying to go as deep as possible. The hope is that sooner or later you will run into a wall of greens, which is when your long, arduous search will have borne fruit, and you will get to feel that unique rush of climbing back up the stack with your little jewel of a recursion-terminating return value.
Then you get back to coloring your original node and find out about the other prerequisite, so now you can do it all over again.
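That rabbit-hole pattern is literally recursive depth-first search. A toy sketch (the prerequisite graph, concept names, and "green" set here are all invented for illustration):

```python
# Toy prerequisite graph (all names invented): concept -> what it depends on.
prereqs = {
    "measure theory": ["sigma-algebras", "real analysis"],
    "sigma-algebras": ["set theory"],
    "real analysis": ["limits"],
    "set theory": [],
    "limits": [],
}

known = {"set theory", "limits"}  # the green nodes
order = []                        # the order in which grey nodes turn green

def learn_dfs(concept):
    """Color `concept` green, but recurse into unknown prerequisites first."""
    if concept in known:
        return  # a wall of green: the recursion terminates here
    for prereq in prereqs.get(concept, []):
        learn_dfs(prereq)  # put `concept` on hold and dive deeper
    known.add(concept)
    order.append(concept)

learn_dfs("measure theory")
print(order)  # prerequisites come out before the topics that needed them
```

Note how the original topic is always the *last* thing to get colored - exactly the "everything on hold" feeling described above.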
DFS is suited to some applications, but it is bad for others. If your goal is to color the whole graph (i.e. learn all of math), any strategy will have you visit the same number of nodes, so the choice doesn't matter much. But if you are not seriously attempting to learn everything right now, DFS is not the best choice.
So, the solution to your problem is straightforward - use a more appropriate search algorithm!
The immediately obvious alternative is breadth-first search. This means that when reading an article (or page, or book chapter), you don't rush off to look up every new term as soon as you see it. Circle it or note it on a separate sheet of paper, but force yourself to finish your text even if it's completely incomprehensible to you without knowing the new term. You will then have a list of prerequisite nodes, and can deal with them in a more organized manner.
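That note-taking discipline is exactly BFS: finish the current text, queue up the terms you circled, then process the queue level by level. A minimal sketch (the graph and names are again invented):

```python
from collections import deque

# Invented example: concept -> unknown terms you circled while reading about it.
circled = {
    "Fourier series": ["inner product", "convergence"],
    "inner product": ["vector space"],
    "convergence": ["limits"],
    "vector space": [],
    "limits": [],
}

known = {"limits"}  # already green

def learn_bfs(start):
    """Visit concepts level by level instead of diving down each rabbit hole."""
    queue = deque([start])
    seen = {start}
    visit_order = []
    while queue:
        concept = queue.popleft()    # finish the current text first...
        visit_order.append(concept)
        for term in circled.get(concept, []):
            if term not in seen and term not in known:
                seen.add(term)       # ...and merely note new terms for later
                queue.append(term)
    return visit_order

print(learn_bfs("Fourier series"))
```

The queue never gets more than one "layer" ahead of you, which is what keeps the search close to the original topic.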
Compared to your DFS, this already makes it much easier to avoid straying too far from your original area of interest. It also has another benefit which is not common in actual graph problems: often in math, and in general, understanding is cooperative. Say concept A has prerequisite concepts B and C. You may find that B is very difficult to understand (it leads down a deep rabbit hole) - but only if you don't yet know the very easy topic C, which, if you do know it, makes B very easy to "get", because you quickly figure out the salient and relevant points. (Or it may turn out that knowing either B or C alone is sufficient to learn A.) In that case, you really want a learning strategy that makes sure you do C before B!
BFS not only allows you to exploit these cooperativities - it also allows you to manage your time better. Say your first pass leaves you with a list of 30 topics to learn first. They won't all be equally hard. Maybe 10 will take you 5 minutes of skimming Wikipedia to figure out. Maybe another 10 are so simple that the first Google Images diagram explains everything. Then there will be 1 or 2 which will take days or even months of work. You don't want to get tripped up on the big ones while you still have the small ones to take care of. After all, it may turn out that the big topic is not essential, but a small topic is - in which case you would feel very silly for having tackled the big one first. And if a small one proves useless, you haven't lost much energy or time.
Once you're doing BFS, you might as well benefit from the other very nice and clever twists on it, such as Dijkstra's algorithm or A*. When you have your list of topics, can you order them by how promising they seem? Chances are you can, and chances are your intuition will be right. Another thing to try: since your aim is ultimately to link up with some green nodes, why not prioritize topics which seem closest to the things you already know? The beauty of A* is that these heuristics don't even have to be very accurate - even "wrong" or "unrealistic" heuristics can end up making your search faster.
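A best-first flavor of this can be sketched by swapping the BFS queue for a priority queue, where "promise" is any rough guess at how close a topic is to your green nodes. Everything here (graph, scores) is invented for illustration:

```python
import heapq

# Invented graph and scores. `promise` is a rough heuristic: a guess at how
# close each topic is to things already known (lower = more promising).
prereqs = {
    "topology": ["metric spaces", "category theory"],
    "metric spaces": ["open sets"],
    "category theory": [],
    "open sets": [],
}
promise = {"topology": 3, "metric spaces": 1, "open sets": 1, "category theory": 5}

def learn_best_first(start):
    """Always study whichever queued topic currently looks most promising."""
    frontier = [(promise.get(start, 0), start)]
    seen = {start}
    order = []
    while frontier:
        _, concept = heapq.heappop(frontier)  # cheapest-looking topic first
        order.append(concept)
        for nxt in prereqs.get(concept, []):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (promise.get(nxt, 0), nxt))
    return order

print(learn_best_first("topology"))
```

Notice that the forbidding-looking topic ("category theory" in this made-up example) gets deferred until all the cheap-looking ones are done - even if the scores turn out to be wrong, nothing breaks; the ordering just gets less helpful.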
One of my teachers always told me, "Don't know definitions, don't know math." At the time I was pretty annoyed, but he was completely right. The only way to learn math is to have the fundamentals down cold. This involves both a rigorous side (memorizing the definitions is a good start) and an intuitive side. So at the entry level, I strongly recommend spending a long time with the definitions. Theorems are nice, and can help you understand the relationships between the definitions, but as far as intuition goes, don't dive into the mechanics of the theorems too early.
Some big ones from calculus are limit, Taylor series, integral, derivative/differentiable, open/closed, even/odd, and continuous. If you know those you can probably talk to anyone about calculus.
The only way to build your intuitive understanding is to fail. Getting it wrong is the first step to getting it not totally wrong. That means trying a lot. Do your homework carefully. Try to ask follow-up questions. A good curriculum can reduce the amount of time it takes, but you'll have to be patient no matter what. Do examples. Do hard examples. Do more examples. Do counterexamples. Don't just settle for "well, $0$ satisfies the equation, so it's probably fine." We've all done it, but it's bad practice.
You know you're on the right track when you can see why a definition was picked the way it was. That is the real heart of intuition for definitions. For example, why should the coefficients of a Taylor series look the way they do? What properties do we even want from a Taylor series? Well, polynomials are awesome and simple, so let's use polynomials to approximate stuff. OK... but how do we pick good approximations? It turns out it has something to do with making the $n^{\text{th}}$ derivative have the right value. It's worth understanding how that works.
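To make that concrete, here is the standard one-line calculation behind those coefficients. Write the candidate polynomial with unknown coefficients and differentiate $n$ times; evaluating at $x = a$ kills every term except the $k = n$ one:

$$p(x) = \sum_{k=0}^{N} c_k (x-a)^k \quad\Longrightarrow\quad p^{(n)}(a) = n!\, c_n.$$

So demanding that the approximation match $f$ derivative-by-derivative, $p^{(n)}(a) = f^{(n)}(a)$, forces

$$c_n = \frac{f^{(n)}(a)}{n!},$$

which is exactly the Taylor coefficient - the definition falls out of the property we wanted.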
It sounds like you're on the right track. Half the battle is wanting to do it. The other half is work.
Also, this site is a good resource. Learning to ask good questions here will be super helpful for you.
This is not really an answer, but it was getting too long to be a comment.
Mathematics draws much of its power from deep, sometimes mysterious dualities between geometry and algebra, so I do not think there is any way to understand the relationship between geometric intuition and symbol manipulation in general.
Diophantine equations are one of the least geometrically intuitive areas of mathematics, though it is certainly worth mentioning that geometric insights (albeit very difficult ones) are behind some of the deepest results in the field, like Fermat's Last Theorem.
One can sometimes make arguments from intuitive principles that give partial answers. To take your example of $(xy-7)^2 = x^2 + y^2$, one can observe immediately that this is the intersection in the plane of two curves, one of degree $2$ and one of degree $4$, that have no components in common. By Bézout's theorem, there are at most $8$ intersection points (in fact, counted with multiplicity over the complex projective plane, there are exactly $8$).
This tells us that the integer solutions, in particular, are a finite, possibly empty set. But it doesn't really give a good a priori method for finding them. That requires some symbol pushing.
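That said, once finiteness is known, even a naive bounded search recovers the integer solutions. A quick sketch (the search box of $50$ is an ad hoc guess, but a safe one for this particular curve, since for large $|x|, |y|$ the left side grows like $x^2y^2$ and swamps the right side):

```python
# Naive search for integer solutions of (xy - 7)^2 = x^2 + y^2.
# The box |x|, |y| <= 50 is ad hoc but safe here: for large |x| and |y|
# the left side grows like x^2 * y^2 and dominates x^2 + y^2.
solutions = sorted(
    (x, y)
    for x in range(-50, 51)
    for y in range(-50, 51)
    if (x * y - 7) ** 2 == x * x + y * y
)
print(solutions)
print(len(solutions))
```

Of course, justifying the search bound is itself a bit of symbol pushing - which is rather the point.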
To some extent, we can gain a lot of intuition for pushing symbols around. But this can be a lifelong process, and I think any mathematician will tell you that there are advantages to being able to manipulate equations without completely understanding what you are doing. There is some truth to the famous quotation that mathematics is about getting used to things, and for myself I can say that diligent practice has been far more valuable than theory when it comes to developing comfort with difficult problems.
The theory and intuition come eventually, and with specific examples you may get little boosts to your insight here and there from helpful mentors, but there is no singular approach that will allow you to "plan".