As a student of math and physics, this has been one of the biggest annoyances for me; I'll give my two cents on the matter. Throughout my entire answer, whenever I use the term "function", I will always mean it in the usual math sense (a rule with a certain domain and codomain, and so on).
I generally find two ways in which people use the phrase "... is a function of ..." The first is as you say: "$f$ is a function of $x$" simply means that for the remainder of the discussion, we shall agree to denote the input of the function $f$ by the letter $x$. This is just a notational choice as you say, so there's no real math going on. We just make this choice of notation to in a sense "standardize everything". Of course, we usually allow for variants on the letter $x$. So, we may write things like $f(x), f(x_0), f(x_1), f(x'), f(\tilde{x}), f(\bar{x})$ etc. The way to interpret this is as usual: this is just the result obtained by evaluating the function $f$ on a specific element of its domain.
Also, you're right that the input label is completely arbitrary, so we can write $f(t), f(y), f(\ddot{\smile})$, or whatever else we like. But again, oftentimes it might just be convenient to use certain letters for certain purposes (this can allow for easier reading, and also reduce notational conflicts); and as much as possible it is a good idea to conform to the widely used notation, because at the end of the day, math is about communicating ideas, and one must find a balance between absolute precision and rigour on the one hand, and clarity and flow of thought on the other.
btw as a side remark, I think I am a very very very nitpicky individual regarding issues like: $f$ vs $f(x)$ for a function, I'm also always careful to use my quantifiers properly etc. However, there have been a few textbooks I glossed over, which are also extremely picky and explicit and precise about everything; but while what they wrote was $100 \%$ correct, it was difficult to read (I had to pause often etc). This is as opposed to some other books/papers which leave certain issues implicit, but convey ideas more clearly. This is what I meant above regarding balance between precision and flow of thought.
Now, back to the issue at hand. In your third and fourth paragraphs, I think you have made a couple of true statements, but you're missing the point. (One of) the job(s) of any scientist is to quantitatively describe and explain observations made in real life. For example, you introduced the example of the amount of wax burnt, $w$. If all you wish to do is study properties of functions which map $\Bbb{R} \to \Bbb{R}$ (or subsets thereof), then there is clearly no point in calling $w$ the wax burnt or whatever.
But given that you have $w$ as the amount of wax burnt, the most naive model for describing how this changes is to assume that the flame which is burning the wax is kept constant and all other variables are kept constant etc. Then, clearly the amount of wax burnt will only depend on the time elapsed. From the moment you start your measurement/experiment process, at each time $t$, there will be a certain amount of wax burnt off, $w(t)$. In other words, we have a function $w: [0, \tau] \to \Bbb{R}$, where the physical interpretation is that for each $t \in [0, \tau]$, $w(t)$ is the amount of wax burnt off $t$ units of time after starting the process. Let's for the sake of definiteness say that $w(t) = t^3$ (with the above domain and codomain).
"Sure, $w$ only has the interpretation we think it does (cumulative amount of wax burnt) when we provide a (real number in the domain of definition, which we interpret as) time as its argument"
True.
"...Sure, we can't really interpret $w$ in the same way if I did this, but there is nothing in the definition of w which stops me from doing this."
Also true.
But here's where you're missing the point. If you didn't want to give a physical interpretation of what elements in the domain and target space of $w$ mean, why would you even talk about the example of burning wax? Why not just tell me the following:
Fix a number $\tau > 0$, and define $w: [0, \tau] \to \Bbb{R}$ by $w(t) = t^3$.
This is a perfectly self-contained mathematical statement. And now, I can tell you a bunch of properties of $w$. Such as:
- $w$ is an increasing function
- For all $t \in [0, \tau]$, $w'(t) = 3t^2$ (derivatives at end points of course are interpreted as one-sided limits)
- $w$ has exactly one root (of multiplicity $3$) on this interval of definition.
(and many more other properties). So, if you want to completely forget about the physical context, and just focus on the function and its properties, then of course you can do so. Sometimes, such an abstraction is very useful as it removes any "clutter".
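To see that these properties really only involve the bare function, here is a minimal Python sketch that checks them numerically on a grid. The value `TAU = 2.0`, the grid size, and the helper `central_difference` are assumptions made purely for illustration; nothing in the code knows anything about wax or time.

```python
TAU = 2.0  # hypothetical value of tau; any positive number would do


def w(t: float) -> float:
    """The function w: [0, TAU] -> R given by w(t) = t^3."""
    if not 0.0 <= t <= TAU:
        raise ValueError("t is outside the domain [0, TAU]")
    return t ** 3


def central_difference(f, t: float, h: float = 1e-5) -> float:
    """Symmetric difference quotient approximating f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)


# Sample the domain at 101 evenly spaced points, endpoints included.
grid = [TAU * k / 100 for k in range(101)]
values = [w(t) for t in grid]

# w is increasing: each sampled value is strictly larger than the previous one.
is_increasing = all(a < b for a, b in zip(values, values[1:]))

# w'(t) = 3 t^2, checked at interior points so that t - h and t + h stay in the domain.
interior = grid[1:-1]
max_derivative_error = max(
    abs(central_difference(w, t) - 3 * t ** 2) for t in interior
)

# The only sampled point where w vanishes is t = 0.
roots_on_grid = [t for t in grid if w(t) == 0.0]

print(is_increasing, max_derivative_error, roots_on_grid)
```

A grid check like this is of course no proof; it only illustrates that the statements are about $w$ as a function and nothing else.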
However, I really don't think it is (always) a good idea to completely disconnect mathematical ideas from their physical origins/interpretations. And the reason that in the sciences people often assign such interpretations is because their purpose is to use the powerful tool of mathematics to quantitatively model an actual physical observation.
So, while you have made a few technically true statements in your third and fourth paragraphs, I believe you've missed the point of why people assign physical meaning to certain quantities.
For your fifth paragraph however, I agree with the sentiment you're describing, and questions like this have tortured me. You're right that $w$ is a function of a single variable (where in this physical context, we interpret the arguments as time). If you now ask me how does $w$ change in relation to the distance I have started to walk, then I completely agree that there is no relation whatsoever.
But what is really going on is a terrible, annoying, confusing abuse of notation, where we use the same letter $w$ with two different meanings. Physicists love such abuse of notation, and this has confused me for so long (and it still does from time to time). Of course, the intuitive idea of why the amount of wax burnt should depend on distance is clear: the further I walk, the more time has passed, and hence the more wax has burnt. So, this is really a two step process.
To formalize this, we need to introduce a second function $\gamma$ (between certain subsets of $\Bbb{R}$), where the interpretation is that $\gamma(x)$ is the time taken to walk a distance $x$. Then when we (by abuse of language) say $w$ is a function of distance, what we really mean is that
The composite function $w \circ \gamma$ has the physical interpretation that for each $x \in \text{domain}(\gamma)$, $(w \circ \gamma)(x)$ is the amount of wax burnt when I walk a distance $x$.
Very often, this composition is not made explicit. In the Leibniz chain rule notation
\begin{align}
\dfrac{dw}{dx} &= \dfrac{dw}{dt} \dfrac{dt}{dx}
\end{align}
where on the LHS $w$ is miraculously a function of distance, even though on the RHS (and initially) $w$ was a function of time, what is really going on is that the $w$ on the LHS is a complete abuse of notation: it stands for the composite $w \circ \gamma$, while $\dfrac{dt}{dx}$ stands for $\gamma'(x)$. The precise way of writing it is $(w \circ \gamma)'(x) = w'(\gamma(x)) \cdot \gamma'(x)$.
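The hidden composition can be made explicit in code. The following is a minimal Python sketch, assuming (purely for illustration) that I walk at a constant speed, so that $\gamma(x) = x / v$; the names `WALKING_SPEED`, `compose`, `wax_of_distance` and `central_difference` are hypothetical. The point is that `w` and `wax_of_distance` are two different functions, and the chain rule relates their derivatives.

```python
WALKING_SPEED = 1.5  # hypothetical constant speed (distance per unit time)


def w(t: float) -> float:
    """Wax burnt after time t: w(t) = t^3."""
    return t ** 3


def w_prime(t: float) -> float:
    """Derivative of w with respect to its (time) argument."""
    return 3 * t ** 2


def gamma(x: float) -> float:
    """Time taken to walk a distance x at constant speed (an assumption)."""
    return x / WALKING_SPEED


def gamma_prime(x: float) -> float:
    """Derivative of gamma with respect to its (distance) argument."""
    return 1.0 / WALKING_SPEED


def compose(outer, inner):
    """Return the composite function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# "w as a function of distance" is really the new function w o gamma.
wax_of_distance = compose(w, gamma)

x = 3.0
numeric = central_difference(wax_of_distance, x)   # (w o gamma)'(x), numerically
chain_rule = w_prime(gamma(x)) * gamma_prime(x)    # w'(gamma(x)) * gamma'(x)

print(numeric, chain_rule)
```

Note that `w(3.0)` and `wax_of_distance(3.0)` give different numbers: the first treats $3$ as a time, the second as a distance that is first converted into a time by `gamma`.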
In general, whenever you initially have a function $f$ "as a function of $x$" and then suddenly it becomes a "function of $t$", what is really meant is that we are given two functions $f$ and $\gamma$; and when we say "consider $f$ as a function of $x$", we really mean to just consider the function $f$, but when we say "consider $f$ as a function of time", we really mean to consider the (completely different) function $f \circ \gamma$.
Summary: if the arguments of a function suddenly change interpretation (e.g. from time to distance, or really anything else), then you immediately know that the author is being sloppy/lazy by not explicitly mentioning that there is a hidden composition.
Best Answer
It's worth remembering that $$ f'(x) = \lim_{h \rightarrow 0} \frac{f(x+h) - f(x)}{h} $$ is obtained after simplifying from $$ f'(x) = \lim_{h \rightarrow 0} \frac{f(x+h) - f(x)}{(x+h) - x} \text{.} $$ This is the rate of change of $f(x)$ with respect to the function $x \mapsto x$.
The most straightforward thing to do for the given problem, where $f(x) = x$, is (for $x \neq 0$) \begin{align*} \frac{\mathrm{d}f(x)}{\mathrm{d}(x^2)} &= \lim_{h \rightarrow 0} \frac{f(x+h) - f(x)}{(x+h)^2 - (x^2)} \\ &= \lim_{h \rightarrow 0} \frac{(x+h) - (x)}{x^2+2xh+h^2 - x^2} \\ &= \lim_{h \rightarrow 0} \frac{h}{2xh+h^2} \\ &= \frac{1}{2x} \text{.} \end{align*}
If the resulting derivatives exist and $g'(x) \neq 0$, then writing $\Delta f(x) = f(x+h) - f(x)$, $\Delta g(x) = g(x+h) - g(x)$ and $\Delta x = (x+h) - x = h$, one can manipulate as \begin{align*} \frac{\mathrm{d}f(x)}{\mathrm{d}g(x)} &= \lim_{h \rightarrow 0} \frac{\Delta f(x)}{\Delta g(x)} \\ &= \lim_{h \rightarrow 0} \frac{ \frac{\Delta f(x)}{\Delta x} }{ \frac{ \Delta g(x)}{ \Delta x} } \\ &= \frac{ \lim_{h \rightarrow 0}\frac{\Delta f(x)}{\Delta x} }{ \lim_{h \rightarrow 0}\frac{ \Delta g(x)}{ \Delta x} } \\ &= \frac{ f'(x) }{ g'(x) } \text{.} \end{align*}
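Both computations can be checked numerically. Below is a minimal Python sketch, assuming a small fixed step `h` in place of the limit; the helper name `derivative_wrt`, the sample point `x = 2.0`, and the second test pair ($\sin$ with respect to $\exp$) are hypothetical choices made only for illustration.

```python
import math


def derivative_wrt(f, g, x: float, h: float = 1e-6) -> float:
    """Approximate df/dg at x by the quotient (f(x+h) - f(x)) / (g(x+h) - g(x)).

    This is a finite-step stand-in for the limit h -> 0, so the result is
    only an approximation, and it requires g(x+h) != g(x).
    """
    return (f(x + h) - f(x)) / (g(x + h) - g(x))


x = 2.0

# The given problem: f(x) = x differentiated with respect to x^2.
identity_wrt_square = derivative_wrt(lambda t: t, lambda t: t * t, x)
expected_identity = 1.0 / (2.0 * x)  # the closed form 1/(2x)

# A second pair: sin differentiated with respect to exp, compared with f'(x)/g'(x).
sin_wrt_exp = derivative_wrt(math.sin, math.exp, x)
expected_ratio = math.cos(x) / math.exp(x)

print(identity_wrt_square, expected_identity)
print(sin_wrt_exp, expected_ratio)
```

In both cases the two printed numbers should agree to several decimal places; the small discrepancy comes from using a finite `h` rather than taking the limit.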