"Calculus of variations" is the correct buzzword, you may also look for "analysis in infinite dimensions" or, more generally, "nonlinear functional analysis".
Your question can be made mathematically precise, one of the best (but somewhat sophisticated) sources is still:
- Richard S. Hamilton: The Inverse Function Theorem of Nash and Moser (Bulletin (New Series) of the American Mathematical Society Volume 7, Number 1, July 1982)
First, the functions that you have to differentiate are "functions of functions", that is the variables are members of function spaces. The paper I cited treats calculus on such function spaces that are "Fréchet spaces", which is sufficiently general for many applications.
To make your question precise, we'll have to add an assumption about the function space we are talking about, but I'll leave that implicit, i.e. to you :-)
(Hint: a good choice would be the Schwartz space of smooth rapidly decreasing functions, see Wikipedia.)
Your first "function of functions" is e.g. the mapping
$$
p \mapsto \int_{\mathbb{R}^2} p(x, y) d x d y
$$
where I assume that the integral is with respect to the Lebesgue measure and our function space is such that this integral exists and is finite. Let's call this mapping $F$.
Now, the definition of a directional derivative is just the same as in finite dimensions: let $t$ be a real parameter and $q$ be another function; then
$$
DF(p, q) := \lim_{t \to 0} \frac{1}{t} (F(p + t q) - F(p))
$$
which is the directional derivative (also known as the Gâteaux derivative) of $F$ at $p$ in the direction $q$. If this limit exists for all functions $q$ (and for all $p$ in a neighborhood), and is jointly continuous in the variables $p, q$, then we say that $F$ is continuously differentiable at $p$. While the function $F$ that we differentiated was a function of "one variable", the derivative $DF$ is a function of two variables, i.e. it is defined on the product space.
Ok, I think you'll be able to calculate $DF(p, q)$ for both the examples you cited, if not, come again :-)
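If you'd like to check such a computation numerically, here is a minimal sketch of my own (not from the paper): it approximates $DF(p, q)$ for the integration functional $F$ above by a difference quotient, using Gaussians as stand-ins for rapidly decreasing functions and a crude Riemann sum for the integral.

```python
import numpy as np

# Minimal numerical sketch (grid size and functions are ad-hoc choices):
# F(p) = ∫∫ p(x, y) dx dy, approximated by a Riemann sum on [-10, 10]².
x = np.linspace(-10, 10, 801)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x)

p = np.exp(-(X**2 + Y**2))              # stand-in for a Schwartz function
q = np.exp(-((X - 1)**2 + Y**2) / 2)    # the "direction" of differentiation

def F(f):
    return f.sum() * dx * dx            # crude 2-D quadrature

t = 1e-6
DF_numeric = (F(p + t * q) - F(p)) / t  # difference quotient
DF_exact = F(q)  # since F is linear, DF(p, q) = F(q) = ∫∫ q dx dy

print(DF_numeric, DF_exact)  # the two values agree
```

Because $F$ is linear, the difference quotient does not depend on $t$ at all, and for this particular $q$ the value is $\int\int q \, dx \, dy = 2\pi$.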
Edit: As an addendum, a simple example. Let $S(\mathbb{R})$ be the space of Schwartz functions on $\mathbb{R}$ and define a function $F$ via:
$$
F : S(\mathbb{R}) \to S(\mathbb{R})
$$
$$
f \mapsto f' := \frac{d}{d x} f
$$
Now let's see what $DF(p, q)$ may be, if it exists. Since we have
$$
F(p + t q) = \frac{d}{d x} (p + t q) = p' + t q'
$$
because the derivative is a linear operator (hint hint!), we get
$$
DF(p, q) := \lim_{t \to 0} \frac{1}{t} (F(p + t q) - F(p)) = \lim_{t \to 0} \frac{1}{t} (p' + t q' - p') = \lim_{t \to 0} \frac{1}{t} (t q') = q'
$$
Ergo, the limit exists, is independent of the first argument, and is linear in the second.
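Here is a small numerical check of this (a sketch of my own, not from the text): on a grid, `np.gradient` stands in for $d/dx$, and the difference quotient reproduces $q'$ up to rounding error.

```python
import numpy as np

# Numerical sketch: F(f) = f', discretized with np.gradient on a grid;
# Gaussians stand in for Schwartz functions.
x = np.linspace(-10, 10, 4001)
p = np.exp(-x**2)
q = np.exp(-(x - 1)**2 / 2)

def F(f):
    return np.gradient(f, x)

t = 1e-6
lhs = (F(p + t * q) - F(p)) / t   # difference quotient at p in direction q
rhs = F(q)                        # the claimed derivative DF(p, q) = q'

print(np.max(np.abs(lhs - rhs)))  # ≈ 0, independent of p
```

Since the discretized $F$ is still a linear operator, the agreement is exact up to floating-point rounding, for any $t$.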
But we can do a nonlinear example, too, define:
$$
F : S(\mathbb{R}) \to S(\mathbb{R})
$$
$$
f \mapsto f^2
$$
Then we get
$$
DF(p, q) := \lim_{t \to 0} \frac{1}{t} (F(p + t q) - F(p)) = \lim_{t \to 0} \frac{1}{t} (p^2 + 2 t p q + t^2 q^2 - p^2) = \lim_{t \to 0} (2 p q + t q^2) = 2 p q
$$
As you can see, the calculus in infinite dimensions is a little bit different from the one in finite dimensions :-)
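As a quick numerical sanity check of the nonlinear example (again a sketch of my own): the difference quotient equals $2pq + t q^2$ exactly, so it converges to $2pq$ at rate $t$.

```python
import numpy as np

# Numerical sketch for F(f) = f²: the difference quotient is 2pq + t·q².
x = np.linspace(-10, 10, 2001)
p = np.exp(-x**2)
q = np.exp(-(x - 1)**2 / 2)

t = 1e-6
lhs = ((p + t * q)**2 - p**2) / t  # difference quotient
rhs = 2 * p * q                    # the claimed Gâteaux derivative

print(np.max(np.abs(lhs - rhs)))   # O(t): the leftover term is t·q²
```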
Your calculation is correct, but I'm hesitant to call it a "valid technique." Let me explain a bit:
In the context you're working in, we have a function $f(u)$, which is a function of a single variable, and also $u(x,y) = x - y$ is a function of two variables. In this setup, the chain rule reads as follows:
$$\frac{\partial f}{\partial x} = \frac{df}{du}\frac{\partial u}{\partial x}$$
$$\frac{\partial f}{\partial y} = \frac{df}{du}\frac{\partial u}{\partial y}$$
Since $\frac{\partial u}{\partial y} = -1$, we can conclude that $\frac{\partial f}{\partial y} = -1\cdot \frac{df}{du}$, and therefore that $$\frac{df}{du} = -1\cdot \frac{\partial f}{\partial y},$$
which, in your (somewhat non-standard) notation reads $\frac{\partial f(x-y)}{\partial (x-y)} = -1\cdot \frac{\partial f(x-y)}{\partial y}$. So, this equation is true, yes.
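To make this concrete, here is a small symbolic check (my own sketch, with a made-up choice $f(u) = u^3$) that $\frac{df}{du} = -\frac{\partial f}{\partial y}$ when $u = x - y$:

```python
import sympy as sp

# Sketch: verify df/du = -∂f/∂y for the concrete choice f(u) = u**3,
# u = x - y.
x, y, u = sp.symbols('x y u')
f = u**3
u_expr = x - y

df_du = sp.diff(f, u).subs(u, u_expr)   # 3*(x - y)**2
df_dy = sp.diff(f.subs(u, u_expr), y)   # -3*(x - y)**2

print(sp.simplify(df_du + df_dy))  # 0, confirming df/du = -∂f/∂y
```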
The reason I hesitate to call your method a "valid technique" is because one usually cannot manipulate the symbols $\partial x$ and $\partial y$ as independent entities.
For instance, if $f$ were instead a function of two variables, say $f(u,v)$, where both $u$ and $v$ were themselves functions of two variables (say $u = u(x,y)$ and $v = v(x,y)$), then the chain rule would read
$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x}$$
$$\frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y}.$$
Notice that we cannot interpret the $\partial u$ and $\partial v$ symbols as canceling here: doing so would yield false identities like $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial x}$.
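Here is a symbolic illustration (my own sketch, with made-up functions $f(u, v) = uv$, $u = x + y$, $v = x - y$): the chain rule holds, but neither of its two terms individually equals $\frac{\partial f}{\partial x}$, so the "cancellation" fails.

```python
import sympy as sp

# Sketch: f(u, v) = u*v with u = x + y, v = x - y. The chain rule
# holds, but its two terms are not each equal to ∂f/∂x.
x, y, u, v = sp.symbols('x y u v')
f = u * v
u_expr, v_expr = x + y, x - y

term1 = (sp.diff(f, u) * sp.diff(u_expr, x)).subs({u: u_expr, v: v_expr})
term2 = (sp.diff(f, v) * sp.diff(v_expr, x)).subs({u: u_expr, v: v_expr})
df_dx_direct = sp.diff(f.subs({u: u_expr, v: v_expr}), x)

print(sp.simplify(term1 + term2 - df_dx_direct))  # 0: chain rule holds
print(term1, term2)  # x - y and x + y: neither equals 2x on its own
```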
Amusing Example: Just to really drive the point home, consider the ideal gas law (from chemistry) $PV = nRT$, where $n$ and $R$ are constants. We can consider $P$, $V$, and $T$ as functions
$$P = P(V,T) = nR\frac{T}{V}$$
$$V = V(P,T) = nR \frac{T}{P}$$
$$T = T(P,V) = \frac{1}{nR}PV.$$
One can then check that, in fact:
$$\frac{\partial P}{\partial V} \frac{\partial V}{\partial T}\frac{\partial T}{\partial P} = -1.$$
So much for canceling.
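This triple product identity is easy to verify symbolically; here is a sketch of my own doing so, treating $n$ and $R$ as positive symbolic constants.

```python
import sympy as sp

# Sketch: verify the triple product rule for the ideal gas law PV = nRT.
P, V, T, n, R = sp.symbols('P V T n R', positive=True)

dP_dV = sp.diff(n * R * T / V, V)     # ∂P/∂V at constant T
dV_dT = sp.diff(n * R * T / P, T)     # ∂V/∂T at constant P
dT_dP = sp.diff(P * V / (n * R), P)   # ∂T/∂P at constant V

product = dP_dV * dV_dT * dT_dP
# On the surface PV = nRT, substitute T to see the product is -1:
product = sp.simplify(product.subs(T, P * V / (n * R)))
print(product)  # -1
```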
Best Answer
Fix $x$ and let $\gamma(x)$ be some path from $0$ to $x$. Then $f(x) = \int_{\gamma(x)} \nabla f \cdot dr.$ The functional derivative of this w.r.t. $\nabla f$ is defined as the linear functional (often a distribution) $u$ given by $$\langle u, \phi \rangle = \left. \frac{d}{d\lambda} \int_{\gamma(x)} \nabla (f+\lambda\phi) \cdot dr \right|_{\lambda=0}$$ Now, the right-hand side equals $$\left. \int_{\gamma(x)} \frac{\partial}{\partial\lambda} \nabla (f+\lambda\phi) \cdot dr \right|_{\lambda=0} = \left. \int_{\gamma(x)} \nabla \phi \cdot dr \right|_{\lambda=0} = \int_{\gamma(x)} \nabla \phi \cdot dr $$ Thus, $$\langle u, \phi \rangle = \int_{\gamma(x)} \nabla \phi \cdot dr = \phi(x) - \phi(0) = \langle \delta(t-x) - \delta(t), \phi(t) \rangle$$ so the functional derivative is $$\frac{\partial f(x)}{\partial \nabla f(t)} = \delta(t-x) - \delta(t)$$ Luckily, this result doesn't depend on the choice of $\gamma(x)$.
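A one-dimensional numerical check of this computation (a sketch of my own, with a made-up smooth test function $\phi$): perturbing $f$ by $\lambda\phi$ and differentiating at $\lambda = 0$ indeed yields $\phi(x) - \phi(0)$, i.e. the pairing of $\phi$ with $\delta(t - x) - \delta(t)$.

```python
import numpy as np

# Sketch: f(x) = ∫₀ˣ f'(s) ds with f(s) = sin(s); the variation in the
# direction φ' should equal φ(x) - φ(0). Grid and φ are ad-hoc choices.
x = 2.0
s = np.linspace(0.0, x, 200001)
ds = s[1] - s[0]

grad_f = np.cos(s)                                 # f'(s) for f = sin
phi = np.exp(-(s - 0.5)**2)                        # test function φ
grad_phi = -2 * (s - 0.5) * np.exp(-(s - 0.5)**2)  # φ'

def integrate(g):
    return (g[:-1] + g[1:]).sum() * ds / 2         # trapezoid rule

lam = 1e-6
variation = (integrate(grad_f + lam * grad_phi) - integrate(grad_f)) / lam

expected = phi[-1] - phi[0]  # φ(x) - φ(0)
print(variation, expected)   # the two agree
```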
Fix $x$ and let $\gamma(x)$ be some path from $0$ to $x$. Then $f(x) = \int_{\gamma(x)} \nabla f \cdot dr.$ The functional derivative of this w.r.t. $\nabla f$ is defined as the linear functional (often a distribution) $\delta u$ given by $$\langle u, \phi \rangle = \left. \frac{d}{d\lambda} \int_{\gamma(x)} \nabla (f+\lambda\phi) \cdot dr \right|_{\lambda=0}$$ Now, the right hand side equals $$\left. \int_{\gamma(x)} \frac{\partial}{\partial\lambda} \nabla (f+\lambda\phi) \cdot dr \right|_{\lambda=0} = \left. \int_{\gamma(x)} \nabla \phi \cdot dr \right|_{\lambda=0} = \int_{\gamma(x)} \nabla \phi \cdot dr $$ Thus, $$\langle u, \phi \rangle = \int_{\gamma(x)} \nabla \phi \cdot dr = \phi(x) - \phi(0) = \langle \delta(t-x) - \delta(t), \phi(t) \rangle$$ so the functional derivative is $$\frac{\partial f(x)}{\partial \nabla f(t)} = \delta(t-x) - \delta(t)$$ Luckily this result doesn't depend on the choice of $\gamma(x).$