i.e. $T(\tau_cf)=Tf$ where
You're misunderstanding translation-invariance here.
A translation-invariant operator $T \colon L^p(\mathbb{R}^d) \to L^q(\mathbb{R}^d)$ is an operator that commutes with translations, i.e. $T \circ \tau_c = \tau_c \circ T$ for all $c \in \mathbb{R}^d$.
Then, with a bit of ho-humming (that can be made precise, see below), when the $x_k$ are spread far enough apart, the $T(\tau_{x_k}g) = \tau_{x_k}Tg$ have the bulk of their weights separated, so
$$\lVert Tf\rVert_q = \lVert \sum \tau_{x_k}Tg\rVert_q \approx \left(\int \sum \left\lvert (Tg)(x-x_k)\right\rvert^q\,dx \right)^{1/q} \approx \left(\sum \int \lvert(Tg)(x-x_k)\rvert^q\,dx \right)^{1/q} \approx N^{1/q} \lVert Tg\rVert_q$$
For $f$ itself, with the assumption of compact support there is no problem seeing $\lVert f\rVert_p = N^{1/p}\lVert g\rVert_p$.
To conclude that that implies $q \geqslant p$, one needs $T \neq 0$.
Making the ho-humming precise:
Let
$$\chi_r(x) = \begin{cases}
1,\quad \lVert x\rVert \leqslant r\\
0,\quad \lVert x\rVert > r.
\end{cases}$$
Fix a (smooth) $g \in L^p(\mathbb{R}^d)$ with compact support in $B_R(0)$.
Then, for $f = \sum\limits_{k = 1}^N \tau_{x_k}g$ and $r > 0$, if $\lVert x_i - x_j\rVert \geqslant 2r$ for all $i \neq j$, you have
$$\begin{align}
\lVert Tf\rVert_q &= \left\lVert\left(\sum_{k=1}^N \tau_{x_k}\bigl(\chi_r\cdot(Tg)\bigr)\right) + \left(\sum_{k=1}^N \tau_{x_k}\bigl((1-\chi_r)\cdot(Tg)\bigr)\right) \right\rVert_q\\
&\leqslant \left\lVert\left(\sum_{k=1}^N \tau_{x_k}\bigl(\chi_r\cdot(Tg)\bigr)\right)\right\rVert_q + N\cdot \lVert (1-\chi_r)(Tg)\rVert_q\\
&= \left(\int \sum_{k=1}^N \left\lvert\chi_r(x-x_k)(Tg)(x-x_k)\right\rvert^q\,dx\right)^{1/q} + N\cdot \lVert (1-\chi_r)(Tg)\rVert_q\\
&= \left(N\cdot \lVert \chi_r\cdot (Tg)\rVert^q\right)^{1/q} + N\cdot \lVert (1-\chi_r)(Tg)\rVert_q\\
&= N^{1/q} \lVert \chi_r\cdot (Tg)\rVert_q + N\cdot \lVert (1-\chi_r)(Tg)\rVert_q\\
&\leqslant N^{1/q} \lVert Tg\rVert_q + N\cdot \lVert (1-\chi_r)(Tg)\rVert_q.
\end{align}$$
The first inequality follows from the triangle inequality for $\lVert\cdot\rVert_q$. The next equalities follow from the disjointness of the supports of the $\tau_{x_k}\bigl(\chi_r\cdot(Tg)\bigr)$ and the translation-invariance (note: since the result is a number, and not a function, translation-invariance means $\lambda(\tau_c M) = \lambda(M)$ for all $c$ and measurable $M$ here) of the Lebesgue measure. The final inequality from $\lVert\chi_r\cdot h\rVert_q \leqslant \lVert h\rVert_q$ for all $h$.
Using the triangle inequality in the form $\lVert a + b \rVert \geqslant \lVert a\rVert - \lVert b\rVert$ for the split between the $\chi_r(Tg)$ and $(1-\chi_r)(Tg)$, we obtain
$$\lVert Tf\rVert_q \geqslant N^{1/q} \lVert \chi_r\cdot (Tg)\rVert_q - N\cdot \lVert (1-\chi_r)(Tg)\rVert_q$$
Now, given $g$, $N$, and an arbitrary $\varepsilon > 0$, by the dominated convergence theorem, you can choose $r_0 > 0$ so that $\lVert (1-\chi_r)\cdot(Tg)\rVert_q < \varepsilon/N$ for all $r \geqslant r_0$. Choose additionally $r_0 > 2R$, so that the $\tau_{x_k}g$ have disjoint support.
If the $x_k$ are then chosen far enough apart, you find
$$N^{1/q}\lVert Tg\rVert_q - \frac{\varepsilon}{N^{1-1/q}} - N\cdot\frac{\varepsilon}{N} \leqslant \lVert Tf\rVert_q \leqslant N^{1/q}\lVert Tg\rVert_q + \varepsilon,$$
so $\lVert Tf\rVert_q \approx N^{1/q}\lVert g\rVert_q$, and
$$N^{1/q}\lVert Tg\rVert_q - 2\varepsilon \leqslant \lVert Tf\rVert_q \leqslant \lVert T\rVert \cdot \lVert f\rVert_p = N^{1/p}\lVert T\rVert\cdot\lVert g\rVert_p.$$
Since $\varepsilon$ could be arbitrarily chosen,
$$N^{1/q}\lVert Tg\rVert_q \leqslant N^{1/p}\lVert T\rVert\cdot \lVert g\rVert_p$$
for all $N$ and (smooth) $g$ with compact support. If $T \neq 0$, a smooth $g$ with compact support and $Tg \neq 0$ exists ($\mathscr{C}_c(\mathbb{R}^d)$ is dense in $L^p(\mathbb{R}^d)$ for $p < \infty$). Then, taking the limit for $N \to \infty$ would lead to a contradiction if $p > q$.
There's a nice proof of this in Hörmander's book "The Analysis of Linear Partial Differential Operators I" (Theorem 7.1.14). The main tool needed is a distributional version of Fubini's theorem (Theorem 5.1.1 in Hörmander's book). I'll summarize the main points:
Preliminary step 1: Fubini's theorem
If $u$ and $v$ are distributions on ${\mathbb R}^m$ and ${\mathbb R}^n$ respectively, then there is a unique product distribution $u \otimes v$ on ${\mathbb R}^{m+n}$ characterized by the condition
$$
(u \otimes v)(\varphi \otimes \psi) = u(\varphi) v(\psi)
$$
for all $\varphi \in {\mathcal D}({\mathbb R}^m)$ and $\psi \in {\mathcal D}({\mathbb R}^n)$, where we write $(\varphi \otimes \psi)(x,y) := \varphi(x) \psi(y)$. Moreover, $u \otimes v$ can be evaluated on an arbitrary test function $\varphi \in {\mathcal D}({\mathbb R}^{m+n})$ by
$$
(u \otimes v)(\varphi) = u(x \mapsto v(\varphi(x,\cdot))) = v(y \mapsto u(\varphi(\cdot,y))).
$$
(Note that there is a slightly nontrivial exercise required to show that both expressions on the right make sense, e.g. that $x \mapsto v(\varphi(x,\cdot))$ defines a smooth compactly supported function on ${\mathbb R}^m$. This rests mainly on the fact that by uniform continuity, $x \mapsto \varphi(x,\cdot)$ is a continuous map to the space of test functions, and other arguments of this sort.) If you write down this formula in the case where $u$ and $v$ are given by locally integrable functions, you'll find that it follows easily from the classical Fubini's theorem.
If you know that the product distribution is unique, then the formula follows by verifying directly that both expressions on the right hand side define distributions that satisfy the defining property of a product distribution. Uniqueness can be proved via mollification: if $u \otimes v$ were not unique, then there would exist a nontrivial distribution $w$ on ${\mathbb R}^{m+n}$ such that $w(\varphi \otimes \psi) = 0$ for all $\varphi \in {\mathcal D}({\mathbb R}^m)$ and $\psi \in {\mathcal D}({\mathbb R}^n)$. Choose approximate identities, i.e. sequences of smooth functions $\rho_j : {\mathbb R}^m \to [0,\infty)$ and $\sigma_j : {\mathbb R}^n \to [0,\infty)$ with shrinking compact supports near $\{0\}$ that converge in the space of distributions to $\delta$-functions. Then the classical Fubini theorem implies that the sequence $\rho_j \otimes \sigma_j : {\mathbb R}^{m+n} \to [0,\infty)$ also defines an approximate identity in the same sense, and it follows that the sequence of smooth functions $(\rho_j \otimes \sigma_j) * w$ converges to $w$ in the space of distributions. But those functions are all $0$ due to the defining property of $w$, thus $w=0$.
Preliminary step 2: polynomial growth
Before we can view the function $g(\xi) := u(\chi(x) e^{-i x \xi})$ as a plausible candidate for the Fourier transform of $\chi u$, we need to know that it behaves reasonably enough at infinity to define a tempered distribution. As I indicated in my comments on the previous answer, $g$ is definitely not a Schwartz function in general, but one can show that it has polynomial growth. Perhaps the quickest way is to use standard properties of the Fourier transform and rewrite $g$ as
$$
g(\xi) = \left( ( {\mathcal F}\chi)^- * {\mathcal F}^*u\right)(-\xi),
$$
where I'm using the notation $f^-(x) := f(-x)$. As the convolution of a Schwartz function with a tempered distribution, it follows from standard results about the convolution that this function has polynomial growth.
The main argument
As stated in the question, we need to prove that the relation
$$
\int_{{\mathbb R}^n} u(\chi(x) e^{-i x \xi}) \phi(\xi) \, d\xi = u\left( \int_{{\mathbb R}^n} \chi(x) e^{- i x \xi} \phi(\xi)\, d\xi \right)
$$
holds for every $u \in {\mathcal D}'({\mathbb R}^n)$, $\chi \in {\mathcal D}({\mathbb R}^n)$ and $\phi \in {\mathcal S}({\mathbb R}^n)$. By step 2, we already know that both sides give well-defined tempered distributions when regarded as functionals of $\phi$, so by density, it will suffice to assume $\phi \in {\mathcal D}({\mathbb R}^n)$. The key observation now is that by the theorem in step 1, both sides can be identified with
$(u \otimes 1)(f)$, where $1 \in {\mathcal D}'({\mathbb R}^n)$ is the distribution $1(\varphi) := \int_{{\mathbb R}^n} \varphi(x)\, dx$ and $f \in {\mathcal D}({\mathbb R}^{m+n})$ is given by
$$
f(x,\xi) := \chi(x) \phi(\xi) e^{-i x \xi}.
$$
Best Answer
This answer is based on the comment above and Lemma 2.9 of the thesis of Ajay, as mentioned in that comment. Up to minor adaptation, the result proved in the Lemma can be stated as follows
To prove this, first note that it is sufficient to prove the case $n = 1$ since iterating this argument will give the result for general $n$. The key is then to note that $\phi$ vanishes on test functions which are of the form $\partial_{x_{m+1}}g$ for some $g$, by translation invariance. This means that if we pick $h \in \mathcal{D}(\mathbb{R})$ with $\int h = 1$ and set $\psi(g) = \phi(g \otimes h)$ then we have
$$\phi(f) - \psi\bigg( \int_{\mathbb{R}} f(\cdot, y) dy \bigg) = \phi(f - \tilde{f})$$ where $\tilde{f}(x,y) = h(y) \int f(x,z) dz$. Then since $\int f(\cdot, z) - \tilde{f}(\cdot,z) dz = 0$, it is a standard fact that there is a $g$ such that $\partial_{x_{m+1}}g = f - \tilde{f}$ which implies that $\phi(f-\tilde{f}) = 0$ so that $$\phi(f) = \psi\bigg(\int f(\cdot,y) dy \bigg)$$ as desired.
To conclude the result of the question, we just need to make a change of variables. Define $\tau(x,y) = (x-y,-y)$ and let $\tilde{F} = F \circ \tau^*$. Then $\tilde{F}(T_zf) = F((T_zf) \circ \tau)$. If $\tau_z f(x,y) = f(x-z,y-z)$ then we have that $(T_zf) \circ \tau = \tau_z (f \circ \tau)$ so by translation invariance of $F$, $\tilde{F}(T_zf) = \tilde{F}(f)$. Hence, by the Lemma there is an $\mathcal{R}[F]$ such that $$\tilde{F}(f) = \mathcal{R}[F]\bigg( \int_{\mathbb{R}^n} f(\cdot, y) dy \bigg).$$ Then, given $f$, $F(f) = \tilde{F}(\tau^*f)$ since $\tau^2 = \operatorname{Id}$. Hence $$F(f) = \mathcal{R}[F]\bigg(\int_{\mathbb{R}^n} \tau^*f(\cdot, y) dy \bigg)$$ which is exactly the desired result when $f$ is a tensor product.