Let us rewrite $x_t, x_{t-1}, \dots, x_{t-K+1}$ in terms of $x_{t-K}$:
$$x_t=c\left(1+\varphi+\dots+\varphi^{K-1}\right)+\varepsilon_t+\varphi\varepsilon_{t-1}+\dots+\varphi^{K-1}\varepsilon_{t-K+1}+\varphi^Kx_{t-K}$$
$$x_{t-1}=c\left(1+\varphi+\dots+\varphi^{K-2}\right)+\varepsilon_{t-1}+\varphi\varepsilon_{t-2}+\dots+\varphi^{K-2}\varepsilon_{t-K+1}+\varphi^{K-1}x_{t-K}$$
$$\vdots$$
$$x_{t-K+1}=c+\varphi x_{t-K}+\varepsilon_{t-K+1}$$
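As a quick numerical sanity check of this expansion (the values of $c$ (named const below), $\varphi$, $K$, and the starting point are arbitrary):

phi <- 0.6; const <- 2; K <- 3
set.seed(1)
eps <- rnorm(K + 1)  # eps[K + 1] plays the role of eps_t
x <- numeric(K + 1)
x[1] <- 5  # arbitrary starting value x_{t-K}
# iterate x_t = const + phi * x_{t-1} + eps_t for K steps
for (i in 2:(K + 1)) x[i] <- const + phi * x[i - 1] + eps[i]
# closed form for x_t in terms of x_{t-K}:
const * sum(phi^(0:(K - 1))) + sum(phi^(0:(K - 1)) * eps[(K + 1):2]) + phi^K * x[1]
x[K + 1]  # identical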
Now denote the $K$-period moving average of $x_t$ by $\tilde{x}_{t}^K$. Averaging the $K$ equations above gives
$$\tilde{x}_t^K=\frac{1}{K}\left[\sum_{i=0}^{K-1}\left(c(K-i)\varphi^i+\varepsilon_{t-i}\sum_{j=0}^i\varphi^j\right)+x_{t-K}\sum_{i=1}^K\varphi^i\right]$$
$$\begin{align}
\operatorname{\mathbb{V}ar}\left(\tilde{x}^K_t\right) &= \frac{1}{K^2}\left[\operatorname{\mathbb{V}ar}\left(\sum_{i=0}^{K-1}\varepsilon_{t-i}\sum_{j=0}^i\varphi^j\right)+\operatorname{\mathbb{V}ar}\left(x_{t-K}\sum_{i=1}^K\varphi^i\right)\right]\\
&= \frac{1}{K^2}\left[\sigma^2_{\varepsilon}\sum^{K-1}_{i=0}\left(\frac{\varphi^{i+1}-1}{\varphi-1}\right)^2+\frac{\sigma^2_{\varepsilon}}{1-\varphi^2}\left(\frac{\varphi^{K+1}-\varphi}{\varphi-1}\right)^2\right]\\
&=\frac{\sigma^2_{\varepsilon}}{K^2}\left[\frac{1}{(\varphi-1)^2}\sum^{K-1}_{i=0}\left(\varphi^{i+1}-1\right)^2+\frac{\varphi^2(\varphi^K-1)^2}{(1-\varphi)^2(1-\varphi^2)}\right]
\end{align}$$
and some more algebra leads to the closed form below.
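To fill in that algebra: expanding the square and summing the geometric series gives
$$\sum^{K-1}_{i=0}\left(\varphi^{i+1}-1\right)^2=\frac{\varphi^2\left(\varphi^{2K}-1\right)}{\varphi^2-1}-\frac{2\varphi\left(\varphi^K-1\right)}{\varphi-1}+K,$$
and combining both terms over the common denominator $(1+\varphi)(1-\varphi)^4$ yields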
$$\operatorname{\mathbb{V}ar}\left(\tilde{x}^K_t\right)=\frac{\sigma^2_{\varepsilon}\left(\varphi^2(K\varphi-K+2)-2\varphi^{1+K}(\varphi-1)+K-\varphi(K+2)\right)}{(1+\varphi)(1-\varphi)^4K^2}$$
library(zoo)  # for rollmean()

sigma <- 2.5
phi <- 0.6
K <- 3
const <- 2

set.seed(321)
eps <- rnorm(1e5, sd = sigma)
# simulate x_t = const + phi * x_{t-1} + eps_t recursively, starting from 0
x <- stats::filter(c(0, const + eps), filter = phi, method = "recursive")

# closed-form variance of the K-period moving average derived above
MAvar <- function(phi, sigma, K) {
  sigma^2 / (K^2 * (phi + 1) * (1 - phi)^4) *
    (phi^2 * (K * phi - K + 2) - 2 * phi^(1 + K) * (phi - 1) + K - phi * (K + 2))
}

ma <- rollmean(x, K)  # K-period moving average
var(ma)
# [1] 6.67111
MAvar(phi, sigma, K)
# [1] 6.640625
The question suggests some basic confusion between the equation and the solution.
The Equation
Let ${\varphi} > 1$.
Consider the following (infinite) system of equations---one equation for each $t\in \mathbb{Z}$:
$$
X_{t}=\varphi X_{t-1}+e_{t}, \mbox{ where } e_t \sim WN(0,\sigma^2), \;\; t \in \mathbb{Z}. \quad (*)
$$
Definition
Given $e_t \sim WN(0,\sigma^2)$, a sequence of random variables $\{ X_t \}_{t\in \mathbb{Z}}$ is said to be a solution of $(*)$ if, for each $t$,
$$
X_{t}=\varphi X_{t-1}+e_{t},
$$
with probability 1.
The Solution
Define
$$
X_t= - \sum_{k=1}^\infty {\varphi}^{-k}e_{t+k},
$$
for each $t$.
1. $X_t$ is well-defined: The sequence of partial sums
$$
X_{t,m} = - \sum_{k=1}^m {\varphi}^{-k}e_{t+k}, \;\; m \geq 1
$$
is a Cauchy sequence in the Hilbert space $L^2$ (for $n > m$, $\operatorname{E}\left[(X_{t,n}-X_{t,m})^2\right]=\sigma^2\sum_{k=m+1}^{n}\varphi^{-2k}\to 0$ as $m, n \to \infty$, since $\varphi>1$), and therefore converges in $L^2$. $L^2$ convergence implies convergence in probability (although not necessarily almost surely). By definition, for each $t$, $X_t$ is the $L^2$/probability limit of $(X_{t,m})$ as $m \rightarrow \infty$.
2. $\{ X_t \}$ is, trivially, weakly stationary. (Any MA$(\infty)$ series with absolutely summable coefficients is weakly stationary.)
3. $\{ X_t \}_{t\in \mathbb{Z}}$ is a solution of $(*)$, as can be verified directly by substitution into $(*)$.
This is a special case of how one would obtain a solution to an ARMA model:
first guess/derive an MA$(\infty)$ expression, show that it is well-defined, then verify it's an actual solution.
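As a quick numerical sketch of that verification step (the values $\varphi = 2$, $n$, and the truncation point $m$ below are arbitrary):

set.seed(123)
phi <- 2; n <- 500; m <- 50
e <- rnorm(n + m)
# truncated solution: X_t = -sum_{k=1}^m phi^{-k} e_{t+k}
X <- sapply(1:n, function(t) -sum(phi^-(1:m) * e[t + (1:m)]))
# check X_t = phi * X_{t-1} + e_t for t = 2, ..., n
max(abs(X[-1] - (phi * X[-n] + e[2:n])))
# ~ 1e-15: zero up to the phi^(-m) truncation error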
...But the $\epsilon_t$ is not independent from $X_{t}$...
This impression perhaps results from confusing the equation and the solution.
Consider the actual solution:
$$
\varphi X_{t-1} + e_t = \varphi \cdot \left( - \sum_{k=1}^\infty {\varphi}^{-k}e_{t+k-1} \right) + e_t = -e_t - \sum_{k=1}^\infty {\varphi}^{-k}e_{t+k} + e_t = - \sum_{k=1}^\infty {\varphi}^{-k}e_{t+k},
$$
and the right-hand side is exactly $X_t$ (this is precisely the verification of Point #3 above). Notice how $e_t$ cancels and actually doesn't show up in $X_t$.
...where this...derivation originally comes from...
I believe Mann and Wald (1943) already considered the non-causal AR(1) case, among other examples. Perhaps one can find even earlier references. Certainly by the time of Box and Jenkins this was well known.
Further Comment
The non-causal solution is typically excluded from the stationary AR(1) model because:
1. It is un-physical.
2. Assume that $(e_t)$ is, say, Gaussian white noise. Then, for every non-causal solution, there exists a causal solution that is observationally equivalent, i.e. the two solutions are equal as probability measures on $\mathbb{R}^{\mathbb{Z}}$. In other words, a stationary AR(1) model that includes both causal and non-causal cases is unidentified: even if the non-causal solution is physical, one cannot distinguish it from its causal counterpart using data. For example, if the innovation variance is $\sigma^2 = 1$, then the causal counterpart is the causal solution of the AR(1) equation with coefficient $\frac{1}{\varphi}$ and $\sigma^2 = \frac{1}{\varphi^2}$.
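A rough numerical illustration of this equivalence (the values $\varphi = 2$, the sample size, and the truncation point $m$ are assumptions for the sketch, not part of the argument):

set.seed(42)
phi <- 2; n <- 1e5; m <- 50
e <- rnorm(n + m)  # innovation variance sigma^2 = 1
# truncated non-causal solution: X_t = -sum_{k=1}^m phi^{-k} e_{t+k}
Xnc <- sapply(1:n, function(t) -sum(phi^-(1:m) * e[t + (1:m)]))
# causal counterpart: coefficient 1/phi, innovation variance 1/phi^2
Xc <- stats::filter(rnorm(n, sd = 1/phi), filter = 1/phi, method = "recursive")
c(var(Xnc), var(Xc))  # both close to 1/(phi^2 - 1) = 1/3
c(cor(Xnc[-1], Xnc[-n]), cor(Xc[-1], Xc[-n]))  # both close to 1/phi = 0.5

Since both processes are Gaussian, matching means and autocovariances means the two laws coincide.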
Best Answer
A stable filter is a filter that exists and is causal. Causal means that your current observation is a function of past or contemporaneous noise, not future noise. Why do they use the word stable? Well, intuitively, you can see what happens when you simulate data from the model with $|\varphi| > 1$: the process cannot hover around any fixed mean for all time.
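A minimal simulation of that instability (the coefficient $1.05$ is an arbitrary choice with $|\varphi| > 1$):

set.seed(1)
# simulate X_t = 1.05 * X_{t-1} + eps_t for 200 steps
y <- stats::filter(rnorm(200), filter = 1.05, method = "recursive")
range(y)
# the path blows up rather than hovering around a mean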
If you rewrite your model as $$ X_t - \mu = \varphi(X_{t-1} - \mu) + \epsilon_t $$ with $c = \mu(1-\varphi)$, then you can rewrite it again as $$ (1-\varphi B) Y_t = \epsilon_t \tag{1} $$ where $B$ is the backshift operator and $Y_t = X_t - \mu$ is the demeaned process.
A filter is a (possibly infinite) linear combination that you apply to white noise. (I take white noise to mean errors that are mutually uncorrelated with mean zero; this doesn't necessarily mean they are independent.) Filtering white noise is a natural way to form time series data. We would write filtered noise as $$ \psi(B)\epsilon_t = \left(\sum_{j=-\infty}^{\infty}\psi_j B^j\right)\epsilon_t = \sum_{j=-\infty}^{\infty}\psi_j \epsilon_{t-j}, $$ where the collection of coefficients $\{\psi_j\}$ is our impulse response function.
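As a minimal sketch of filtering in R (the coefficients $\psi_j$ below are an arbitrary finite example, not anything canonical):

set.seed(7)
eps <- rnorm(1e4)
psi <- c(0.25, 0.5, 1, 0.5, 0.25)  # psi_{-2}, ..., psi_2
# two-sided convolution filter applied to white noise
z <- stats::filter(eps, filter = psi, method = "convolution", sides = 2)
var(z, na.rm = TRUE)
# close to sum(psi^2) = 1.625, as expected for filtered white noise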
This filtered series only exists (has finite expectation and variance) if the coefficients far away get small fast enough. Usually they are assumed to be absolutely summable, that is, $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$. Showing that this is a sufficient condition is a detail you might want to fill in yourself.
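A sketch of that sufficiency argument: the terms $\psi_j\epsilon_{t-j}$ are mutually uncorrelated with mean zero, so
$$
\operatorname{Var}\left(\sum_{j=-\infty}^{\infty}\psi_j \epsilon_{t-j}\right) = \sigma^2\sum_{j=-\infty}^{\infty}\psi_j^2 \leq \sigma^2\left(\sum_{j=-\infty}^{\infty}|\psi_j|\right)^2 < \infty.
$$
(Making the doubly infinite sum itself rigorous takes the same $L^2$ Cauchy argument used earlier in this thread.)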
Getting $\psi(B)$ our filter from $\varphi(B)$ our model's AR polynomial is not always something you can do, though. If we could divide both sides of (1) by $(1-\varphi B)$, then your model is $$ Y_t = \sum_{j=-\infty}^{\infty}\psi_j \epsilon_{t-j}, $$ and this is just like doing simple algebra. We would do this, and then figure out what each $\psi_j$ was in terms of $\varphi$. You can only do this, however, if the complex polynomial $1 - \varphi z$ has no roots on the unit circle (otherwise you would be dividing by zero there), or equivalently, writing the constraint in terms of the parameter instead of the complex number $z$, if $|\varphi| \neq 1$. If moreover $|\varphi| < 1$ (or, stating it in terms of $z$ again, the root lies outside the unit circle), then your model is causal, and you don't have to filter future noise: $$ Y_t = \sum_{j=0}^{\infty}\psi_j \epsilon_{t-j} = \sum_{j=0}^{\infty}\varphi^j \epsilon_{t-j}. $$ See how the sum representing the lag runs from $0$ to $\infty$ now?
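Here is a quick numerical check of this causal representation (the coefficient $0.6$ and the truncation at 30 lags are assumed values for the sketch):

set.seed(99)
phi <- 0.6
eps <- rnorm(1e4)
# recursion: Y_t = phi * Y_{t-1} + eps_t
y_rec <- stats::filter(eps, filter = phi, method = "recursive")
# truncated MA representation: sum_{j=0}^{30} phi^j eps_{t-j}
y_ma <- stats::filter(eps, filter = phi^(0:30), method = "convolution", sides = 1)
max(abs(y_rec - y_ma), na.rm = TRUE)
# tiny: the two series agree up to the phi^31 truncation error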
Figuring out the coefficients of $\psi(B)$ in terms of $\varphi$ can be done by solving $(1 + \psi_1 B + \psi_2 B^2 + \cdots)(1 - \varphi B) = 1$ and matching coefficients of powers of $B$, and this might be something you want to do yourself.
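In R, stats::ARMAtoMA performs exactly this coefficient matching; a quick check that it recovers $\psi_j = \varphi^j$ (using $\varphi = 0.6$ as an example value):

phi <- 0.6
ARMAtoMA(ar = phi, lag.max = 5)  # psi_1, ..., psi_5
# [1] 0.60000 0.36000 0.21600 0.12960 0.07776
phi^(1:5)  # the same powers of phi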