OK, here's an excessively long expansion of my comment.
According to Goulden and Jackson's Combinatorial Enumeration (p. 76), the commutative version of the original formula is due to MacMahon, though they refer only to his book Combinatory Analysis without giving a more specific reference. I was not able to find this formula in MacMahon's book, but on pages 99–100 of Volume I (Section III, Chapter III) MacMahon gives the related generating function (in modern notation)
$$\frac{1}{1-e_2 -2e_3 -3e_4-\cdots},$$
for counting derangements of a multiset. Here $e_i$ is the $i$th elementary symmetric function. (MacMahon uses $p_i$ for the elementary symmetric function, which is the modern notation for the power sum symmetric function.) It's not hard to show that (with $p_i$ the power sum symmetric function $x_1^i+x_2^i+\cdots$) we have
$$
\frac{1}{1-p_1+p_2-\cdots} = \frac{1+e_1+e_2+\cdots}{1-e_2 -2e_3 -3e_4-\cdots}.
$$
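This identity is easy to test numerically, by the way. With finitely many variables the alternating sum $1-p_1+p_2-\cdots$ equals $1-\sum_i x_i/(1+x_i)$ (a geometric series), and $e_k$ vanishes for $k$ larger than the number of variables, so both sides are computable in closed form. Here is a quick sanity-check script (not from any of the references; the rational test point is arbitrary):

```python
from fractions import Fraction
from itertools import combinations
from math import prod

# Exact check of 1/(1-p1+p2-...) = (1+e1+e2+...)/(1-e2-2e3-3e4-...)
# at a rational point with three variables.
xs = [Fraction(1, 7), Fraction(2, 9), Fraction(3, 11)]
n = len(xs)

def e(k):  # elementary symmetric function e_k evaluated at xs
    return sum(prod(c) for c in combinations(xs, k))

# 1 - p1 + p2 - ... = 1 - sum x/(1+x), exactly, for finitely many variables
lhs = 1 / (1 - sum(x / (1 + x) for x in xs))
rhs = sum(e(k) for k in range(n + 1)) / (1 - sum((k - 1) * e(k) for k in range(2, n + 1)))
assert lhs == rhs
print("identity verified at", xs)
```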
A combinatorial interpretation of the connection between these two generating functions has been given by J. Dollhopf, I. Goulden, and C. Greene, Words avoiding a reflexive acyclic relation, Electronic J. Combin. 11, no. 2, 2004–2006.
Words with adjacent letters different were called waves by L. Carlitz, who gave the (commutative) generating function for them in Enumeration of sequences by rises and falls: a refinement of the Simon Newcomb problem, Duke Math. J. 39 (1972), 267–280. This is probably the first appearance of the generating function, unless it's hiding somewhere in MacMahon. (Carlitz actually solved the more general problem of counting words by rises, falls, and levels.)

Nowadays words with adjacent letters different are usually called Smirnov words or Smirnov sequences. This term was introduced by Goulden and Jackson; apparently Smirnov suggested the problem of counting these words, though it's not clear that he did anything to solve it. According to the review in Mathematical Reviews of O. V. Sarmanov and V. K. Zaharov, A combinatorial problem of N. V. Smirnov (Russian),
Dokl. Akad. Nauk SSSR 176 (1967) 530–532 (I didn't look up the actual paper),
“The late N. V. Smirnov posed informally the following problem from the theory of order statistics: Given $n$ objects of $s+1$ distinct types (with $r_i$ objects of type $i$, $r_1+\cdots+r_{s+1}=n$), find the number of ways these objects may be arranged in a chain, so that adjacent objects are always of distinct types.”
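Incidentally, Smirnov's problem as quoted is easy to check against the generating function $1/(1-p_1+p_2-\cdots)$ by brute force. Here is a throwaway script of my own (two letter types, small contents) comparing coefficient extraction from the truncated series with direct enumeration:

```python
from collections import defaultdict
from itertools import permutations, product

def smirnov_count(r):
    """Brute force: arrangements of r[i] letters of type i, adjacent letters distinct."""
    letters = [i for i, ri in enumerate(r) for _ in range(ri)]
    return sum(all(w[i] != w[i + 1] for i in range(len(w) - 1))
               for w in set(permutations(letters)))

k = 2  # two letter types, kept small so the brute force stays fast

def truncmul(a, b, caps):
    """Multiply exponent-dict polynomials, dropping terms beyond `caps`."""
    out = defaultdict(int)
    for ka, ca in a.items():
        for kb, cb in b.items():
            kk = tuple(x + y for x, y in zip(ka, kb))
            if all(e <= c for e, c in zip(kk, caps)):
                out[kk] += ca * cb
    return out

def coeff(r):
    """Coefficient of x1^r1 x2^r2 in 1/(1-p1+p2-...) = 1/(1 - sum_i x_i/(1+x_i))."""
    g = defaultdict(int)  # g = sum_i (x_i - x_i^2 + x_i^3 - ...), truncated
    for i in range(k):
        for e in range(1, r[i] + 1):
            g[tuple(e if j == i else 0 for j in range(k))] += (-1) ** (e + 1)
    total = defaultdict(int); total[(0,) * k] = 1
    term = {(0,) * k: 1}
    for _ in range(sum(r)):  # 1/(1-g) = 1 + g + g^2 + ...; g has min degree 1
        term = truncmul(term, g, r)
        for kk, c in term.items():
            total[kk] += c
    return total[tuple(r)]

for r in product(range(4), repeat=k):
    assert coeff(r) == smirnov_count(r)
print("coefficients match Smirnov word counts for all contents up to (3, 3)")
```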
When considered as compositions, i.e., when the entries are added together, Smirnov words are often called Carlitz compositions, as they were studied from this point of view by L. Carlitz, Restricted compositions,
Fibonacci Quart. 14 (1976), no. 3, 254–264.
The generalization that Darij describes in his fourth comment was first proved, as far as I know (though stated in a weaker commutative form), by Ralph Fröberg, Determination of a class of Poincaré series, Math. Scand. 37 (1975), 29–39 (page 35). It was proved independently shortly thereafter by L. Carlitz, R. Scoville, and T. Vaughan, Enumeration of pairs of sequences by rises, falls, and levels, Manuscripta Math. 19 (1976), 211–243 (Theorem 7.3). Their statement of the theorem doesn't seem to use noncommuting variables, though their proof contains a formula, equation (7.7), which is essentially the noncommutative version. (I am not sure that this really makes any difference.) Just to be clear, I'll restate the theorem here, more or less (though not exactly) the way Carlitz, Scoville, and Vaughan state it, with some comments in brackets.
Let $S$ be a finite set of objects and let $A$ and $B$ be complementary subsets of $S\times S$. Let $F_A$ be the generating function for all paths [today we would call them words, or possibly sequences] which avoid relations from $A$. [This is referring to a partition of $A$ which is related to their applications of the theorem, but is not really relevant to the theorem.] More specifically, define
$$F_A = 1+\sum s_{i_1}+\sum s_{i_1}s_{i_2}+\sum s_{i_1}s_{i_2}s_{i_3}+\cdots,$$
where, for example, the last sum is taken over all $i_1,i_2,i_3$ such that $s_{i_1} \mathrel{B}s_{i_2}$ and $s_{i_2}\mathrel{B} s_{i_3}$. (We use lower-case $s_i$'s for both the members of the set $S$ and for the indeterminates [which are presumably commuting] in the enumeration.)
We also introduce
$$\tilde F_B = 1-\sum s_i +\sum s_{i_1}s_{i_2}-\cdots$$
where the signs alternate and the relations must be from $A$ instead of $B$.
7.3 THEOREM. The functions $F_A$ and $\tilde F_B$ are related by $F_A\cdot \tilde F_B = 1$.
Both Fröberg and Carlitz–Scoville–Vaughan prove this by showing that all the terms in $F_A\cdot \tilde F_B$ except 1 cancel in pairs. However, there is another way to prove it: expand $\tilde F_B^{-1}$ as
$\sum_{k=0}^\infty (1-\tilde F_B)^k$ and use inclusion–exclusion.
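To illustrate the theorem on a toy instance (my own example, not from the paper): take $S=\{a,b\}$ with $A=\{(b,a)\}$, so $F_A$ counts words with no factor $ba$ and $\tilde F_B$ is the signed sum over words all of whose adjacent pairs lie in $A$. A brute-force computation in commuting variables, truncated by degree, confirms $F_A\cdot \tilde F_B=1$:

```python
from collections import defaultdict
from itertools import product

# Toy instance of Theorem 7.3: S = {a, b}, A = {(b, a)}, B = complement of A.
S = ['a', 'b']
A = {('b', 'a')}
B = {(x, y) for x in S for y in S} - A
N = 6  # truncation: words of length <= N, total degree <= N

def respects(word, rel):  # every adjacent pair of `word` lies in `rel`
    return all((word[i], word[i + 1]) in rel for i in range(len(word) - 1))

FA = defaultdict(int); FB = defaultdict(int)
FA[(0, 0)] = FB[(0, 0)] = 1
for n in range(1, N + 1):
    for w in product(S, repeat=n):
        key = (w.count('a'), w.count('b'))
        if respects(w, B):
            FA[key] += 1            # F_A: words avoiding relations from A
        if respects(w, A):
            FB[key] += (-1) ** n    # F~_B: signed words built from A

prod_poly = defaultdict(int)
for (i, j), c in FA.items():
    for (k, l), d in FB.items():
        if i + j + k + l <= N:
            prod_poly[(i + k, j + l)] += c * d

assert prod_poly[(0, 0)] == 1
assert all(c == 0 for key, c in prod_poly.items() if key != (0, 0))
print("F_A * F~_B = 1 up to degree", N)
```

Here $F_A$ turns out to be $\sum_{i,j} a^i b^j = 1/((1-a)(1-b))$ and $\tilde F_B = 1-a-b+ba = (1-a)(1-b)$, so the product is visibly 1.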
Carlitz, Scoville, and Vaughan then apply the theorem to counting Smirnov words.
The Carlitz–Scoville–Vaughan theorem is one of my favorite formulas in enumerative combinatorics, and my 1977 Ph.D. thesis has many applications of it. The slides from a talk I gave about this theorem can be found here.
This is not a reference, but a short proof.
We use the following (probably known, but see later) lemma on representing a symmetric tensor as a linear combination of rank-1 symmetric tensors.
Lemma. Let $A$ be a finite set, $K$ an infinite field. Denote by $\mathcal S$ the set of symmetric functions $p:A^n\to K$. Then $\mathcal S$ is the $K$-span of rank-one functions, that is, the functions of the type $h(x_1)h(x_2)\ldots h(x_n)$, where $h:A\to K$.
Proof. Note that the product of two rank-one functions is a rank-one function. Thus the linear space $\mathcal T$, generated by rank-one functions, coincides with the $K$-algebra generated by them.
We may suppose that $A\subset K$. For $k=0,1,\ldots,n$ denote by $e_k(x_1,\ldots,x_n)$ the elementary symmetric polynomial, so that $\varphi_t(x_1,\ldots,x_n):=\prod_i(1+tx_i)=\sum_{k=0}^n t^ke_k$. We identify $e_k$ with the corresponding element of $\mathcal S$. Choosing $n+1$ distinct values $t_1,\ldots,t_{n+1}\in K$ and solving the corresponding (Vandermonde) linear system of equations, we represent each $e_k$ as a linear combination of the $\varphi_{t_i}\in \mathcal T$. Thus $e_k\in \mathcal T$ for all $k=0,1,\ldots,n$. It is well known that the $e_k$ generate the algebra of symmetric polynomials (over any field). Thus any symmetric polynomial function belongs to $\mathcal T$. It remains to note that any symmetric function $f\in \mathcal S$ may be represented by a symmetric polynomial. Indeed, a symmetric function $f$ may be written as $F(e_1,e_2,\ldots,e_n)$ for a certain function $F$ defined on the corresponding finite set (because the values of $e_1,\ldots,e_n$ determine the values of $x_1,\ldots,x_n$ up to permutation), and $F$ in turn coincides with a polynomial function on this finite set. $\square$
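The Vandermonde step is completely effective. Here is a small sanity check of my own (toy instance: $A=\{0,1,2\}\subset\mathbb{Q}$, $n=2$) expressing $e_1=x+y$ as a rational combination of the rank-one functions $\varphi_t$:

```python
from fractions import Fraction
from itertools import product

# Toy instance of the Vandermonde step: A = {0, 1, 2} inside K = Q, n = 2.
# phi_t(x, y) = (1 + t*x)(1 + t*y) = 1 + t*e1 + t^2*e2 is rank one.
A = [Fraction(a) for a in (0, 1, 2)]
ts = [Fraction(t) for t in (1, 2, 3)]  # n + 1 = 3 distinct values of t

def phi(t, x, y):
    return (1 + t * x) * (1 + t * y)

# Solve sum_j c_j * t_j^k = delta_{k,1} for k = 0, 1, 2 (Gaussian elimination),
# so that sum_j c_j * phi_{t_j} picks out exactly the e1 component.
M = [[t ** k for t in ts] + [Fraction(int(k == 1))] for k in range(3)]
for i in range(3):
    M[i] = [v / M[i][i] for v in M[i]]
    for r in range(3):
        if r != i:
            M[r] = [a - M[r][i] * b for a, b in zip(M[r], M[i])]
c = [row[3] for row in M]

# The combination reproduces e1 = x + y at every point of A x A.
for x, y in product(A, repeat=2):
    assert sum(cj * phi(tj, x, y) for cj, tj in zip(c, ts)) == x + y
print("e1 = x + y is a Q-linear combination of rank-one functions; coefficients:", c)
```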
Now we may prove your theorem for finitely supported functions $i\mapsto p_i$. By the Lemma, we may suppose that $p_i=\prod_{k=1}^n H(i_k)$ for a certain finitely supported function $H$ on $\mathbb{N}$ (like the OP, I write $\mathbb{N}=\{0,1,\ldots\}$). In this case both sides of your identity are equal to $\det \bigl(\sum_m H(m)A^m\bigr)$.
Comment. The Lemma does not hold over finite fields. For example, let $A=K=\{0,1\}$. Then the function $x+y+z$ is not a linear combination of the rank-one functions $1$, $xyz$, $(x+1)(y+1)(z+1)$: if $x+y+z=a+bxyz+c(x+1)(y+1)(z+1)$, then setting $y=0$, $z=1$, $x=a$ gives $0=1$. A word of warning: in the related paper "Symmetric tensors and symmetric tensor rank" by Pierre Comon, Gene Golub, Lek-Heng Lim, and Bernard Mourrain (SIAM Journal on Matrix Analysis and Applications 30 (2008), no. 3, 1254–1279), this statement, after equation (1.1), is asserted for an arbitrary field, although it is proved only over the complex numbers, and the proof uses the fact that a non-zero polynomial over an infinite field has a non-zero value.
In any case, you may always enlarge the ground field and safely think that it is infinite.
$\DeclareMathOperator{\ex}{ex}$ One way to look at this is through symmetric functions. To be consistent with standard notation I'll take infinitely many variables $x_1, x_2,\dots$ (instead of $\varepsilon_1, \varepsilon_2,\ldots$). I will follow the presentation in Richard Stanley's Enumerative Combinatorics, Vol. 2, page 304.
There is a homomorphism $\ex$ from the ring of (unbounded degree) symmetric functions to power series in $t$ that can be defined by $\ex(p_1) = t$, $\ex(p_n)=0$ for $n>1$, where $p_n$ is the power sum symmetric function $x_1^n+x_2^n+\cdots$. Then $\ex$ is the restriction to symmetric functions of the homomorphism on all formal power series in $x_1,x_2,\dots$ that takes each $x_i^2$ to 0 (where $t$ is the image of $p_1$). It has the property that for any symmetric function $f$, $$\ex(f) = \sum_{n=0}^\infty [x_1x_2\cdots x_n] f \frac{t^n}{n!},$$ where $[x_1x_2\cdots x_n] f$ denotes the coefficient of $x_1x_2\cdots x_n$ in $f$. In particular, $\ex(h_n) = \ex(e_n) = t^n/n!$ where $h_n$ and $e_n$ are the complete and elementary symmetric functions.
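These values are easy to confirm concretely. The following quick script (my own; it expands $p_n$, $e_n$, $h_n$ in finitely many variables as exponent-dict polynomials) checks that the coefficient of $x_1\cdots x_n$ behaves as claimed:

```python
from collections import defaultdict
from itertools import combinations, combinations_with_replacement
from math import factorial

NV = 3  # number of variables; enough to check degrees up to 3

def p(n):  # power sum p_n = x_1^n + ... + x_NV^n
    poly = defaultdict(int)
    for i in range(NV):
        key = [0] * NV; key[i] = n
        poly[tuple(key)] += 1
    return poly

def e(n):  # elementary symmetric polynomial e_n
    poly = defaultdict(int)
    for c in combinations(range(NV), n):
        key = [0] * NV
        for i in c:
            key[i] = 1
        poly[tuple(key)] += 1
    return poly

def h(n):  # complete homogeneous symmetric polynomial h_n
    poly = defaultdict(int)
    for c in combinations_with_replacement(range(NV), n):
        key = [0] * NV
        for i in c:
            key[i] += 1
        poly[tuple(key)] += 1
    return poly

def mul(a, b):
    out = defaultdict(int)
    for ka, ca in a.items():
        for kb, cb in b.items():
            out[tuple(x + y for x, y in zip(ka, kb))] += ca * cb
    return out

def coeff_squarefree(poly, n):  # [x_1 x_2 ... x_n] poly
    return poly.get(tuple([1] * n + [0] * (NV - n)), 0)

# [x_1...x_n] h_n = [x_1...x_n] e_n = 1, so ex(h_n) = ex(e_n) = t^n/n!
assert all(coeff_squarefree(h(n), n) == 1 for n in range(1, NV + 1))
assert all(coeff_squarefree(e(n), n) == 1 for n in range(1, NV + 1))
# [x_1...x_n] p_n = 0 for n > 1, consistent with ex(p_n) = 0
assert all(coeff_squarefree(p(n), n) == 0 for n in range(2, NV + 1))
# [x_1...x_n] p_1^n = n!, so ex(p_1^n) = t^n = ex(p_1)^n
pw = {tuple([0] * NV): 1}
for _ in range(NV):
    pw = mul(pw, p(1))
assert coeff_squarefree(pw, NV) == factorial(NV)
print("ex values confirmed")
```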
This idea is well known and is very useful in enumerative combinatorics. It allows one to derive exponential generating functions for objects with distinct labels (e.g., permutations or standard Young tableaux) from symmetric function generating functions for objects with repeated labels (e.g., words or semistandard tableaux). There are related homomorphisms that preserve more information; see, for example, section 7.8 of Stanley.