My "solution" is a bit strange, but I hope it is correct.
We regard $V$ as a module over $A$. $X = {\rm Spec}\ A$ is zero-dimensional, so (as a sheaf on $X$) $V$ decomposes into a direct sum of sheaves supported at the finitely many points of $X$. This corresponds to a decomposition of $V$ into subspaces corresponding to different eigenvalues in the sense you described. There must be a point $p$ such that the piece $V_p$ above it (the localization of $V$ at the maximal ideal $\mathfrak{m}$ of $A$ corresponding to $p$) is nonzero. Localizing at this ideal and restricting to the subspace $V_p$, we may assume that $A$ is local. If there were no common eigenvector in $V_p$, we would have $\mathfrak{m}V_p = V_p$, so by Nakayama's lemma $V_p = 0$, a contradiction.
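For intuition, here is a small numerical sketch (my addition, not part of the argument above): commuting diagonalizable matrices are simultaneously diagonalizable, so a generic linear combination of them exposes a common eigenvector. The construction via a shared change-of-basis matrix `P` is just a convenient way to manufacture a commuting pair.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Build two commuting matrices by conjugating diagonal matrices by the same P
P = rng.standard_normal((n, n))
A1 = P @ np.diag(rng.standard_normal(n)) @ np.linalg.inv(P)
A2 = P @ np.diag(rng.standard_normal(n)) @ np.linalg.inv(P)
assert np.allclose(A1 @ A2, A2 @ A1)

# A generic linear combination has distinct eigenvalues, and its
# eigenvectors are then common eigenvectors of A1 and A2
_, vecs = np.linalg.eig(A1 + 0.37 * A2)
v = vecs[:, 0]                      # unit-norm eigenvector
for A in (A1, A2):
    w = A @ v
    # w must be parallel to v, i.e. v is an eigenvector of A as well
    assert np.linalg.norm(w - (v.conj() @ w) * v) < 1e-6
```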
darij: Does it deserve to be "awfully sophisticated" by your standards?
I think it really comes down to looking at the lattice of invariant subspaces for each $A_i$. I believe this is basically what Igor is saying.
For a single real linear transformation $V \rightarrow V$ with distinct characteristic roots, $V$ decomposes as a sum of 1- and 2-dimensional invariant subspaces, and these generate the lattice of invariant subspaces. When you perturb the matrix by a known amount, the subspaces can wander around over a certain range --- how much they shift depends on how close together the characteristic roots are, and also on the angles between the subspaces. For simplicity, think of the case with all real distinct eigenvalues. The projectivized action, on $\mathbb{RP}^{n-1}$, then has $n$ fixed points. The first derivative of the projectivized map is simple to determine, given the eigenvectors and eigenvalues --- the eigenvalues of the derivative at the fixed point for eigenvalue $\lambda$ are the ratios of the other eigenvalues to $\lambda$. Therefore, if you change the map by adding multiples of the other eigenvectors as a linear function of the projection to the eigenspace under consideration, you shift your given eigenvector by a known linear function of the coefficients.
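The claim about the derivative of the projectivized map can be checked numerically. A hedged sketch (my own illustration, with an arbitrary matrix built from prescribed eigenvalues): in an affine chart on $\mathbb{RP}^{n-1}$, the finite-difference Jacobian at the fixed point for eigenvalue $\lambda_1$ has eigenvalues $\lambda_i/\lambda_1$.

```python
import numpy as np

# Matrix with known real, distinct eigenvalues: A = P diag(lam) P^{-1}
rng = np.random.default_rng(1)
n = 4
lam = np.array([3.0, 2.0, 1.0, 0.5])
P = rng.standard_normal((n, n))
A = P @ np.diag(lam) @ np.linalg.inv(P)

# Affine chart on RP^{n-1}: x in R^{n-1}  <->  the line through (1, x)
def f(x):
    w = A @ np.concatenate(([1.0], x))
    return w[1:] / w[0]

# Fixed point = eigenvector for lam[0], scaled to first coordinate 1
v = P[:, 0] / P[0, 0]
x_star = v[1:]
assert np.allclose(f(x_star), x_star)

# Finite-difference Jacobian of the projectivized map at the fixed point
eps = 1e-6
J = np.empty((n - 1, n - 1))
for j in range(n - 1):
    e = np.zeros(n - 1)
    e[j] = eps
    J[:, j] = (f(x_star + e) - f(x_star - e)) / (2 * eps)

# Its eigenvalues are the ratios of the other eigenvalues to lam[0]
assert np.allclose(np.sort(np.linalg.eigvals(J).real),
                   np.sort(lam[1:] / lam[0]), atol=1e-4)
```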
If you have several matrices with all real distinct eigenvalues that are near enough to a commuting set of matrices, then presumably the eigenvectors are also near to each other, so you can tell which eigenvectors need to line up with each other. You have a good measurement of how hard it is to move each eigenvector in a given direction, so you can
find a point that's approximately equally hard for each (do this separately for each matching collection of eigenvectors). Then adjust each of the eigenvectors to go to that point, and repeat on the other eigenspaces.
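The matching step above can be illustrated numerically (my own sketch, not the full repair procedure): perturb a commuting pair slightly, then pair off eigenvectors of the two perturbed matrices by largest overlap; for a small perturbation with well-separated eigenvalues the matched eigenvectors nearly coincide.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
# Start from a commuting pair (simultaneously diagonalizable), then perturb
P = rng.standard_normal((n, n))
A = P @ np.diag([3.0, 2.0, 1.0, 0.5]) @ np.linalg.inv(P)
B = P @ np.diag([1.0, 4.0, 2.0, 3.0]) @ np.linalg.inv(P)
eps = 1e-3
A = A + eps * rng.standard_normal((n, n))
B = B + eps * rng.standard_normal((n, n))

# Unit-norm eigenvectors of each (as columns)
_, U = np.linalg.eig(A)
_, W = np.linalg.eig(B)

# Match eigenvectors of B to those of A by largest overlap |<u_i, w_j>|
overlap = np.abs(U.conj().T @ W)
match = overlap.argmax(axis=1)
assert sorted(match) == list(range(n))   # the matching is a bijection

# Matched eigenvectors nearly coincide (up to sign), so we know which
# eigenvectors need to line up with each other
for i in range(n):
    assert overlap[i, match[i]] > 0.9
```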
When there are complex roots, there are 2-dimensional invariant subspaces, which appear as invariant circles in $\mathbb{RP}^{n-1}$. You can do much the same thing with these.
When there are multiple real or complex roots, then it's necessary to look at larger-dimensional invariant subspaces. One strategy is to first fix up invariant subspaces
associated with the product of all factors of the characteristic polynomial whose roots lie in a small neighborhood, or in a pair of complex-conjugate neighborhoods. The goal is to get such subspaces to agree with, to contain, or to be contained in the similar subspaces for the other matrices. If there are repeated roots, first see whether there's a small perturbation that produces multiple eigenvectors (or multiple 2-dimensional invariant subspaces) --- when this is possible, it gives a bigger target for matching commuting matrices. If not, look at the largest invariant subspace it contains.
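The subspace associated with a cluster of roots can be computed concretely. A minimal sketch (my addition, with an arbitrary test matrix): take the span of the eigenvectors whose eigenvalues fall in the chosen neighborhood, and verify that the matrix maps this subspace into itself.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
# Two roots clustered near 1, two well-separated roots
lam = np.array([1.0, 1.01, 5.0, 6.0])
P = rng.standard_normal((n, n))
A = P @ np.diag(lam) @ np.linalg.inv(P)

# Invariant subspace for the cluster {1.0, 1.01}: span of those eigenvectors
vals, V = np.linalg.eig(A)
cluster = np.abs(vals - 1.0) < 0.1
Q, _ = np.linalg.qr(V[:, cluster])       # orthonormal basis of the span
assert Q.shape[1] == 2

# Invariance check: A maps range(Q) into itself, so the component of A Q
# orthogonal to range(Q) vanishes
residual = (np.eye(n) - Q @ Q.conj().T) @ (A @ Q)
assert np.linalg.norm(residual) < 1e-6
```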
I doubt if there's an easy way other than looking at these invariant subspace structures. Perhaps in the actual applications you have in mind, they're not so complicated.
Best Answer
Edit: This now answers the first question for the operator norm and the normalized Hilbert-Schmidt norm.
The answer depends on the norm you are considering. It is no for the operator norm, but yes for the normalized Hilbert-Schmidt norm (at least if you replace $O(\varepsilon)$ by $o(1)$; see the answers to this question).
Here are some details on the counterexample for the operator norm.
Points 1 and 2 imply that there is a sequence of triples $A_1^n, A_2^n, A_3^n$ of matrices of norm less than $1$ which are pairwise close to (self-adjoint) commuting matrices, but whose distance to triples of commuting matrices is bounded below.