Reposting some proofs I gave in the comments as a (partial) answer, demonstrating that the answer to Julian's question is "yes" for Bernoulli/i.i.d. systems, and so that it is not resolved by some usual "there's no way to guarantee a rate of mixing" examples. The question actually seems pretty subtle, although my honest guess is that the answer is "no" in general.
Say $\mu$ is i.i.d. on $S^{\mathbb{Z}}$ for some finite set $S$ (and that $T = \sigma$ is the left shift on $S^{\mathbb{Z}}$), that $A$ depends only on coordinates $-k$ through $k-1$, and $\epsilon > 0$. Assume for a contradiction that Julian's property fails. Then, for all $n$, there exist $B_n$ and $i_0,...,i_{n-1}$ s.t. $|\mu(A\cap \sigma^{i_j}B_n)-\mu(A)\mu(B_n)| \geq \epsilon$ for $0 \leq j < n$. (I've removed the negative power for simplicity, since $\sigma$ is invertible here.)
We can assume without loss of generality that $i_0 > k$ and $i_j - i_{j-1} > 2k$ for all $0 < j < n$ (since this decreases the size of $\{i_0, \ldots, i_{n-1}\}$ by at most a factor of $2k$).
Note that $\mu$ is i.i.d., so exchangeable. Take a self-map $\pi_n$ of
$S^{\mathbb{Z}}$ sending coordinate $i_j + m$ to $2kj + m$ for $0 \leq j < n$ and $-k \leq m < k$, and consider $C_n=\pi_n(B_n)$; clearly $\mu(C_n) = \mu(B_n)$. Then, since $A$ depends only on coordinates in $[-k, k)$, $\mu(A \cap \sigma^{i_j} B_n) = \mu(A \cap \sigma^{2kj} C_n)$ for $0 \leq j < n$. So, $\mu(A\cap \sigma^{2kj} C_n)-\mu(A)\mu(C_n))| \geq \epsilon$ for $0 \leq j < n$.
Take a weakly convergent sequence $C_{m_n} \rightarrow C$. (Here, I mean that we identify a set $D$ with the measure $\mu_D$ defined by $\mu_D(E) = \mu(D \cap E)$, and take a weak(star) convergent subsequence.) Then, since $A$ is clopen (i.e. $\chi_A$ is continuous), $\mu(A\cap \sigma^{2kj}C)-\mu(A)\mu(C)| =
\lim_{n \rightarrow \infty} \mu(A\cap \sigma^{2kj}C_{m_n})-\mu(A)\mu(C_{m_n})| \geq \epsilon$ for all $j$. And this contradicts mixing of $\mu$.
Finally, we note that for arbitrary $A$ and $\epsilon$ (i.e. $A$ doesn't depend on only finitely many coordinates), there exists $A'$ where $\mu(A \triangle A') < \epsilon/3$. Then, for any set $B$, $|\mu(A)\mu(B) - \mu(A')\mu(B)| < \epsilon/3$, and for any $j$, $\mu(A\cap \sigma^j B) - \mu(A' \cap \sigma^j B)| < \epsilon/3$. By the triangle inequality, $|\mu(A' \cap \sigma^j B) - \mu(A')\mu(B)| < \epsilon/3 \Longrightarrow |\mu(A \cap \sigma^j B) - \mu(A)\mu(B)| < \epsilon$. This means that, using terminology from the original question, $N_{A, \epsilon} \leq N_{A', \epsilon/3}$. The second quantity has been proved finite, so the first is as well, completing the proof.
Best Answer
One can construct a counterexample using a stationary Gaussian process. The basic idea is to use a system that is weak mixing with very good bounds, without being strong mixing, as it is the former property that is enough to guarantee a pointwise ergodic theorem of the type you seek. Stationary Gaussian processes have the advantage of being tractable to compute with (everything is determined, in principle at least, from the autocorrelation function). Also note that any counterexample must be at least weakly mixing, since a non-trivial eigenfunction $f \circ T = \lambda f$ of the process would lead to a failure of the pointwise ergodic theorem after restricting to shifts in a Bohr set such as $\{ n: |\lambda^n - 1| \leq 0.1 \}$.
Let $a_n$ be the sequence $a_n := 10^{10^n}$ (actually any sufficiently rapidly growing sequence would work here). One can construct a stationary Gaussian process $(X_n)_{n \in {\bf Z}}$ as the limit in distribution (as $m \to \infty$) of the processes $$ \left( 2^{-m/2} \sum_{\epsilon_1,\dots,\epsilon_m \in \{0,1\}} Y_{n+\epsilon_1 a_1 + \dots + \epsilon_m a_m} \right)_{n \in {\bf Z}}$$ where $(Y_n)_{n \in {\bf Z}}$ an iid Gaussian process of mean zero and variance one. One can check that the $X_n$ are well defined as a stationary, mean zero Gaussian process and that the autocorrelation function is $$ \mathrm{Cov}(X_n, X_{n+h}) = 2^{-j}$$ when $h$ is of the form $h = \pm a_{i_1} \pm \dots \pm a_{i_j}$ for some $i_1 < \dots < i_j$, and $$ \mathrm{Cov}(X_n, X_{n+h}) = 0$$ otherwise. In particular, $X_n$ and $X_{n+h}$ are independent unless one is in the rare situation when $h$ is a signed sum of the $a_1,a_2,\dots$.
Since $\mathrm{Cov}(X_0, X_{a_i}) = 1/2$, the autocorrelation function fails to decay to zero, thus this process is not strongly mixing.
To show the pointwise ergodic theorem for a positive density sequence $n_1 < n_2 < \dots$, it suffices from the maximal ergodic theorem to do so assuming $f$ is some bounded function of a finite number of the $X_i$, and we can normalise $f$ to have mean zero. Then $f$ and $f \circ T^h$ will be independent unless $h$ is of the form $\pm a_{i_1} \pm \dots \pm a_{i_j} + O(1)$ for some $i_1 < \dots < i_j$. As this is a sparse set, we can obtain good weak mixing bounds of the form $$ \sum_{h=-H}^H |\langle f \circ T^h, f \rangle| \ll 3^{\log_{10} \log_{10} H} \ll \log H \tag{1}$$ with our choice of $a_n = 10^{{10}^n}$. (Actually even just the bound $H/\log^2 H$ would be enough for our application.)
We would like to show that $\frac{1}{N} \sum_{i=1}^N f \circ T^{n_i}$ converges a.e. to zero. By a standard approximation argument, it suffices to do so assuming that $N$ belongs to a lacunary sequence, e.g., $\lfloor (1+\varepsilon)^m \rfloor$ for $m=1,2,\dots$ and a fixed $\varepsilon>0$. By the Borel-Cantelli lemma and Markov's inequality, it then suffices to obtain a bound of the shape $$ \int_X |\frac{1}{N} \sum_{i=1}^N f \circ T^{n_i}|^2\ d\mu \ll \frac{1}{\log^2 N}$$ (say) for large $N$. Opening up the square, and noting from positive density that $n_N \asymp N$, we can bound the LHS by $$\ll \frac{1}{N} \sum_{h=-n_N}^{n_N} |\langle f \circ T^h, f \rangle|$$ and the claim follows from (1).