After some time I found the solution to Question 1 in the excellent book:
"Stochastic Optimal Control: The Discrete-Time Case", by Dimitri P. Bertsekas and Steven E. Shreve, freely available at link
The answer is YES, see Corollary 7.46.1 page 177, Chapter 7.
For what concerns Question 2 I have not found a solution yet, but I guess that after reading properly this book, I'll be able to solve it by myself.
thank you
matteo
Reposting some proofs I gave in the comments as a (partial) answer, demonstrating that the answer to Julian's question is "yes" for Bernoulli/i.i.d. systems, and so that it is not resolved by some usual "there's no way to guarantee a rate of mixing" examples. The question actually seems pretty subtle, although my honest guess is that the answer is "no" in general.
Say $\mu$ is i.i.d. on $S^{\mathbb{Z}}$ for some finite set $S$ (and that $T = \sigma$ is the left shift on $S^{\mathbb{Z}}$), that $A$ depends only on coordinates $-k$ through $k-1$, and $\epsilon > 0$. Assume for a contradiction that Julian's property fails. Then, for all $n$, there exist $B_n$ and $i_0,...,i_{n-1}$ s.t. $|\mu(A\cap \sigma^{i_j}B_n)-\mu(A)\mu(B_n)| \geq \epsilon$ for $0 \leq j < n$. (I've removed the negative power for simplicity, since $\sigma$ is invertible here.)
We can assume without loss of generality that $i_0 > k$ and $i_j - i_{j-1} > 2k$ for all $0 < j < n$ (since this decreases the size of $\{i_0, \ldots, i_{n-1}\}$ by at most a factor of $2k$).
Note that $\mu$ is i.i.d., so exchangeable. Take a self-map $\pi_n$ of
$S^{\mathbb{Z}}$ sending coordinate $i_j + m$ to $2kj + m$ for $0 \leq j < n$ and $-k \leq m < k$, and consider $C_n=\pi_n(B_n)$; clearly $\mu(C_n) = \mu(B_n)$. Then, since $A$ depends only on coordinates in $[-k, k)$, $\mu(A \cap \sigma^{i_j} B_n) = \mu(A \cap \sigma^{2kj} C_n)$ for $0 \leq j < n$. So, $\mu(A\cap \sigma^{2kj} C_n)-\mu(A)\mu(C_n))| \geq \epsilon$ for $0 \leq j < n$.
Take a weakly convergent sequence $C_{m_n} \rightarrow C$. (Here, I mean that we identify a set $D$ with the measure $\mu_D$ defined by $\mu_D(E) = \mu(D \cap E)$, and take a weak(star) convergent subsequence.) Then, since $A$ is clopen (i.e. $\chi_A$ is continuous), $\mu(A\cap \sigma^{2kj}C)-\mu(A)\mu(C)| =
\lim_{n \rightarrow \infty} \mu(A\cap \sigma^{2kj}C_{m_n})-\mu(A)\mu(C_{m_n})| \geq \epsilon$ for all $j$. And this contradicts mixing of $\mu$.
Finally, we note that for arbitrary $A$ and $\epsilon$ (i.e. $A$ doesn't depend on only finitely many coordinates), there exists $A'$ where $\mu(A \triangle A') < \epsilon/3$. Then, for any set $B$, $|\mu(A)\mu(B) - \mu(A')\mu(B)| < \epsilon/3$, and for any $j$, $\mu(A\cap \sigma^j B) - \mu(A' \cap \sigma^j B)| < \epsilon/3$. By the triangle inequality, $|\mu(A' \cap \sigma^j B) - \mu(A')\mu(B)| < \epsilon/3 \Longrightarrow |\mu(A \cap \sigma^j B) - \mu(A)\mu(B)| < \epsilon$. This means that, using terminology from the original question, $N_{A, \epsilon} \leq N_{A', \epsilon/3}$. The second quantity has been proved finite, so the first is as well, completing the proof.
Best Answer
I don't think that it's ever possible to mandate any rate of mixing if you are looking at all measurable sets. Here's a silly example using the full $2$-shift $\{0,1\}^{\mathbb{Z}}$ and the i.i.d. measure $\mu$ with $\mu([0]) = \mu([1]) = 1/2$ (in some sense the "most mixing" SFT and measure).
Define $B$ to be $[0]$, the set of sequences with $x(1) = 0$, i.e. with $0$ in the first coordinate. Clearly $\mu(B) = 0.5$.
Define $C$ to be the set of sequences $x$ satisfying all of the following conditions: $x(1)$ is not $0$, $x(2)$ and $x(3)$ are not both $0$, $x(4), x(5), x(6)$ are not all $0$, $x(7), x(8), x(9), x(10)$ are not all $0$, etc. Then $\mu(C) = \prod_{n=1}^{\infty} (1 - 2^{-n}) > 0$.
But for any $k$, $\mu(\sigma^k B \ | \ C)$ is slightly less than $\mu(B) = 0.5$, in fact it's $0.5 - \frac{1}{2^{n+1} - 2}$, where $n$ is the length of the "condition" that $x(k)$ is part of. (For instance, when $k = 8$, $n = 4$, since $x(8)$ is part of the condition "$x(7), x(8), x(9), x(10)$ are not all $0$."
This already has subexponential mixing rate, but you can make the rate as slow as desired by just spacing your conditions out. For instance, you could wait until $x(n!!)$ to start the $n$th condition, and then $\mu(\sigma^{n!!} B \ | \ C)$ is still roughly $2^{-n}$ away from $\mu(B) = 0.5$.