Here is a concrete counterexample to the conjecture, now that I’m reading it correctly.
Let $R=(a\lor b)^3\Big(a(a\lor b)^5\Big)^*$ and $S=(a\lor b)^3\Big(b(a\lor b)^5\Big)^*$. These are recognized by the $7$-state DFAs whose transition tables are shown below; $s_0$ is the initial state, and $s_3$, shown in red, is the sole accepting state.
$$\begin{array}{r|cc}
M_R&a&b\\ \hline
s_0&s_1&s_1\\
s_1&s_2&s_2\\
s_2&\color{red}{s_3}&\color{red}{s_3}\\
\color{red}{s_3}&s_4&s_6\\
s_4&s_5&s_5\\
s_5&s_0&s_0\\
s_6&s_6&s_6
\end{array}
\qquad\qquad
\begin{array}{r|cc}
M_S&a&b\\ \hline
s_0&s_1&s_1\\
s_1&s_2&s_2\\
s_2&\color{red}{s_3}&\color{red}{s_3}\\
\color{red}{s_3}&s_6&s_4\\
s_4&s_5&s_5\\
s_5&s_0&s_0\\
s_6&s_6&s_6
\end{array}$$
The eight words of length $3$ are the only words of length at most $8$ in both $L(M_R)$ and $L(M_S)$, yet the languages are not the same: $a^9\in L(M_R)\setminus L(M_S)$.
The problem is that the minimal redundant loop $s_3\to s_4\to s_5\to s_0\to s_1\to s_2\to s_3$ is too long and its starting point too far removed from the initial state, so the difference in the two languages doesn’t show up within the first $8$ transitions.
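This is easy to verify mechanically. Here is a short Python sketch (the dictionary encoding of the tables is mine) that simulates both automata, checks that they agree on every word of length at most $8$, and confirms that $a^9$ separates them:

```python
from itertools import product

# Transition tables transcribed from the answer: column 0 is letter 'a',
# column 1 is letter 'b'; state 0 is initial, state 3 is the sole
# accepting state, state 6 is the dead state.
R = {0: (1, 1), 1: (2, 2), 2: (3, 3), 3: (4, 6), 4: (5, 5), 5: (0, 0), 6: (6, 6)}
S = {0: (1, 1), 1: (2, 2), 2: (3, 3), 3: (6, 4), 4: (5, 5), 5: (0, 0), 6: (6, 6)}

def accepts(delta, word):
    state = 0
    for c in word:
        state = delta[state][0 if c == 'a' else 1]
    return state == 3

# The two automata agree on every word of length at most 8 ...
for n in range(9):
    for w in product('ab', repeat=n):
        assert accepts(R, w) == accepts(S, w)

# ... but a^9 is accepted by M_R only.
assert accepts(R, 'a' * 9) and not accepts(S, 'a' * 9)
```

The brute-force loop covers all $2^0+\dots+2^8=511$ words of length at most $8$.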
In general:
For a general approach I don't know anything better than constructing an automaton, minimizing it, and then recalculating the equivalence classes. However, there are a few tricks that often speed up this process, or sometimes even make it unnecessary. Still, this works only because the automata you usually have to deal with are quite simple; if you had to do this for an automaton with a very large number of states, writing a computer program to do it would be faster.
Relevant intuition on regular languages:
The regular languages are, intuitively, the languages that require only a finite amount of information at any given time to decide whether the input belongs to the language.
To be more precise, a finite amount of information means that there are finitely many variables, each with finitely many possible values.
Hence, if we take all the possible configurations, we still end up with finitely many states, and exactly that notion is captured by finite automata: each state corresponds to some configuration.
That is also why the language of parentheses given by the grammar $X \to X\mathtt{(}X\mathtt{)} \mid \varepsilon$ is not regular: to parse it you need to know at any given moment how many parentheses are open, and that number can be arbitrarily big, so even though it is only one number, it is far more than a finite amount of information.
However, this intuition is even more apparent when you consider the syntactic monoid of a language. A different characterization of regular languages is that they are recognizable by finite monoids. In particular, if $M$ is the smallest monoid that recognizes the language $L$ via some homomorphism $\phi : \Sigma^* \to M$, then that homomorphism $\phi$ is actually the information filter that distills all the necessary info and leaves nothing more. Another way to think about it is to imagine that each word $w \in \Sigma^*$ contains some information, but not all of that information is needed to decide whether $w \in L$. When we map $w$ into $M$ as $\phi(w)$, what we get is a finite representation of all the information that can be necessary to deal with $w$ in the context of $L$.
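A tiny illustration of this monoid view (my own example, not one from the question): for the language of words over $\{\mathtt{a},\mathtt{b}\}$ with an even number of $\mathtt{b}$'s, the syntactic monoid is $\mathbb{Z}/2$ and $\phi(w) = |w|_{\mathtt{b}} \bmod 2$ is exactly that information filter:

```python
# Hypothetical example: L = words over {a, b} with an even number of b's.
# phi maps a word to |w|_b mod 2 -- the only information L ever needs.
def phi(w):
    return w.count('b') % 2

def in_L(w):
    return phi(w) == 0          # membership depends on phi(w) alone

# phi is a monoid homomorphism into Z/2: phi(uv) = phi(u) + phi(v) mod 2,
# so phi(w) can be maintained incrementally while reading w.
for u in ('', 'ab', 'bb', 'bab'):
    for v in ('', 'b', 'aab'):
        assert phi(u + v) == (phi(u) + phi(v)) % 2
```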
The basic trick:
The trick I mentioned at the beginning is to guess what information suffices to decide whether $w \in L$. For example, in your case
$$L=\Big\{w \in \{\mathtt{a},\mathtt{b}\}^* \ \Big|\ w=\mathtt{a}v \text{ for a word } v \in\{\mathtt{a},\mathtt{b}\}^* \text{ and } |w|_{\mathtt{b}} \text{ is even}\Big\}$$
we have two bits of information:
- if $w$ starts with $\mathtt{a}$,
- if $|w|_{\mathtt{b}}$ is even.
This implies that we will need at most $2\cdot2 = 4$ states, and this is what you get. Such guessing requires a bit of insight, for example
$$L_3=\Big\{w \in \{\mathtt{a},\mathtt{b}\}^* \ \Big|\ w=\mathtt{a}v \text{ for a word } v \in\{\mathtt{a},\mathtt{b}\}^* \text{ and } |w|_{\mathtt{b}} \text{ is divisible by } 3\Big\}$$
also seems to carry only two bits of information, but in fact we need $6$ states.
The new condition, $|w|_{\mathtt{b}}$ divisible by $3$, requires three states to handle (remainders $0$, $1$ and $2$), so that when we concatenate two words $w_1$ and $w_2$ we can tell the new value from the values for $w_1$ and $w_2$ alone, instead of looking at the whole words (with only two possible values we could not distinguish $|w_1|_{\mathtt{b}} = 1 = |w_2|_{\mathtt{b}}$ from $|w_1|_{\mathtt{b}} = 1, |w_2|_{\mathtt{b}} = 2$).
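The configuration view can be made concrete. A sketch (encoding mine) that tracks the pair (starts with $\mathtt{a}$, $|w|_{\mathtt{b}} \bmod 3$), i.e. the $2\cdot3=6$ configurations, and checks that remainders compose under concatenation:

```python
# Sketch (encoding mine): the configuration for L_3 is the pair
# (word starts with 'a', number of b's mod 3) -- 2 * 3 = 6 possibilities.
def classify(w):
    return (w[:1] == 'a', w.count('b') % 3)

def in_L3(w):
    starts_a, rem = classify(w)
    return starts_a and rem == 0

# Remainders compose under concatenation, so the class of w1 w2 is
# determined by the classes of w1 and w2 alone:
for u in ('', 'a', 'ab', 'abb', 'b'):
    for v in ('', 'b', 'bb', 'ab'):
        assert classify(u + v)[1] == (classify(u)[1] + classify(v)[1]) % 3
```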
Some caveats:
Now that we are able to identify all the atoms (compare with atoms in probability), we can generate equivalence classes, but their number may not be minimal. For example, consider
$$L_\vee=\Big\{w \in \{\mathtt{a},\mathtt{b}\}^* \ \Big|\ w=\mathtt{a}v \text{ for a word } v \in\{\mathtt{a},\mathtt{b}\}^* \text{ or } |w|_{\mathtt{b}} \text{ is divisible by } 3\Big\}$$
Of course, we can split $\Sigma^*$ into six equivalence classes that suffice to handle $L_\vee$:
$$\{\text{starts\_with\_}\mathtt{a},\text{starts\_with\_}\mathtt{b}\}
\times\{\text{rem\_}0,\text{rem\_}1,\text{rem\_}2\},$$
but we need only $4$: the remainders of $|w|_{\mathtt{b}} \bmod 3$ matter only when $w$ does not start with $\mathtt{a}$:
$$\{
\text{sw}\mathtt{a},
\text{sw}\mathtt{b}\text{\_rem\_}0,
\text{sw}\mathtt{b}\text{\_rem\_}1,
\text{sw}\mathtt{b}\text{\_rem\_}2
\}.$$
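One can check this collapse from six classes to four mechanically, by comparing residual languages on all suffixes up to a modest length. This is a fingerprint rather than a proof, but the period-$3$ structure makes suffixes of length up to $6$ more than enough here:

```python
from itertools import product

def in_Lor(w):
    # L_or: starts with 'a', or the number of b's is divisible by 3.
    return w[:1] == 'a' or w.count('b') % 3 == 0

def signature(u, max_len=6):
    # Fingerprint of the residual language u^{-1} L_or: membership of u+s
    # for every suffix s of length <= max_len.
    return tuple(in_Lor(u + ''.join(s))
                 for n in range(max_len + 1)
                 for s in product('ab', repeat=n))

# Nonempty representatives of the six product classes; the three
# a-starting ones share one residual (any extension still starts with a).
reps = ['a', 'ab', 'abb', 'bbb', 'b', 'bb']
assert len({signature(u) for u in reps}) == 4
```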
Closing word:
There are a lot of things to consider, and this is not the place for me to dwell on them; however, if you start thinking this way, a lot of structure will "pop out". One nice example to practice on is
$$L_{10} = \Big\{ w \in \{\mathtt{a},\mathtt{b}\}^* \ \Big|\
\text{the 10-th letter of $w$ from the back is }\mathtt{a}\Big\}.$$
It is easy to construct an NFA for $L_{10}$, but the DFA will have a lot of states. Yet, if you think about it, this makes a lot of sense: the information the syntactic monoid wants you to remember contains at least $10$ bits, namely whether the $i$-th letter from the back is $\mathtt{a}$ or not, for $i = 1, \dots, 10$. In other words, you need a way to represent a $10$-bit number, so any deterministic finite automaton that recognizes $L_{10}$ must have at least $2^{10}$ equivalence classes or states.
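The $2^{10}$-state DFA is just a sliding window over the last ten letters; a sketch (the bitmask encoding is mine):

```python
# Sketch of the canonical DFA for L_10: the state is a bitmask of the
# last 10 letters read, 1 where the letter was 'a', so the automaton has
# exactly 2**10 window configurations.
WIDTH = 10
MASK = (1 << WIDTH) - 1

def step(state, c):
    # Shift the window left and append the new letter's bit.
    return ((state << 1) | (c == 'a')) & MASK

def in_L10(w):
    state = 0
    for c in w:
        state = step(state, c)
    # The 10th letter from the back sits at the top bit of the window.
    return len(w) >= WIDTH and bool(state & (1 << (WIDTH - 1)))
```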
So that was one basic trick and some hints on how to extend it. It's not magic, but it simplified things for me a lot, perhaps you will like this approach too.
I hope this helps $\ddot\smile$
One does not always have to construct the minimal DFA (although if the language is regular, this is a good way to do it, as it will always work). In fact, sometimes it is impossible to construct the minimal DFA, because the Myhill-Nerode equivalence has infinitely many classes; by the Myhill-Nerode theorem, this tells us that the language is not regular. For instance, consider the language given in this question: there, each word is shown to be its own equivalence class, and this is proven without constructing a minimal DFA.
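The linked question's language is not reproduced here, so as an illustrative stand-in take the classic non-regular $L = \{\mathtt{a}^n\mathtt{b}^n : n \ge 0\}$, where the prefixes $\mathtt{a}^0, \mathtt{a}^1, \mathtt{a}^2, \dots$ are pairwise inequivalent:

```python
# Stand-in example (the linked question's language is not reproduced
# here): L = { a^n b^n : n >= 0 }, the textbook non-regular language.
def in_L(w):
    n = len(w) // 2
    return w == 'a' * n + 'b' * n

# The suffix b^i separates a^i from a^j whenever i != j, so every a^i
# lies in its own Myhill-Nerode class -- infinitely many classes, hence
# L is not regular, and no DFA construction was needed.
for i in range(8):
    for j in range(8):
        if i != j:
            assert in_L('a' * i + 'b' * i)
            assert not in_L('a' * j + 'b' * i)
```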