A way to reframe the definitions: matrices $A,B: \Bbb R^n \to \Bbb R^m$ are equivalent if there exist bases $\mathcal B_1, \mathcal B_2$ of $\Bbb R^n,\Bbb R^m$, respectively, such that $B$ is the matrix of $A$ with respect to these new bases.
However, in the case of square matrices, we might say that choosing different bases for the domain and the codomain loses too much of the structure of the original map $A$. For instance, take
$$
A = \pmatrix{0&1\\1&0}
$$
(the reflection through $y=x$). This map is equivalent to the identity map. However, $A$ is fundamentally different from $I$ in that $A$ changes the space, and $I$ doesn't. The problem here is that if we can change our perspective in the output space, we can make it seem as though $A$ didn't do anything at all; we can take the image of our input basis to be our output basis.
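To spell this out with the example at hand: keep the standard input basis $\mathcal B_1 = \{e_1, e_2\}$ and take its image $\mathcal B_2 = \{Ae_1, Ae_2\} = \{e_2, e_1\}$ as the output basis. The change-of-basis matrices are then $P = I$ and $Q = A$, and
$$
Q^{-1} A P = A^{-1} A = I,
$$
so the matrix of $A$ with respect to these bases is the identity.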
In order to get an idea of how $A$ changes a space, we restrict ourselves to similarity. That is, we impose the constraint that the basis we choose for the output space should be the same as what we choose for the input space; this basis provides us with a fixed frame of reference. A useful choice of basis, in this instance, is $\mathcal B = \{(1,1),(1,-1)\}$. Relative to this basis, we find that
$$
[A]_{\mathcal B} = B = \pmatrix{1&0\\0&-1}
$$
and we see that the similar matrix $B$ is clearly a reflection: it fixes the first coordinate and flips the sign of the second, i.e. $A$ fixes $(1,1)$ and negates $(1,-1)$.
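As a sanity check, this can be verified directly: if $P$ is the change-of-basis matrix whose columns are the vectors of $\mathcal B$, then
$$
P = \pmatrix{1&1\\1&-1}, \qquad [A]_{\mathcal B} = P^{-1} A P = \tfrac{1}{2}\pmatrix{1&1\\1&-1}\pmatrix{0&1\\1&0}\pmatrix{1&1\\1&-1} = \pmatrix{1&0\\0&-1},
$$
using $P^{-1} = \tfrac12 P$ here, since $P^2 = 2I$.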
One notable property is that if $A$ is similar to $B$, then $A^n$ is similar to $B^n$ for any $n$.
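A one-line justification: writing $B = P^{-1} A P$, the interior factors $P P^{-1}$ cancel in the product
$$
B^n = (P^{-1} A P)(P^{-1} A P) \cdots (P^{-1} A P) = P^{-1} A^n P,
$$
so the same $P$ that conjugates $A$ to $B$ conjugates $A^n$ to $B^n$.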
I don't know the answer for real matrices of general dimension (see below for the $n = 2$ case), but for complex matrices the description of the congruence equivalence classes was worked out by Aitken and Turnbull, and there is a nice statement of the classification theorem in $\S\,$3 of the conference paper user1551 linked in the comments. In short, one defines several families of building-block matrices, and the theorem asserts that every matrix is congruent to a direct sum of such blocks.
Here's how one can proceed: for a general $n \times n$ matrix $A$, symmetry and skew-symmetry are invariant under congruence, so if we decompose $A$ into its symmetric and skew-symmetric parts,
$$A = \underbrace{\tfrac{1}{2}(A + A^{\top})}_{A^{\textrm{sym}}} + \underbrace{\tfrac{1}{2}(A - A^{\top})}_{A^{\textrm{skew}}} ,$$
the invariants of $A^{\textrm{sym}}$ and $A^{\textrm{skew}}$ under congruence are invariants of $A$ under congruence. So the problem of classifying matrices up to congruence is the same as the problem of classifying pairs of matrices, one symmetric and one skew-symmetric, up to congruence by a common matrix, that is, classifying pairs $(A^{\textrm{sym}}, A^{\textrm{skew}})$ up to equivalence under $$(A^{\textrm{sym}}, A^{\textrm{skew}}) \sim (P^{\top} A^{\textrm{sym}} P, P^{\top} A^{\textrm{skew}} P) .$$
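The reason this reduction works: taking symmetric and skew parts commutes with congruence, since $(P^{\top} A P)^{\top} = P^{\top} A^{\top} P$ gives
$$
(P^{\top} A P)^{\textrm{sym}} = P^{\top} A^{\textrm{sym}} P, \qquad (P^{\top} A P)^{\textrm{skew}} = P^{\top} A^{\textrm{skew}} P.
$$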
Example: For complex matrices, the only congruence invariants of $A^{\textrm{sym}}$ and $A^{\textrm{skew}}$ (separately) are their ranks, and these (pairs of) ranks are discrete invariants of a congruence class. In the special case $n = 2$, the possible building blocks are $$\pmatrix{0}, \qquad \pmatrix{1}, \qquad \pmatrix{&1\\-1&}, \qquad \pmatrix{1&-\alpha\\ \alpha&1}, \quad \alpha \in \Bbb C ,$$ which give the canonical forms
$$\underset{(0, 0)}{\pmatrix{0&\\&0}}, \quad \underset{(1, 0)}{\pmatrix{1&\\&0}}, \quad \underset{(2, 0)}{\pmatrix{1&\\&1}}, \quad \underset{(0, 2)}{\pmatrix{&1\\-1&}}, \quad \underset{(2, 2)}{\pmatrix{1&-i\\i&1}}, \quad \underset{(2, 2)}{\pmatrix{1&-\alpha\\\alpha&1}}, \,\,\,\, \alpha \neq i .$$ (The parameters $\pm \alpha$ give congruent matrices.) The pairs written beneath each form are the invariants $(\operatorname{rank}(A^{\textrm{sym}}) , \operatorname{rank}(A^{\textrm{skew}}))$; the last case is generic, and the fifth case (the form with $\alpha = i$) is distinguished from it by being singular. One can show that for a matrix $A$ whose symmetric and skew-symmetric parts both have rank $2$, $\alpha^2 = \det A^{\textrm{skew}} / \det A^{\operatorname{sym}}$. The second and third classes are symmetric, the fourth class is skew-symmetric, and the last two classes are neither.
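One can check the determinant formula on the generic canonical form itself: there
$$
A^{\textrm{sym}} = \pmatrix{1&0\\0&1}, \qquad A^{\textrm{skew}} = \pmatrix{0&-\alpha\\\alpha&0}, \qquad \frac{\det A^{\textrm{skew}}}{\det A^{\textrm{sym}}} = \frac{\alpha^2}{1} = \alpha^2,
$$
and since congruence by $P$ multiplies both determinants by $(\det P)^2$, the ratio is a congruence invariant.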
Over $\Bbb R$, most of these classes (or, more precisely, their intersections with $M(2, \Bbb R)$) decompose into finer real congruence classes; we should expect this, since over $\Bbb R$ the symmetric part alone is classified by its signature, not just its rank. The real congruence classes consist of the six symmetric classes you already know, the class containing $$\pmatrix{-1&-1\\1&1},$$ and the generic classes $$\pmatrix{1&-\alpha\\\alpha&1}, \qquad \pmatrix{-1&-\beta\\\beta&1}, \quad \beta^2 \neq 1, \qquad \textrm{and} \qquad \pmatrix{-1&-\gamma\\ \gamma&-1}.$$ The matrices corresponding to $\pm \alpha$ are congruent, and likewise for the parameters $\beta$ and $\gamma$. See this note of G.D. Williams for more information about the $2 \times 2$ case over general fields.
F. de Terán, "Canonical forms for congruence of matrices: Tribute to H.W. Turnbull and A.C. Aitken" (2010).
G.D. Williams, "Congruence of $(2 \times 2)$ matrices," Discrete Mathematics 224 (2000), 293-297.
Best Answer
Presume that $M$ and $M^{T}$ are similar, so that in particular $M = P^{-1}M^{T}P$ for some invertible $P$. What can you say about $P$?
Then write the equation as $PM = M^{T}P$. Given what we know about $P$, this should suggest what we need for $M$ and $M^{T}$ to be similar (i.e. it should tell us what type of matrix $P$ is sufficient to make the equation true).
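As a small illustration of where this leads (the particular matrices here are my own choice, not part of the original hint): for $M = \pmatrix{1&1\\0&1}$, the symmetric matrix $P = \pmatrix{0&1\\1&0}$ satisfies
$$
PM = \pmatrix{0&1\\1&1} = M^{T}P,
$$
so $M = P^{-1} M^{T} P$ with $P$ symmetric.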