We need a quantity that conveys information about the direction of the radius vector and the direction of the force vector.
Two vectors determine a plane. So we need a quantity that specifies a plane. One way to do that is to specify the vector normal to the plane. That's what the cross product does for us. There is an ambiguity as to direction: there are two normal vectors to a plane. We solve this by choosing one arbitrarily: we decide by convention to use the right-hand rule.
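To make that concrete, here is a minimal sketch in plain Python. The helper names `cross` and `dot` and the sample values of `r` and `F` are my own, chosen only for illustration. It checks that the cross product is normal to both vectors, and that swapping the order picks the other of the two normals.

```python
def cross(a, b):
    """Cross product of two 3-component vectors (right-handed convention)."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical values: lever arm along x, force along y.
r = (2.0, 0.0, 0.0)
F = (0.0, 3.0, 0.0)

torque = cross(r, F)   # right-hand rule: x cross y points along +z
flipped = cross(F, r)  # the other normal to the same plane

print(torque, flipped)
print(dot(torque, r), dot(torque, F))  # both zero: normal to the plane
```

Both results describe the same plane; the right-hand rule is just the agreement about which one we call the torque.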
There are other ways to represent a torque that some would argue are more natural, for example, the bivector. These other ways are usually extensible to dimensions higher than three, whereas the cross product works only in three dimensions. Well, we live in a world having three spatial dimensions. That fact, and years and years of usage and tradition, have cemented the cross product into our toolbox.
The cross product has a few oddities associated with it, but it does the job. Some people think we should do away with the cross product. It might be nice to do that and use a more natural mathematical construct, but trying to make a change like that is like rolling a very large boulder up a hill.
To add to Steeven's answer and in particular his very pertinent statement:
You can't define a vector direction as something that turns around.
It may help you to understand that torque as a vector is actually cheating a little bit: it's a "simplification" that we can only get away with in two and three dimensions, which is why the "direction" seems a little abstract. The torque "vector" direction defines the axis of the motion that it tends to induce, and for the same reason that torque as a vector is a bit of a trick, even the notion of axis only works in two and three dimensions.
Torque is about rotation, and rotations primarily are about transformations that are confined to planes. For example, a rotation about the $z$-axis is a transformation that churns up the $x-y$ plane - it transforms the $x$ and $y$ co-ordinates of things - but leaves the $z$ co-ordinates unchanged.
When we do higher dimensional geometry, rotations still act on planes, but they leave more than one dimension invariant. In four dimensions, it's incomplete to speak of a rotation about an axis, because, for example, you can have a rotation that transforms the $x$ and $y$ co-ordinates of points but leaves both the $z$ and $w$ co-ordinates invariant.
So, in general, the easiest way to specify a rotation is by specifying the plane that it changes, rather than specifying the subspace that it leaves invariant.
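As a rough illustration of a rotation specified by its plane, here is a sketch of a four dimensional rotation in the $x-y$ plane. The function name `rotate_xy_4d` and the sample point are assumptions of mine, not anything standard.

```python
import math


def rotate_xy_4d(point, angle):
    """Rotate a 4D point (x, y, z, w) in the x-y plane by `angle` radians."""
    x, y, z, w = point
    c, s = math.cos(angle), math.sin(angle)
    # Only x and y are mixed; z and w pass through untouched.
    return (c * x - s * y, s * x + c * y, z, w)


p = (1.0, 0.0, 5.0, 7.0)          # hypothetical point
q = rotate_xy_4d(p, math.pi / 2)  # quarter turn in the x-y plane

print(q)  # x and y change; z and w are exactly as before
```

Everything with $x = y = 0$ stays put, and that is the whole $z-w$ plane rather than a single line, so there is no one "axis" to name.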
It just so happens that in three dimensions, the subspace left invariant is a line or an "axis", so the two approaches amount to the same thing. We can define a plane in three dimensions by specifying a vector normal to it, which is why we can get away with a torque or angular velocity as a vector. In general these quantities are directed planes, not lines with direction.
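One way to see the "directed plane" numerically is to compute the antisymmetric components $a_i b_j - a_j b_i$ for each pair of axes. The sketch below does that; the helper name `wedge` and the sample vectors are my own choices. In three dimensions there are three such components and they are the cross product components up to ordering and sign, while in four dimensions there are six, too many to pack into a single vector.

```python
from itertools import combinations


def wedge(a, b):
    """Plane components a_i*b_j - a_j*b_i for every axis pair i < j."""
    n = len(a)
    return {(i, j): a[i] * b[j] - a[j] * b[i]
            for i, j in combinations(range(n), 2)}


def cross(a, b):
    """Ordinary 3D cross product, for comparison."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


# Hypothetical 3D vectors.
r = (1.0, 2.0, 3.0)
F = (4.0, 5.0, 6.0)

B = wedge(r, F)
t = cross(r, F)

# y-z plane <-> x component, z-x plane <-> y component, x-y plane <-> z component
print(B[(1, 2)] == t[0], -B[(0, 2)] == t[1], B[(0, 1)] == t[2])

# In 4D there are six independent planes, not four.
planes_4d = wedge((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0))
print(len(B), len(planes_4d))
```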
Best Answer
Why?
Because a moment is a manifestation of a force at a distance, the same way a velocity is a manifestation of a rotation at a distance. Given two points A and B you know that $$ \vec{M}_A = \vec{r}_{AB} \times \vec{F}_B \\ \vec{v}_A = \vec{r}_{AB} \times \vec{\omega}_B $$
The force at B causes a torque at A, similarly to how a rotation at B causes a velocity at A.
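Here is a small numerical sketch of those two relations, taking $\vec{r}_{AB}$ to be the vector from A to B. The helper names and all the numbers are made up for illustration.

```python
def cross(a, b):
    """Cross product of two 3-component vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def sub(a, b):
    """Component-wise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


# Hypothetical setup: A at the origin, B one unit along x.
A = (0.0, 0.0, 0.0)
B = (1.0, 0.0, 0.0)
r_AB = sub(B, A)

F_B = (0.0, 0.0, -10.0)  # force at B, pointing down
w_B = (0.0, 0.0, 2.0)    # rotation about a z-directed axis through B

M_A = cross(r_AB, F_B)   # moment seen at A
v_A = cross(r_AB, w_B)   # velocity seen at A

# Sliding the force along its own line of action leaves the moment unchanged.
B_slid = (1.0, 0.0, -3.0)
M_A_slid = cross(sub(B_slid, A), F_B)

print(M_A, v_A, M_A == M_A_slid)
```

The last check is the reason only the line of the force matters, not where along the line you apply it.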
So why is that?
Both forces/torques and velocities/rotations are 3D screws that have the following properties: a) a line of direction, b) a magnitude, c) a pitch. Forget about b) and c) for now and focus on the line.
How do you describe a line in 3D? A line has 4 degrees of freedom, and it is usually represented using 6 components with something called Plücker coordinates. These involve two vectors, each with 3 components. The first vector, which I call $\vec{F}$, gives the direction of the line, but its magnitude is not important, so it uses two degrees of freedom. The second vector, which I call $\vec{M}$, gives the moment of the line about the origin and is used to describe the point of the line closest to the origin. It too uses two degrees of freedom because the location along the line is unimportant. It represents either a) the moment of a force along the line, or b) the speed of a rotating body about the line. The location of the line is given by
$$ \vec{r} = \frac{\vec{F} \times \vec{M}}{\vec{F} \cdot \vec{F}} = - \frac{\vec{M} \times \vec{F}}{\vec{F} \cdot \vec{F}} $$ depending on which you like best.
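A quick sanity check of that formula, as a sketch: build $\vec{M}$ from some point on the line as $\vec{M} = \vec{p} \times \vec{F}$ (my assumption for how the moment about the origin is formed), then recover the closest point. The function name `closest_point` and the numbers are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-component vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def closest_point(F, M):
    """Point of the line (F, M) nearest the origin: (F x M) / (F . F)."""
    FF = dot(F, F)
    return tuple(c / FF for c in cross(F, M))


# Hypothetical line: parallel to y, passing through (3, 4, 0).
p = (3.0, 4.0, 0.0)
F = (0.0, 2.0, 0.0)
M = cross(p, F)          # moment of the line about the origin

r = closest_point(F, M)
print(r)                 # lies on the line and is perpendicular to F
print(dot(r, F))
```

Starting from any other point on the same line gives the same $\vec{M}$ and therefore the same $\vec{r}$, which is why the pair $(\vec{F}, \vec{M})$ pins down the line and nothing more.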
Similarly for motions
$$ \vec{r} = \frac{\vec{\omega} \times \vec{v}}{\vec{\omega} \cdot \vec{\omega}} = - \frac{\vec{v} \times \vec{\omega}}{\vec{\omega} \cdot \vec{\omega}} $$
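If you want to see why these formulas return the closest point, here is a short check. It assumes the moment is built from an arbitrary point $\vec{r}_0$ on the line as $\vec{M} = \vec{r}_0 \times \vec{F}$, in line with $\vec{M}_A = \vec{r}_{AB} \times \vec{F}_B$; the symbol $\vec{r}_0$ is mine. The vector triple product identity gives

$$ \vec{F} \times (\vec{r}_0 \times \vec{F}) = \vec{r}_0 \, (\vec{F} \cdot \vec{F}) - \vec{F} \, (\vec{F} \cdot \vec{r}_0) $$

so that

$$ \frac{\vec{F} \times \vec{M}}{\vec{F} \cdot \vec{F}} = \vec{r}_0 - \frac{\vec{F} \cdot \vec{r}_0}{\vec{F} \cdot \vec{F}} \, \vec{F} $$

The right-hand side is $\vec{r}_0$ with its component along the line removed, which is the point of the line nearest the origin no matter which $\vec{r}_0$ you started from. The same steps with $\vec{v} = \vec{r}_0 \times \vec{\omega}$ give the formula for motions.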
So the moment is a manifestation of a line at a distance.