It doesn't have to be thought of as a cross product. It's just very convenient to think of it that way, so we teach it first. Indeed, even when I apply it in my job, I think of it as a cross product.
But first, your question about why the lever arm appears in the equations. Informally, we need to account for the length because a longer lever arm gives you more mechanical advantage. You can test this yourself with a wrench. Try to tighten a bolt while holding the wrench right up near the head, then hold the wrench further out near the end, giving yourself a longer lever arm, and try again. You'll find you can tighten the bolt much more easily with the longer lever arm.
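To put rough numbers on the wrench experiment, here is a minimal sketch. It assumes the push is at right angles to the handle, so the torque magnitude is just force times lever arm. The function name and the values (50 N, 5 cm, 25 cm) are made up for illustration.

```python
def torque_magnitude(force_newtons, arm_metres):
    """Torque magnitude for a force applied at right angles to the handle."""
    return force_newtons * arm_metres

# Same push, two different grip positions (illustrative values only).
short_grip = torque_magnitude(50.0, 0.05)  # hand right up near the head
long_grip = torque_magnitude(50.0, 0.25)   # hand out near the end

print("short grip:", short_grip, "N*m")
print("long grip: ", long_grip, "N*m")
print("ratio:", long_grip / short_grip)
```

With the same push, the torque scales in direct proportion to the lever arm, which is what you feel at the bolt.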
As for a mathematical explanation, you can show it using conservation of momentum and angular momentum. Construct any scenario using forces and show that momentum is conserved (it should be!). Now, pick any point as the "center" of your rotation, and calculate torques. You'll find that angular momentum is conserved. If you defined torque without the radius term, you'd find angular momentum would not be conserved. In fact, it turns out that if you have forces and conservation of momentum, you can always derive torques and conservation of angular momentum. And if you have torques and conservation of angular momentum, you can always derive the forces and the conservation of momentum! They're sort of duals of one another.
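Here is a small sketch of the "pick any point" claim, under one stated assumption: the two particles push on each other with equal and opposite forces directed along the line joining them. The positions, the constant `k`, and the helper names are all hypothetical. The net force is zero (momentum is conserved), and the net torque r × F comes out zero about every origin tried (so angular momentum is conserved too).

```python
def cross(a, b):
    """Cross product of two 3-component tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def net_torque(origin, positions, forces):
    """Sum of (position - origin) x force over all particles."""
    total = (0.0, 0.0, 0.0)
    for p, f in zip(positions, forces):
        total = add(total, cross(sub(p, origin), f))
    return total

# Two particles (arbitrary illustrative positions).
p1 = (1.0, 2.0, 0.5)
p2 = (4.0, -1.0, 2.5)

# Assumed interaction: equal and opposite forces along the line joining them.
k = 0.7
separation = sub(p2, p1)
f_on_1 = tuple(k * s for s in separation)
f_on_2 = tuple(-k * s for s in separation)

net_force = add(f_on_1, f_on_2)

# Try several different "centers" of rotation.
origins = [(0.0, 0.0, 0.0), (10.0, -3.0, 7.0), (-2.5, 4.0, 1.0)]
torques = [net_torque(o, [p1, p2], [f_on_1, f_on_2]) for o in origins]

print("net force:", net_force)
for o, t in zip(origins, torques):
    print("origin", o, "-> net torque", t)
```

The choice of origin drops out because the two lever arms differ by exactly the separation vector, which is parallel to the force.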
If you want to go further than that, many years from now you'll learn Lagrangian Mechanics and Noether's Theorem. You'll learn that the conservation of angular momentum is a very fundamental concept directly tied to the fact that our laws of physics are the same in all directions. Rotate an experiment, and the laws of physics will stay the same. There is no privileged direction where the laws of physics are "correct."
As for why torque is perpendicular to the force and the lever arm, that is actually just an artifact of mathematics, nothing more. When you get deeper into Lagrangian Mechanics, what you'll find is that this angular momentum is just one specialized case of a wider concept called "generalized angular momentum." In generalized angular momentum, the equivalent of torque is formed by the exterior product, r ∧ F. This is known as a bivector, as opposed to a normal vector. This works in any number of dimensions.
The exact definition of these bivectors is a bit of a pest to work with:
The exterior algebra Λ(V) of a vector space V over a field K is defined as the quotient algebra of the tensor algebra T(V) by the two-sided ideal I generated by all elements of the form x ⊗ x for x ∈ V (i.e. all tensors that can be expressed as the tensor product of a vector in V by itself).
What a whopper! However, we're really lucky that we live in 3 dimensions. As it turns out, when you crank out one of these bivectors in 3 dimensions, and look at how it behaves, a curious convenience shows up. They behave exactly the same as cross products. A bivector is not a vector, but it turns out these 3 dimensional bivectors have the same mathematical properties as cross products (which are a 3 dimensional concept).
Incidentally, this is also why we have to choose the right hand rule convention. Bivectors can be calculated without such a convention, but when you map them into vectors using the cross product, there are two choices you can make -- left handed or right handed. As long as you always choose the same one, the result is consistent.
Thus, for reasons that should be obvious, we choose to teach torque as a vector defined by r × F, rather than a bivector, r ∧ F. It's a whole lot simpler! But it comes with a price. The vector r × F has a "direction," since it's a vector. That direction is perpendicular to the force and the lever arm. The bivector didn't have this particular concept of direction. The concept of bivector direction is more nuanced, and more intuitively related to the direction of the force and the direction of the lever arm.
And so, you have your reason for the torque being "perpendicular." It really doesn't have anything to do with physics, as much as it has to do with avoiding having to teach you advanced vector algebra to do basic physics. You get the right answer using the cross product, because cross products and 3 dimensional exterior products operate the same.
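As a rough check of that last claim, here is a sketch that writes r ∧ F in components as the antisymmetric array B[i][j] = r_i F_j - r_j F_i and then reads off a vector from its three independent entries. The function names and the `handedness` parameter are my own, purely for illustration. With `handedness=+1` (right handed) the result matches r × F, and with `-1` (left handed) it is the same vector flipped.

```python
def wedge(r, f):
    """Components of r ^ F as an antisymmetric 3x3 array."""
    return [[r[i] * f[j] - r[j] * f[i] for j in range(3)] for i in range(3)]

def cross(r, f):
    """Ordinary right-handed cross product."""
    return (r[1] * f[2] - r[2] * f[1],
            r[2] * f[0] - r[0] * f[2],
            r[0] * f[1] - r[1] * f[0])

def bivector_to_vector(b, handedness=+1):
    """Map the yz, zx, xy entries of the bivector onto x, y, z components.

    handedness=+1 is the right-hand convention, -1 the left-hand one.
    """
    return (handedness * b[1][2],
            handedness * b[2][0],
            handedness * b[0][1])

r = (1, 2, 3)      # lever arm (illustrative values)
force = (4, 5, 6)  # force (illustrative values)

B = wedge(r, force)
right_handed = bivector_to_vector(B, +1)
left_handed = bivector_to_vector(B, -1)

print("r ^ F components:", B)
print("right-handed vector:", right_handed)
print("left-handed vector: ", left_handed)
print("cross product r x F:", cross(r, force))
```

The bivector itself never needed a handedness; the convention only enters in `bivector_to_vector`, when the plane is traded for a perpendicular arrow.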
The cross product of two vectors is really a bivector. It has a magnitude and a direction, but the magnitude is an area instead of a length, and the direction is a plane instead of a line.
Just as two vectors can point in opposite directions while lying on the same line, two bivectors can "point" in opposite directions while lying in the same plane. You can think of the directions as clockwise and counterclockwise, though which of those is which depends on which side of the plane you're on.
Bivectors are useful for things that lie in a plane and have a clockwise/counterclockwise direction and a magnitude, like angular velocity.
In three dimensions (and only in three dimensions), you can identify a bivector with a vector perpendicular to the plane of the bivector, whose length is the bivector's area. Because of this, bivectors are usually not taught as such. Instead, you have a cross product that produces another vector, whose direction is given by the right hand rule.
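One way to see the "only in three dimensions" part is to count components. A bivector in n dimensions has one independent component per pair of axes, which is n(n-1)/2, while a vector has n. The sketch below (function name is mine) looks for dimensions where the two counts agree.

```python
from math import comb

def bivector_components(n):
    """Number of independent planes (pairs of axes) in n dimensions."""
    return comb(n, 2)

for n in range(1, 8):
    print(n, "dimensions:", n, "vector components,",
          bivector_components(n), "bivector components")

# Dimensions (from 1 to 10) where a bivector has as many components as a vector.
matches = [n for n in range(1, 11) if bivector_components(n) == n]
print("counts agree for n =", matches)
```

In two dimensions a bivector has a single component (there is only one plane), and in four it has six, so there is no vector to identify it with.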
Best Answer
When studying angular things - torque, angular velocity, angular momentum, etc. - physicists do a clever thing to avoid having to describe curves. You see, you might be tempted to draw a curved arrow for a torque, indicating that you are twisting something around in a circular-ish way. But then when you try to add two such arrows together, all of a sudden you realize your notation no longer has a natural, intuitive meaning.
Instead, we draw the arrow pointing perpendicular to the plane of the curve you are tempted to draw. More precisely in the case of torque, perpendicular to the plane defined by the radial vector and the force vector. Note that this uniquely defines what plane your curved arrow must reside in, and, given the right-hand rule, clears up the ambiguity as to which way your curved arrow should point (if your right-hand fingers curl in the direction of the curved arrow you want to draw, your thumb points in the direction of the straight arrow you should draw instead).
It is then a simple matter to encode the magnitude of the torque/angular velocity/whatever in the length of this vector. The benefit is that you end up with straight arrows describing everything, and they add exactly as your torques should add - you have a genuine vector space, and are free to abstract away from all diagrams. And it is not even terribly counterintuitive - the torque vector is parallel to the axis around which you are applying torque. If you think about it long enough, you should be able to convince yourself that if you had to choose a single direction to define things, this is the least ambiguous.
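To illustrate the "they add exactly as your torques should add" point, here is a small sketch with two forces applied at the same point on a body. The numbers and helper names are made up. Adding the two torque arrows component by component gives the same result as computing the torque of the combined force.

```python
def cross(a, b):
    """Cross product of two 3-component tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

r = (0.2, 0.0, 0.0)    # radial vector to the point of application
f1 = (0.0, 10.0, 0.0)  # first force
f2 = (0.0, 0.0, 5.0)   # second force, in a different direction

torque_1 = cross(r, f1)
torque_2 = cross(r, f2)

sum_of_torques = add(torque_1, torque_2)
torque_of_sum = cross(r, add(f1, f2))

print("torque 1:", torque_1)
print("torque 2:", torque_2)
print("sum of torques:     ", sum_of_torques)
print("torque of net force:", torque_of_sum)
```

Try doing the same thing with two curved arrows lying in different planes and you'll see why the straight-arrow notation won.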