Let's start with determining velocity $\newcommand{\v}{\mathbf{v}}\v$ from angular velocity $\newcommand{\w}{\boldsymbol{\omega}}{\w}$. If an object is currently at position $\newcommand{\r}{\mathbf{r}}\mathbf{r}$, and is rotating about a fixed point, which we will take to be the origin, with angular velocity $\w$, then the object's velocity is given by $\v =\w \times \r$.
Now to find the object's linear acceleration $\newcommand{\a}{\mathbf{a}}\a$, simply differentiate the above equation:
$\begin{equation}
\begin{aligned}
\a = \dot{\v} &= \dot{\w}\times \r + \w \times \dot{\r} \\
&= \newcommand{\al}{\boldsymbol{\alpha}}\al \times \r + \w \times \left( \w \times \r \right) \\
&= \al \times \r + \w \left(\w \cdot \r\right) - \omega^2 \r \\
&=\al \times \r - \omega^2\left(\mathbb{I} - \hat{\w}\otimes\hat{\w} \right)\r.
\end{aligned}
\end{equation}$
Above, the second line introduces the angular acceleration $\al$, defined as the time derivative of $\w$. Also in the second line, in the second term, we used the velocity result $\dot{\r} = \v = \w \times \r$. The third line expands the triple product with the identity $\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) = \mathbf{B}(\mathbf{A}\cdot\mathbf{C}) - \mathbf{C}(\mathbf{A}\cdot\mathbf{B})$. In the end, the linear acceleration $\a$ consists of two terms. The second term is the usual centripetal acceleration, which looks like $-\omega^2 r$, but with a projection that makes sure you use the separation from the closest point on the axis of rotation (that is, you subtract off the component of $\r$ along $\w$).
The tangential acceleration, then, must be contained in the first term. Since it is given by a cross product with $\r$, it is perpendicular to $\r$ and is therefore "tangential" in the sense that it is tangent to the sphere of radius $r$. Notice that in the special case where the axis of rotation is fixed, so that $\al$ and $\w$ are collinear, $\al \times \r$ is collinear with $\v = \w \times \r$, so the tangential acceleration is collinear with the velocity, as expected.
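If you'd like to convince yourself numerically, here is a minimal sketch in plain Python. The vectors `omega`, `alpha`, and `r` are made-up values chosen for a fixed axis along z, and the helper names are mine. It computes both terms of the acceleration, compares the centripetal term against the projected form, and checks that the tangential term is parallel to the velocity.

```python
def cross(a, b):
    """Cross product of two 3-component tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def centripetal_projected(omega, r):
    """-omega^2 times the part of r perpendicular to the rotation axis."""
    w2 = dot(omega, omega)
    along = dot(omega, r) / w2  # how much of r lies along omega
    r_perp = tuple(ri - along * wi for ri, wi in zip(r, omega))
    return tuple(-w2 * x for x in r_perp)

# Hypothetical values: fixed axis along z, so alpha is parallel to omega.
omega = (0.0, 0.0, 2.0)   # rad/s
alpha = (0.0, 0.0, 0.5)   # rad/s^2
r = (1.0, 0.0, 3.0)       # m, deliberately not in the plane of rotation

v = cross(omega, r)                 # velocity
tangential = cross(alpha, r)        # first term of a
centripetal = cross(omega, v)       # second term, omega x (omega x r)
projected = centripetal_projected(omega, r)

print("v           =", v)
print("tangential  =", tangential)
print("centripetal =", centripetal)
print("projected   =", projected)
print("tangential x v =", cross(tangential, v))
```

Note that `r` has a z-component of 3, but the centripetal term only sees the distance of 1 from the axis. That is the projection at work.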
Torque doesn't have to be thought of as a cross product. It's just very convenient to think of it that way, so we teach it first. Indeed, even when I apply it in my job, I think of it as a cross product.
But first, your question about why the lever arm appears in the equations. Informally, we need to account for the length because a longer lever arm gives you more mechanical advantage. You can test this, yourself, with a wrench. Try to tighten a bolt holding the wrench right up near the head, then hold the wrench further out near the end, giving yourself a longer lever arm, and try to tighten it. You'll find you can tighten the bolt much better if you have a longer lever arm.
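To put rough numbers on the wrench experiment, here is a small sketch. The 100 N push and the two grip distances are made-up values, and `torque_magnitude` is just a helper name I chose for the magnitude of r × F.

```python
import math

def torque_magnitude(lever_arm_m, force_n, angle_deg=90.0):
    """Magnitude of r x F: r * F * sin(angle between r and F)."""
    return lever_arm_m * force_n * math.sin(math.radians(angle_deg))

force = 100.0  # newtons, hypothetical push perpendicular to the wrench

short_grip = torque_magnitude(0.05, force)  # hand 5 cm from the bolt
long_grip = torque_magnitude(0.25, force)   # hand 25 cm from the bolt

print("torque near the head:", short_grip, "N*m")
print("torque near the end: ", long_grip, "N*m")
print("ratio:", long_grip / short_grip)
```

Same push, five times the lever arm, five times the torque on the bolt.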
As for a mathematical explanation, you can show it using conservation of momentum and angular momentum. Construct any scenario using forces and show that momentum is conserved (it should be!). Now, pick any point as the "center" of your rotation, and calculate torques. You'll find that angular momentum is conserved. If you defined torque without the radius term, you'd find angular momentum would not be conserved. In fact, it turns out that if you have forces and conservation of momentum, you can always derive torques and conservation of angular momentum. And if you have torques and conservation of angular momentum, you can always derive the forces and the conservation of momentum! They're sort of duals of one another.
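Here is one way to try that for yourself, as a sketch rather than a proof. Two particles push on each other with equal and opposite forces along the line joining them (positions and forces are made-up values). The net force is zero, so momentum is conserved. Since the rate of change of angular momentum about a point is the sum of r × F about that point, a vanishing net torque about every pivot means angular momentum is conserved too.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def net_torque(pivot, positions, forces):
    """Sum of (r_i - pivot) x F_i over all particles."""
    total = (0.0, 0.0, 0.0)
    for r, f in zip(positions, forces):
        total = add(total, cross(sub(r, pivot), f))
    return total

# Hypothetical two-particle system with an internal, central force pair.
positions = [(1.0, 2.0, 0.0), (4.0, 6.0, 0.0)]
f_on_1 = (1.5, 2.0, 0.0)                 # points from particle 1 toward 2
forces = [f_on_1, tuple(-x for x in f_on_1)]

net_force = add(forces[0], forces[1])
print("net force:", net_force)

# Any point can serve as the "center"; the answer is the same.
pivots = [(0.0, 0.0, 0.0), (7.0, -3.0, 2.0), (-1.0, 5.0, 10.0)]
for pivot in pivots:
    print("pivot", pivot, "-> net torque", net_torque(pivot, positions, forces))
```

Each individual torque changes when you move the pivot, but the total stays zero, which is the "pick any point" claim in miniature.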
If you want to go further than that, many years from now you'll learn Lagrangian mechanics and Noether's theorem. You'll learn that the conservation of angular momentum is a very fundamental concept directly tied to the fact that our laws of physics are the same in all directions. Rotate an experiment, and the laws of physics will stay the same. There is no privileged direction where the laws of physics are "correct."
As for why torque is perpendicular to the force and the lever arm, that is actually just an artifact of mathematics, nothing more. When you get deeper into Lagrangian Mechanics, what you'll find is that this angular momentum is just one specialized case of a wider concept called "generalized angular momentum." In generalized angular momentum, the equivalent of torque is formed by the exterior product, r ∧ F. This is known as a bivector, as opposed to a normal vector. This works in any number of dimensions.
The exact definition of these bivectors is a bit of a pest to work with:
The exterior algebra Λ(V) of a vector space V over a field K is defined as the quotient algebra of the tensor algebra T(V) by the two-sided ideal I generated by all elements of the form x ⊗ x for x ∈ V (i.e. all tensors that can be expressed as the tensor product of a vector in V by itself).
What a whopper! However, we're really lucky that we live in 3 dimensions. As it turns out, when you crank out one of these bivectors in 3 dimensions and look at how it behaves, a curious convenience shows up: it has exactly three independent components, and they behave exactly like the components of a cross product. A bivector is not a vector, but these 3-dimensional bivectors have the same mathematical properties as cross products (which are a 3-dimensional concept).
Incidentally, this is also why we have to choose the right-hand rule convention. Bivectors can be calculated without such a convention, but when you map them into vectors using the cross product, there are two choices you can make -- left-handed or right-handed. As long as you always choose the same one, the result is consistent.
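The correspondence is easy to check by brute force. The sketch below stores a bivector a ∧ b as its antisymmetric components a_i b_j - a_j b_i for i < j, which is one common way to write it down (the function names and sample vectors are mine). In 3 dimensions, the right-handed mapping to a vector reproduces the cross product, and the left-handed mapping is just its negative. The same `wedge` function works unchanged in 4 dimensions, where there are six components and no vector to map to.

```python
def wedge(a, b):
    """Components of a ^ b: {(i, j): a_i*b_j - a_j*b_i} for i < j."""
    n = len(a)
    return {(i, j): a[i] * b[j] - a[j] * b[i]
            for i in range(n) for j in range(i + 1, n)}

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def to_vector_right_handed(biv):
    """Map a 3D bivector to a vector using the right-hand convention."""
    return (biv[(1, 2)], -biv[(0, 2)], biv[(0, 1)])

def to_vector_left_handed(biv):
    """The other consistent choice: every component flips sign."""
    return tuple(-x for x in to_vector_right_handed(biv))

r = (1.0, 2.0, 3.0)   # hypothetical lever arm
F = (4.0, 5.0, 6.0)   # hypothetical force

torque_bivector = wedge(r, F)
print("r ^ F components:", torque_bivector)
print("right-handed map:", to_vector_right_handed(torque_bivector))
print("left-handed map: ", to_vector_left_handed(torque_bivector))
print("r x F:           ", cross(r, F))

# In 4D the wedge product still works, but it has 6 components, not 4.
r4 = (1.0, 2.0, 3.0, 4.0)
F4 = (0.0, 1.0, 0.0, 2.0)
print("number of 4D components:", len(wedge(r4, F4)))
```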
Thus, for reasons that should be obvious, we choose to teach torque as a vector defined by r × F, rather than a bivector, r ∧ F. It's a whole lot simpler! But it comes with a price. The vector r × F has a "direction," since it's a vector. That direction is perpendicular to the force and the lever arm. The bivector didn't have this particular concept of direction. The concept of bivector direction is more nuanced, and more intuitively related to the direction of the force and the direction of the lever arm.
And so, you have your reason for the torque being "perpendicular." It really doesn't have anything to do with physics, as much as it has to do with avoiding having to teach you advanced vector algebra to do basic physics. You get the right answer using the cross product, because cross products and 3-dimensional exterior products operate the same way.
Best Answer
We need a quantity that conveys information about the direction of the radius vector and the direction of the force vector.
Two vectors determine a plane. So we need a quantity that specifies a plane. One way to do that is to specify the vector normal to the plane. That's what the cross product does for us. There is an ambiguity as to direction: there are two normal vectors to a plane. We solve this by choosing one arbitrarily: we decide by convention to use the right-hand rule.
There are other ways to represent a torque that some would argue are more natural, for example, the bivector. These other ways are usually extensible to dimensions higher than three, whereas the cross product works only in three dimensions. Well, we live in a world having three spatial dimensions. That fact, and years and years of usage and tradition, has cemented the cross product into our toolbox.
The cross product has a few oddities associated with it, but it does the job. Some people think we should do away with the cross product. It might be nice to do that and use a more natural mathematical construct, but trying to make a change like that is like rolling a very large boulder up a hill.