Ok, so given that stress is defined to be the force per unit area around a point, I'd like to know why it's defined that way rather than, say, as force per unit volume enclosing a point.
Here is the intuitive explanation.
Suppose you have a string S from which a weight W is hanging. We will suppose that the amount of weight is so much that the string is near the regime where it would snap, but is not quite there yet. We're going to think about what happens when we add another length of string S.
If you add it in parallel, so there are two strings holding W, then it seems obvious that you could then increase the weight to about 2W before the combined two-string system would be near-breaking.
By contrast, if you add it in series (W hangs from S hanging from S), then we know that the tension force is the same in both strings, and it seems likely that they'd simply both be near-breaking. The top string might even break, if the weight of the string underneath it is enough to put it past breaking.
This tells us that the material properties which concern us (like when a string breaks) respond to stresses, defined as forces divided by an area perpendicular to that force. The extent of the material along the direction of the force doesn't matter; qualitatively, that length just propagates the force rather than sharing it.
You can also examine this two-string thought-experiment with much smaller forces not near breaking, where Hooke's law should hold for lengthening, to find that if the original deforms by $\delta L = \ell$, the two-strings-in-parallel should deform by $\ell/2$ while the two-strings-in-series should deform by $2\ell,$ so that there seems to be a constant "stress / strain" relationship where the "strain" is defined as $\delta L / L.$ In other words you can look at a system with the normal spring-constant $F = k ~ \delta L$ relationship, but it is more helpful from a materials perspective to divide by the length of the "spring" $L$ and also its cross-sectional area $A$ to find $$\sigma = \frac FA = \frac {kL}{A} ~ \frac{\delta L}{L} = \lambda ~ \epsilon.$$ The elastic modulus $\lambda$ then also has units of pressure (force per unit area) and is more fundamental to the material (a material-property) that you're studying than the spring constant (a property of both the material and the setup) is.
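As a quick numerical check of that argument, here is a minimal sketch that treats each string as an ideal Hooke's-law spring and combines two of them in parallel and in series. The spring constant, length, area, and load are hypothetical values chosen only for illustration; the point is that $kL/A$ comes out the same in all three setups even though $k$ does not.

```python
import math


def parallel(k1, k2):
    """Effective spring constant of two springs sharing the load side by side."""
    return k1 + k2


def series(k1, k2):
    """Effective spring constant of two springs hanging end to end."""
    return 1.0 / (1.0 / k1 + 1.0 / k2)


def modulus(k, length, area):
    """Elastic modulus lambda = k L / A, as in the formula above."""
    return k * length / area


# Hypothetical single-string values (assumptions, not from the text).
K = 2000.0      # spring constant, N/m
LENGTH = 1.0    # unstretched length, m
AREA = 1.0e-6   # cross-sectional area, m^2
FORCE = 10.0    # hanging weight, N

# Each entry: (effective k, total length, total cross-sectional area).
configs = {
    "single": (K, LENGTH, AREA),
    "parallel": (parallel(K, K), LENGTH, 2.0 * AREA),
    "series": (series(K, K), 2.0 * LENGTH, AREA),
}

for name, (k, length, area) in configs.items():
    delta = FORCE / k  # Hooke's law: F = k * delta
    print(f"{name:8s} delta = {delta:.4g} m   kL/A = {modulus(k, length, area):.4g} Pa")
```

The extension halves in parallel and doubles in series, while the value of $kL/A$ is unchanged, which is the sense in which the modulus belongs to the material and the spring constant belongs to the setup.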
Since the elastic modulus is a material-property, this hints that "stress" is the right definition of "force" and "strain" is the right definition of "displacement" at the more-microscopic level when we're peeking inside the substance.
In turn, it becomes very common in materials science to show the "stress/strain" curve of various materials. This starts out of course as a straight line through the origin with slope $\lambda$, but then as a substance deforms it will describe some sort of curve as added stress leads to further strain. So for example the plastic bags from the supermarket will curve upwards; they get stiffer as they stretch more.
Once you know that the stress is the right way to "microscopically" define force, elastic systems start to show off a similar problem: the simplest stretching of a beam consists in that beam not only lengthening but also narrowing. Microscopically, a little box inside the substance is not only feeling a force $+\sigma~dA$ on one side and a force $-\sigma~dA$ on the other side (so it is in force balance and provides tension), but it must also be feeling some forces on its other sides which "pinch" it smaller. So: the force is direction-dependent, and we therefore have not a stress vector (which we already didn't quite have -- the stress is in opposite directions on the top and bottom of the box), but a stress tensor: give me a direction and I'll give you the stress vector on a plane normal to that direction. (This also solves nicely the "stress on the top of the box is negative the stress on the bottom of the box" problem.)
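The "give me a direction and I'll give you the stress vector" rule can be sketched in a few lines. The stress components below are hypothetical numbers (tension along $x$, compression along $y$ and $z$, plus a little shear), and `traction` is just an illustrative name for the matrix-times-normal product.

```python
import math


def traction(stress, normal):
    """Stress vector on the plane with unit normal n: t_i = sum_j sigma_ij n_j."""
    return [sum(stress[i][j] * normal[j] for j in range(3)) for i in range(3)]


# Hypothetical stress state in MPa (an assumption for illustration only).
STRESS = [
    [100.0, 30.0, 0.0],
    [30.0, -20.0, 0.0],
    [0.0, 0.0, -20.0],
]

top = traction(STRESS, [1.0, 0.0, 0.0])      # face whose normal is +x
bottom = traction(STRESS, [-1.0, 0.0, 0.0])  # opposite face, normal -x
side = traction(STRESS, [0.0, 1.0, 0.0])     # a face whose normal is +y

print("t(+x) =", top)
print("t(-x) =", bottom)
print("t(+y) =", side)
```

Flipping the normal flips the stress vector, so the opposite faces of the box come out with opposite forces automatically, and a different face direction gives a different stress vector from the same tensor.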
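Usually this simplifies a lot because there are eigenvectors of the stress tensor: directions where the stress points exactly normal to the plane it acts on. Those are called the "principal directions", and the corresponding eigenvalues are the "principal stresses". As long as there are no distributed torques acting on the material, angular-momentum balance makes the stress tensor symmetric, so the principal directions can always be chosen to be mutually orthogonal, even in a crystal lattice which is not cubic. What the lattice symmetry changes is the elastic response relating stress to strain.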
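For a two-dimensional, symmetric stress state the eigenvalue problem has a closed form, so the idea can be checked by hand. This is a sketch under that symmetry assumption, with hypothetical stress components; the function names are illustrative.

```python
import math


def principal_stresses_2d(sxx, syy, sxy):
    """Eigenvalues of a symmetric 2D stress tensor and the angle of the first eigenvector."""
    mean = 0.5 * (sxx + syy)
    radius = math.hypot(0.5 * (sxx - syy), sxy)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # direction of the larger eigenvalue
    return mean + radius, mean - radius, theta


def traction_2d(sxx, syy, sxy, n):
    """Stress vector on the plane with unit normal n."""
    return (sxx * n[0] + sxy * n[1], sxy * n[0] + syy * n[1])


# Hypothetical in-plane stress components in MPa (assumption for illustration).
SXX, SYY, SXY = 100.0, -20.0, 30.0

s1, s2, theta = principal_stresses_2d(SXX, SYY, SXY)
n1 = (math.cos(theta), math.sin(theta))
t1 = traction_2d(SXX, SYY, SXY, n1)

print("principal stresses:", s1, s2)
print("normal n1:", n1)
print("traction on that plane:", t1, "compare s1 * n1:", (s1 * n1[0], s1 * n1[1]))
```

On the plane whose normal is `n1`, the stress vector is parallel to the normal with magnitude `s1`, so there is no shear on that plane.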
Best Answer
The reason you see constant-cross-section geometries in introductory classes is that their stiffness (i.e., the load required to obtain a given displacement) is constant for small displacements. This isn't the case for a sphere or any geometry whose contact area shrinks to zero for small loads. For a gentle touch, these geometries have near-infinite compliance! This behavior is studied in the field of contact mechanics (see, for instance, Johnson's Contact Mechanics and Fischer-Cripps's Nanoindentation). You are right that the area of contact is an important parameter in this context.
The small-deformation case for a sphere contacting a flat surface has an exact solution for a single side (Case 2 here): the deformation $\alpha$ (retaining the nomenclature used in the link) is $$\alpha=\frac{(3\pi)^{2/3}}{2}P^{2/3}\left(\frac{1-\sigma^2}{\pi E}\right)^{2/3}D^{-1/3}=\frac{1}{2D^{1/3}}\left(\frac{3P}{ E^\star}\right)^{2/3},$$
where $P$ is the applied force, $\sigma$ is the sphere Poisson's ratio, $E$ is the sphere Young's modulus, $D$ is the sphere diameter, and $E^\star=\frac{E}{1-\sigma^2}$ is the reduced Young's modulus.
(You could calculate an effective contact area, analogous to a prismatic solid, by evaluating $\frac{PD}{\alpha E}$, but this isn't too useful, to my knowledge. Instead, the actual contact radius is $a=\left(\frac{3PD}{8E^\star}\right)^{1/3}$, so the actual contact area is $\pi a^2$, which may be useful in your similarity comparison of spheres of different sizes.)
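To get a feel for the magnitudes, here is a minimal sketch that evaluates both forms of the expression for $\alpha$ and the quantity $\left(\frac{3PD}{8E^\star}\right)^{1/3}$, which has units of length and is treated here as the Hertz contact radius $a$ (so the contact area is $\pi a^2$). The material values and load are hypothetical, roughly steel-like numbers, and the function names are illustrative.

```python
import math


def reduced_modulus(young, poisson):
    """E* = E / (1 - sigma^2)."""
    return young / (1.0 - poisson ** 2)


def approach_long(load, diameter, young, poisson):
    """First (expanded) form of alpha in the equation above."""
    return (0.5 * (3.0 * math.pi) ** (2.0 / 3.0) * load ** (2.0 / 3.0)
            * ((1.0 - poisson ** 2) / (math.pi * young)) ** (2.0 / 3.0)
            * diameter ** (-1.0 / 3.0))


def approach_short(load, diameter, e_star):
    """Second (compact) form of alpha in the equation above."""
    return (3.0 * load / e_star) ** (2.0 / 3.0) / (2.0 * diameter ** (1.0 / 3.0))


def contact_radius(load, diameter, e_star):
    """a = (3 P D / (8 E*))^(1/3), a length."""
    return (3.0 * load * diameter / (8.0 * e_star)) ** (1.0 / 3.0)


# Hypothetical inputs (assumptions for illustration only).
E = 200e9    # Young's modulus, Pa
NU = 0.3     # Poisson's ratio
D = 0.01     # sphere diameter, m
P = 10.0     # applied force, N

e_star = reduced_modulus(E, NU)
alpha = approach_short(P, D, e_star)
a = contact_radius(P, D, e_star)

print("alpha (compact form):", alpha, "m")
print("alpha (expanded form):", approach_long(P, D, E, NU), "m")
print("contact radius a:", a, "m")
print("contact area pi a^2:", math.pi * a ** 2, "m^2")
```

The two forms of $\alpha$ agree, and the results satisfy $a^2 = \alpha D/2$, which is a handy consistency check. Doubling the load increases $\alpha$ by a factor of $2^{2/3}$ rather than 2, reflecting the load-dependent stiffness mentioned above.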
To add the second side, we apply symmetry to obtain Case 3 in the link.
Regarding yielding, Johnson gives the yield load $P_\mathrm{Y}$ when applying the von Mises criterion as $$P_\mathrm{Y}=\frac{\pi^3D^2}{24E^{\star 2}}(1.60Y)^3,$$ where $Y$ is the yield stress. In comparison, $P_\mathrm{Y}=\frac{\pi D^2}{4}Y$ when applying an axial load on a cylinder of diameter $D$. Johnson thus notes that when designing for strength in curved contact, a low Young's modulus is desirable (the Young's modulus isn't a factor for yielding in the case of flat contact at the end of a prismatic shape). This makes intuitive sense because the load is spread out over the larger contact area of a more compliant material.
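The contrast between the two yield loads is easy to see numerically. The sketch below simply evaluates the two expressions quoted above; the diameter, yield stress, and reduced modulus are hypothetical values, and the function names are illustrative.

```python
import math


def sphere_yield_load(diameter, e_star, yield_stress):
    """P_Y = pi^3 D^2 / (24 E*^2) * (1.60 Y)^3 for sphere-on-flat contact."""
    return (math.pi ** 3 * diameter ** 2 / (24.0 * e_star ** 2)
            * (1.60 * yield_stress) ** 3)


def cylinder_yield_load(diameter, yield_stress):
    """P_Y = (pi D^2 / 4) Y for an axially loaded cylinder."""
    return 0.25 * math.pi * diameter ** 2 * yield_stress


# Hypothetical inputs (assumptions for illustration only).
D = 0.01                       # diameter, m
Y = 250e6                      # yield stress, Pa
E_STAR = 200e9 / (1 - 0.3**2)  # reduced modulus, Pa

p_sphere = sphere_yield_load(D, E_STAR, Y)
p_cylinder = cylinder_yield_load(D, Y)
p_sphere_soft = sphere_yield_load(D, 0.5 * E_STAR, Y)  # same Y, half the stiffness

print("sphere yield load:", p_sphere, "N")
print("cylinder yield load:", p_cylinder, "N")
print("sphere, half the modulus:", p_sphere_soft, "N")
```

With these numbers the sphere yields at a load many orders of magnitude below the cylinder, and halving $E^\star$ at fixed $Y$ quadruples the sphere's yield load, while the cylinder result does not involve the modulus at all.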