Your book is giving you an oversimplified description (because it is written for beginners), and that is part of what is confusing you. Stress is a second-order tensor (called the stress tensor) and, in component form, requires 6 independent numbers to specify the state of stress at a given point in space. The stress tensor can be used to determine the traction acting on any surface of specified orientation. So, once you know these 6 components, you can determine the normal and tangential traction on any such surface. The components of the stress tensor can be arranged in a symmetric 3x3 matrix. Multiplying this matrix by a 3x1 column vector representing the unit normal to a specified surface gives a 3x1 column vector containing the components of the traction exerted by the material on one side of the surface on the material on the other side. This is called the Cauchy stress relationship. I hope that this makes some kind of sense to you.
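As a rough illustration of the Cauchy stress relationship, here is a minimal Python sketch. The stress values, the choice of MPa, and the function names `traction` and `decompose` are made up for this example; the only physics used is the matrix-vector product described above, followed by a split of the traction into its normal and tangential parts.

```python
import math

def traction(stress, normal):
    """Cauchy relation: multiply the 3x3 stress matrix by the unit normal."""
    return [sum(stress[i][j] * normal[j] for j in range(3)) for i in range(3)]

def decompose(stress, normal):
    """Return the traction vector, its normal component, and its tangential magnitude."""
    t = traction(stress, normal)
    t_normal = sum(t[i] * normal[i] for i in range(3))
    tangential = [t[i] - t_normal * normal[i] for i in range(3)]
    t_shear = math.sqrt(sum(c * c for c in tangential))
    return t, t_normal, t_shear

# Hypothetical symmetric stress state (MPa): 6 independent numbers.
sigma = [
    [50.0, 30.0, 0.0],
    [30.0, -20.0, 0.0],
    [0.0, 0.0, 10.0],
]

# Unit normal of a plane inclined at 45 degrees between the x and y axes.
n = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0), 0.0]

t, t_n, t_s = decompose(sigma, n)
print("traction vector:", t)
print("normal traction:", t_n)      # about 45 MPa for these numbers
print("tangential traction:", t_s)  # about 35 MPa for these numbers
```

Changing `n` while keeping `sigma` fixed gives a different traction, which is why a single number cannot describe the stress at a point.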
1. Yes, the relation $$\mathrm{stress}=d(\mathrm{strain\,energy\,density})/d(\mathrm{strain})$$ holds for all elastic bodies, not just linearly elastic bodies. This equation implies that all differential work goes into elastic strain energy, which holds even for nonlinearly elastic materials (e.g., hyperelastic materials). However, the equation wouldn't apply to plastic deformation, for example, in which substantial amounts of work are converted to heat and expended through the formation of crystal defects.
2. Regarding the intuition behind this equation, we can say that any way to add energy to a system involves two parameters (called thermodynamic conjugate variables): a generalized force and a generalized displacement. The first term is intensive; i.e., if you doubled the system size, then the generalized force would stay the same. The second term is extensive; if you doubled the system size, then this term would also double.
The simplest example of a generalized force and displacement is an actual force $F$ and displacement $x$ and the familiar equations $w=\boldsymbol{F\cdot x}$ and $dw=F\,dx$ for the work $w$. Another example is the pressure $P$ and volume $V$: $dw=-P\,dV$, with the minus sign appearing because pressure is compressive. Note how a gradient in pressure, the intensive variable, drives a shift in volume, the extensive variable. This effect is common for all of these pairs, whose units invariably multiply to give units of energy.
(This framework applies even to heating: the system energy $U$ increases with $T\,dS$, where gradients in temperature $T$ drive shifts in the entropy $S$. Here again, the units multiply to give units of energy.)
Yet another example of a conjugate pair is the stress and strain, with one caveat: if you look at the units, you'll see that the product of stress and strain has units of energy per unit volume. So we can either work with the elastic strain energy density (what you call above the strain energy function $W$), or we can work in terms of energy by multiplying by the volume, as in the fundamental relation for a first-order closed system under a general mechanical load: $dU=T\,dS+\boldsymbol{\bar{\sigma}} V\,d\boldsymbol{\bar{\epsilon}}$, where $\boldsymbol{\bar{\sigma}}$ and $\boldsymbol{\bar{\epsilon}}$ are the stress and strain tensors, respectively, and their product is understood as the full contraction $\sigma_{ij}\,d\epsilon_{ij}$. (If the load is pressure, or equitriaxial compressive stress, then $\sigma_{ij}=-P\,\delta_{ij}$, so for small strains $\sigma_{ij}V\,d\epsilon_{ij}=-PV\,d\epsilon_{kk}=-P\,dV$, and we recover the familiar $dU=T\,dS-P\,dV$.)
3. As for deriving your starred equation, I checked Nye's Physical Properties of Crystals and Ugural & Fenster's Advanced Strength and Applied Elasticity, and they proceed as you do: define the increase in strain energy from a uniaxial load applied to a differential element and then build up to the complete 3D case. For an isotropic material obeying the generalized Hooke's law, for example, Ugural & Fenster obtain a strain energy density of $$W=\frac{1}{2E}\left(\sigma_{x}^2+\sigma_{y}^2+\sigma_{z}^2\right)-\frac{\nu}{E}\left(\sigma_{x}\sigma_y+\sigma_{y}\sigma_z+\sigma_{x}\sigma_z\right)+\frac{1}{2G}\left(\tau_{xy}^2+\tau_{yz}^2+\tau_{xz}^2\right).$$ Note that the cross term carries the factor $\nu/E$, not $\nu/2E$: substituting Hooke's law into $\frac{1}{2}\left(\sigma_x\epsilon_x+\sigma_y\epsilon_y+\sigma_z\epsilon_z\right)$ counts each product $\sigma_i\sigma_j$ twice.
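The isotropic strain energy density can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the material constants and stress values are hypothetical, the function names are mine, and the cross term is taken with the coefficient $\nu/E$, which is what $\frac{1}{2}\sigma_{ij}\epsilon_{ij}$ gives once Hooke's law is substituted. It compares the closed-form $W$ with $\frac{1}{2}\sum\sigma\epsilon$ (using engineering shear strains $\gamma=\tau/G$) and then checks by finite differences that $\partial W/\partial\sigma_x=\epsilon_x$, the stress-based counterpart of the relation in point 1, which holds for a linear material.

```python
import math

def shear_modulus(E, nu):
    """Isotropic relation G = E / (2 (1 + nu))."""
    return E / (2.0 * (1.0 + nu))

def strain_energy_density(s, E, nu):
    """Closed-form W for s = (sx, sy, sz, txy, tyz, txz)."""
    sx, sy, sz, txy, tyz, txz = s
    G = shear_modulus(E, nu)
    return ((sx**2 + sy**2 + sz**2) / (2.0 * E)
            - nu / E * (sx * sy + sy * sz + sx * sz)
            + (txy**2 + tyz**2 + txz**2) / (2.0 * G))

def hooke_strains(s, E, nu):
    """Generalized Hooke's law; shear entries are engineering strains."""
    sx, sy, sz, txy, tyz, txz = s
    G = shear_modulus(E, nu)
    return ((sx - nu * (sy + sz)) / E,
            (sy - nu * (sx + sz)) / E,
            (sz - nu * (sx + sy)) / E,
            txy / G, tyz / G, txz / G)

# Hypothetical steel-like constants (MPa) and an arbitrary stress state (MPa).
E, nu = 200e3, 0.3
s = (100.0, -40.0, 25.0, 30.0, 0.0, -15.0)

eps = hooke_strains(s, E, nu)
W_formula = strain_energy_density(s, E, nu)
W_half_sigma_eps = 0.5 * sum(si * ei for si, ei in zip(s, eps))

# Central finite difference of W with respect to sx.
h = 1e-3
s_plus = (s[0] + h,) + s[1:]
s_minus = (s[0] - h,) + s[1:]
dW_dsx = (strain_energy_density(s_plus, E, nu)
          - strain_energy_density(s_minus, E, nu)) / (2.0 * h)

print("W (closed form):     ", W_formula)
print("W (1/2 sigma . eps): ", W_half_sigma_eps)
print("dW/dsx vs eps_x:     ", dW_dsx, eps[0])
```

The two values of $W$ should agree to rounding error, and the finite-difference derivative should match $\epsilon_x$ closely because $W$ is quadratic in the stresses.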
Best Answer
Before starting to outline my understanding, let me link two related questions on Physics SE here and here. Further, let me give my main sources for learning continuum mechanics which my answer will mainly be inspired by:
Let $\mathcal{P}$ be a part of the material body with surface $\partial\mathcal{P}$. We now assume that there are two types of forces that can act on this body part. On the one hand, there are forces that act on the bulk of the material ("on each of the uncountably many small particles the body is made up of"), and we can characterise them by a body force density. On the other hand, there are forces that are transmitted through the material body as contact forces, and thus for the body part $\mathcal{P}$ they act only on its surface $\partial\mathcal{P}$. A typical example is the pressure in a fluid. These force contributions on the surface are described by the surface traction $\vec{\mathbf{t}}$, a force per unit area. Cauchy's theorem states that there is a tensor field, the Cauchy stress tensor $\mathbf{T}$, which for a surface with unit normal $\vec{\mathbf{n}}$ gives the traction on that surface at that point as $\vec{\mathbf{t}}=\mathbf{T}\vec{\mathbf{n}}$. The important point here is that the traction vector depends on the chosen surface by definition, because it represents the force contribution onto the body part enclosed by this chosen surface. If we choose a different surface, we also get a physically different force, because it is the force on another body part.
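To make the dependence on the chosen surface concrete, here is a small Python sketch for the fluid-pressure case mentioned above. It assumes a fluid at rest with $\mathbf{T}=-p\,\mathbf{I}$; the pressure value and the two surface normals are hypothetical. Each surface gets a different traction vector $-p\,\vec{\mathbf{n}}$, even though the stress tensor at the point is the same.

```python
def traction(T, n):
    """Cauchy's theorem: traction t = T n for a 3x3 tensor T and unit normal n."""
    return [sum(T[i][j] * n[j] for j in range(3)) for i in range(3)]

p = 2.0  # hypothetical pressure, arbitrary units

# Stress tensor of a fluid at rest: T = -p I (assumption of this sketch).
T_fluid = [
    [-p, 0.0, 0.0],
    [0.0, -p, 0.0],
    [0.0, 0.0, -p],
]

# Two different surfaces through the same point, given by their unit normals.
normals = {
    "x-face": [1.0, 0.0, 0.0],
    "inclined": [0.6, 0.0, 0.8],
}

tractions = {name: traction(T_fluid, n) for name, n in normals.items()}
for name, t in tractions.items():
    print(name, "normal:", normals[name], "traction:", t)
```

For a general stress tensor the traction would also have a component tangential to the surface; the pure-pressure case is special in that the traction is always antiparallel to the normal.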