You should start with the strain energy density $\psi$, then define:
$$
C_{ijkl} = \frac{\partial^2 \psi}{\partial \epsilon_{ij}\partial \epsilon_{kl}}, $$
and then define $$
\sigma_{ij} = C_{ijkl} \epsilon_{kl}
$$
The remainder of my answer will be about explaining why you have to do it that way. Firstly it is physical, there really is energy associated with strain, and if there weren't there would not be any stress. Secondly, it is linear exactly because we are considering the changes in energy due to small strains.
But let's go back to the minor symmetries. We need $C_{ijkl}=C_{jikl}$ because otherwise $\sigma_{ij}\neq\sigma_{ji}$ (and then we get arbitrarily large, hence unphysical, angular velocities for smaller and smaller regions). But the other minor symmetry is not required. If someone handed you a random rank four tensor, let's call it $B_{ijkl}$, and called it an elastic tensor and it didn't have the second minor symmetry, you can define $C_{ijkl}=(B_{ijkl}+B_{ijlk})/2$ $D_{ijkl}=(B_{ijkl}-B_{ijlk})/2$ and then $B=C+D$ but when D is contracted with a symmetric rank two tensor (like the strain tensor) it gives zero. So the part of the rank four tensor without that second minor symmetry simply doesn't contribute, as an actor it does nothing (when you think all the elasticity does is give you stress from strain). So you might as well assume your tensor has both minor symmetries because it acts like it has the second ($B$ and $C$ act the same on symmetric tensors) and it has to have the first.
Did I bring that up to be pedantic? No, I brought it up because the same thing happens if you contract the elasticity tensor with a rank four symmetric combination of the strain tensor. The part of the elasticity tensor without the major symmetry doesn't contribute to the strain energy density. So a random tensor needs the first minor symmetry to be physical. But you might as well assume it has the second minor symmetry since it doesn't affect the stress-strain relationship. And you might as well assume it has the major symmetry because the part that doesn't will not contribute to the strain energy density.
But it is the strain energy density that is physical, and how it changes is what elasticity is. So you aren't really deriving these symmetries as much as saying that only the symmetric ones generate the physical things you want, energy when given strain. And a real derivation should start with strain energy density and strain, and then just define elasticity from that.
1. Yes, the relation $$\mathrm{stress}=d(\mathrm{strain\,energy\,density})/d(\mathrm{strain})$$ holds for all elastic bodies, not just linearly elastic bodies. This equation implies that all differential work goes into elastic strain energy, which holds even for nonlinearly elastic materials (e.g., hyperelastic materials). However, the equation wouldn't apply to plastic deformation, for example, in which substantial amounts of work are converted to heat and expended through the formation of crystal defects.
2. Regarding the intuition behind this equation, we can say that any way to add energy to a system involves two parameters (called thermodynamic conjugate variables): a generalized force and a generalized displacement. The first term is intensive; i.e., if you doubled the system size, then the generalized force would stay the same. The second term is extensive; if you doubled the system size, then this term would also double.
The simplest example of a generalized force and displacement is an actual force $F$ and displacement $x$ and the familiar equations $w=\boldsymbol{F\cdot x}$ and $dw=F\,dx$ for the work $w$. Another example is the pressure $P$ and volume $V$: $dw=-P\,dV$, with the minus sign appearing because pressure is compressive. Note how a gradient in pressure, the intensive variable, drives a shift in volume, the extensive variable. This effect is common for all of these pairs, whose units invariably multiply to give units of energy.
(This framework applies even to heating: the system energy $U$ increases with $T\,dS$, where gradients in temperature $T$ drive shifts in the entropy $S$. Here again, the units multiply to give units of energy.)
Yet another example of a conjugate pair is the stress and strain. Well actually, this isn't entirely true. If you look at the units, you'll see that the product of stress and strain has units of volumetric energy. So we can work with the elastic strain energy density or what you call above the strain energy function $W$, or we can work in terms of energy by multiplying by the volume, as in the fundamental relation for a first-order closed system under a general mechanical load: $dU=T\,dS+\boldsymbol{\bar{\sigma}} V\,d\boldsymbol{\bar{\epsilon}}$, where $\boldsymbol{\bar{\sigma}}$ and $\boldsymbol{\bar{\epsilon}}$ are the stress and strain tensors, respectively. (If the load is pressure, or equitriaxial compressive stress, then we recover the familiar $dU=T\,dS-P\,dV$.)
3. As for deriving your starred equation, I checked Nye's Physical Properties of Crystals and Ugural & Fenster's Advanced Strength and Applied Elasticity, and they proceed as you do: define the increase in strain energy from a uniaxial load applied to a differential element and then build up to the complete 3D case. For an isotropic material (which obeys generalized Hooke's Law), for example, Ugural & Fenster obtain a strain energy density of $$W=\frac{1}{2E}\left(\sigma_{x}^2+\sigma_{y}^2+\sigma_{z}^2\right)-\frac{\nu}{2E}\left(\sigma_{x}\sigma_y+\sigma_{y}\sigma_z+\sigma_{x}\sigma_z\right)+\frac{1}{2G}\left(\tau_{xy}^2+\tau_{yz}^2+\tau_{xz}^2\right).$$
Best Answer
You're essentially asking to compute $\nabla\cdot\boldsymbol{\sigma}$ in coordinate-free terms. Here's a possible way, although it requires some tricks, in particular the identities $$\text{tr}(\nabla \mathbf{a})=\text{tr}(\nabla \mathbf{a}^\mathsf{T})=\nabla\cdot \mathbf{a}$$ $$\nabla\cdot\nabla \mathbf{a}=\Delta \mathbf{a}$$ $$\nabla\cdot\nabla\mathbf{a}^\mathsf{T}=\nabla(\nabla\cdot\mathbf{a})$$ and $$\nabla\cdot(b\mathbf{I})=\nabla b$$ for vector $\mathbf{a}$, scalar $b$, and identity $\mathbf{I}$. It's then straightforward to get $$\nabla\cdot\boldsymbol{\sigma}=\nabla\cdot\left(2\mu\left(\frac{\nabla\mathbf{u}+\nabla \mathbf{u}^\mathsf{T}}{2}\right)+\lambda\text{tr}\left(\frac{\nabla\mathbf{u}+\nabla \mathbf{u}^\mathsf{T}}{2}\right)\mathbf{I}\right)$$ $$=\mu\nabla\cdot\nabla\mathbf{u}+\mu\nabla(\nabla\cdot\mathbf{u})+\lambda\nabla\cdot((\nabla\cdot\mathbf{u})\mathbf{I})$$ $$=\mu\Delta\mathbf{u}+\mu\nabla(\nabla\cdot\mathbf{u})+\lambda\nabla(\nabla\cdot\mathbf{u})$$ from which $$\mathbf{f}+\mu\Delta\mathbf{u}+(\lambda+\mu)\nabla(\nabla\cdot\mathbf{u})=\mathbf{0}$$ follows.
If anyone has a more straightforward way to demonstrate this in index-free notation, please feel free to post it as another answer! Index-free notation gets difficult once you deal with nontrivial objects. For example, with two matrix functions on space $\mathbf{A}(\mathbf{r})$, $\mathbf{B}(\mathbf{r})$, you have $$\nabla(\mathbf{A}\cdot\mathbf{B})=\sigma_{132}((\sigma_{132}\nabla\mathbf{A})\cdot\mathbf{B})+\mathbf{A}\cdot\nabla\mathbf{B}$$ where $\sigma_{132}$ is the index transposing operator
Transpose[#,{1,3,2}]&
from Mathematica, whereas with comma derivatives you just use product rule to get $$(A_{ij}B_{jk})_{,l}=A_{ij,l}B_{jk}+A_{ij}B_{jk,l}$$.EDIT: In index notation, the proof reads $$\sigma_{ij,j}=2\mu\frac{u_{i,jj}+u_{j,ij}}{2}+\lambda u_{k,kj}\delta_{ij}$$ $$=\mu u_{i,jj}+(\lambda+\mu)u_{j,ij}.$$
This is an example of why index notation is better. You don't need to remember a bunch of vector and matrix calculus identites, you just take derivatives.