Whether sparse PCA is easier to interpret than standard PCA depends on the dataset you are investigating. Here is how I think about it: sometimes one is more interested in the PCA projections (the low-dimensional representation of the data), and sometimes in the principal axes themselves; it is only in the latter case that sparse PCA can benefit interpretation. Let me give a couple of examples.
For example, I work with neural data (simultaneous recordings of many neurons) and apply PCA and related dimensionality reduction techniques to obtain a low-dimensional representation of neural population activity. I might have 1000 neurons (i.e. my data live in a 1000-dimensional space) and want to project them onto the three leading principal axes. What these axes are is totally irrelevant to me, and I have no intention of "interpreting" them in any way. What I am interested in is the 3D projection (since the activity depends on time, I get a trajectory in this 3D space). So I am fine if each axis has all 1000 coefficients non-zero.
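A minimal sketch of this use case, with synthetic data standing in for a real recording (the matrix shapes, not the values, are the point):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for a recording: 200 time points x 1000 neurons.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))

# Project the activity onto the three leading principal axes.
pca = PCA(n_components=3)
trajectory = pca.fit_transform(X)  # shape (200, 3): a trajectory in 3D

# Each axis is a dense combination of all 1000 neurons, which is fine
# here, since only the 3D projection itself is of interest.
print(trajectory.shape)        # (200, 3)
print(pca.components_.shape)   # (3, 1000)
```

The loadings in `pca.components_` are never inspected in this workflow; only `trajectory` is analyzed or plotted.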
On the other hand, somebody might be working with more "tangible" data, where individual dimensions have obvious meaning (unlike the individual neurons above): e.g. a dataset of various cars, where the dimensions range from weight to price. In this case one might actually be interested in the leading principal axes themselves, because one might want to say something like: look, the first principal axis corresponds to the "fanciness" of the car (I am totally making this up). If the axes are sparse, such interpretations are generally easier to give, because many variables have $0$ coefficients and so are obviously irrelevant for that particular axis. With standard PCA, one usually gets non-zero coefficients for all variables.
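A toy illustration of the contrast, assuming scikit-learn's `SparsePCA`; the "car" feature names and the latent "fanciness" factor are entirely made up for the example:

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

# Toy "cars" data: 100 cars, 4 hypothetical features. The first two
# (say, price and engine size) are driven by a latent "fanciness"
# factor; the last two (say, weight and length) are unrelated noise.
rng = np.random.default_rng(1)
fancy = rng.standard_normal(100)
X = np.column_stack([
    fancy + 0.1 * rng.standard_normal(100),  # "price"
    fancy + 0.1 * rng.standard_normal(100),  # "engine size"
    rng.standard_normal(100),                # "weight" (unrelated)
    rng.standard_normal(100),                # "length" (unrelated)
])

dense_axis = PCA(n_components=1).fit(X).components_[0]
sparse_axis = SparsePCA(n_components=1, random_state=0).fit(X).components_[0]

# Standard PCA loads (a little) on every variable; sparse PCA zeroes
# out the irrelevant ones, making the axis easier to read off.
print(np.count_nonzero(dense_axis))   # all 4 variables
print(np.count_nonzero(sparse_axis))  # fewer variables
```

Reading off "this axis is price plus engine size" is immediate from the sparse loadings, whereas the dense loadings require judging which small coefficients to ignore.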
You can find more examples and some discussion of the latter case in the 2006 sparse PCA paper by Zou et al. I have not, however, seen the difference between the former and the latter case explicitly discussed anywhere (even though it probably has been).
I think you need to define more precisely what you are looking for. You could have 10 variables that each individually account for 90% of the variance, but if it is the same 90% of the variance, that may not be interesting to you. Performing regression with L1 and/or L2 penalties can help you identify variables, or groups of variables, that correlate well with your outcome. There are also other techniques, such as Minimum Redundancy Maximum Relevance, that help select features that are strong predictors.
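The "same 90% of the variance" point can be demonstrated with an L1 (lasso) fit; the data-generating setup below is hypothetical, with several predictors that are near-copies of one another:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical setup: 10 predictors, but only x1 and x2 drive y.
# Four further columns are near-copies of x1 (they "explain" the same
# variance), and four are pure noise.
rng = np.random.default_rng(2)
n = 200
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
copies = np.column_stack([x1 + 0.01 * rng.standard_normal(n) for _ in range(4)])
noise = rng.standard_normal((n, 4))
X = np.column_stack([x1, x2, copies, noise])
y = 3 * x1 + 2 * x2 + 0.1 * rng.standard_normal(n)

# The L1 penalty sets redundant and irrelevant coefficients to
# exactly zero, leaving a small set of selected variables.
coef = Lasso(alpha=0.1).fit(X, y).coef_
print(np.count_nonzero(coef))  # far fewer than 10
```

Ordinary least squares would spread weight arbitrarily across the five interchangeable copies; the lasso instead commits to a small subset, which is what makes it useful for variable selection here.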
Per @amoeba's comments: