A possible solution is to use the standard sparse PCA algorithm and increase the ridge penalty coefficient.
There are probably better solutions but this is what I did.
Whether sparse PCA is easier to interpret than standard PCA depends on the dataset you are investigating. Here is how I think about it: sometimes one is more interested in the PCA projections (the low-dimensional representation of the data), and sometimes in the principal axes themselves. Only in the latter case can sparse PCA offer any benefit for interpretation. Let me give a couple of examples.
For example, I work with neural data (simultaneous recordings of many neurons) and apply PCA and/or related dimensionality reduction techniques to get a low-dimensional representation of neural population activity. I might have 1000 neurons (i.e. my data live in a 1000-dimensional space) and want to project them onto the three leading principal axes. What these axes are is totally irrelevant to me, and I have no intention of "interpreting" them in any way. What I am interested in is the 3D projection (as the activity depends on time, I get a trajectory in this 3D space). So I am fine if each axis has all 1000 coefficients non-zero.
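To make the first case concrete, here is a minimal sketch of such a projection using plain NumPy. The data are synthetic (200 time points, 1000 made-up "neurons" driven by three latent signals), and every name and number in it is an assumption for illustration, not real recordings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the recordings (all values are made up):
# 200 time points x 1000 "neurons", driven by 3 latent signals plus noise.
n_time, n_neurons, n_latent = 200, 1000, 3
latent = rng.standard_normal((n_time, n_latent))
mixing = rng.standard_normal((n_latent, n_neurons))
X = latent @ mixing + 0.1 * rng.standard_normal((n_time, n_neurons))

# Centre each neuron, then take the SVD; the rows of Vt are the principal axes.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

axes = Vt[:3]               # three leading principal axes, shape (3, 1000)
trajectory = Xc @ axes.T    # low-dimensional trajectory, shape (200, 3)

# Count the non-zero coefficients of each axis and the variance captured.
n_nonzero = np.count_nonzero(axes, axis=1)
explained = (s[:3] ** 2).sum() / (s ** 2).sum()
print(n_nonzero, round(float(explained), 3))
```

Each axis is dense, with essentially every neuron contributing, which does not matter here because only `trajectory` is ever looked at.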
On the other hand, somebody might be working with more "tangible" data, where individual dimensions have an obvious meaning (unlike the individual neurons above), e.g. a dataset of various cars, where dimensions are anything from weight to price. In this case one might actually be interested in the leading principal axes themselves, because one might want to say something like: look, the 1st principal axis corresponds to the "fanciness" of the car (I am totally making this up now). If the axis is sparse, such interpretations are generally easier to give, because many variables have $0$ coefficients and so are obviously irrelevant for this particular axis. In the case of standard PCA, one usually gets non-zero coefficients for all variables.
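The contrast between a dense and a sparse axis can be sketched on a toy version of the car example. Note the assumptions: the data and variable names are invented, and the sparse axis is obtained with a simple soft-thresholded power iteration, which is only a stand-in for a proper sparse PCA algorithm and is not the method of Zou et al.

```python
import numpy as np

def soft_threshold(u, lam):
    """Shrink every entry of u towards zero by lam; small entries become exactly 0."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def sparse_leading_axis(C, lam, n_iter=200):
    """Leading axis of C via power iteration with a shrinkage step (illustrative only)."""
    # Start from the ordinary leading eigenvector (eigh sorts eigenvalues ascending).
    v = np.linalg.eigh(C)[1][:, -1]
    for _ in range(n_iter):
        u = soft_threshold(C @ v, lam)
        norm = np.linalg.norm(u)
        if norm == 0.0:      # lam too large: every loading was shrunk away
            return u
        v = u / norm
    return v

rng = np.random.default_rng(0)

# Hypothetical car variables: the first three share a latent "fanciness",
# the last three are unrelated noise.
names = ["price", "horsepower", "trim_level", "n_doors", "tyre_width", "colour_code"]
n = 2000
fanciness = rng.standard_normal(n)
related = fanciness[:, None] + 0.3 * rng.standard_normal((n, 3))
unrelated = rng.standard_normal((n, 3))
X = np.hstack([related, unrelated])
C = np.corrcoef(X, rowvar=False)   # PCA on standardized variables

dense_axis = np.linalg.eigh(C)[1][:, -1]
sparse_axis = sparse_leading_axis(C, lam=0.5)   # lam chosen by hand for this toy data

for name, d, s in zip(names, dense_axis, sparse_axis):
    print(f"{name:12s} dense={d:+.3f}  sparse={s:+.3f}")
```

The dense axis assigns a small but non-zero loading to every variable, whereas the sparse one sets the unrelated variables to exactly zero, which is what makes a label like "fanciness" easier to defend.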
You can find more examples and some discussion of the latter case in the 2006 Sparse PCA paper by Zou et al. I have not, however, seen the difference between the former and the latter case explicitly discussed anywhere (even though it probably has been).
Best Answer
Another good package is the elasticnet package that Zou and Hastie put out. It has the function `spca`. Be careful to select a good value of $\lambda$, the sparsity parameter (or vector of them).
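To see why the choice matters, here is a rough Python sketch that sweeps a sparsity penalty over a made-up correlation matrix. It does not call elasticnet: `sparse_axis` is a hypothetical helper using a simple soft-thresholded power iteration, and its `lam` is only analogous to the package's penalty, so the numerical values do not transfer.

```python
import numpy as np

def soft_threshold(u, lam):
    """Shrink entries of u towards zero by lam, setting small ones to exactly 0."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def sparse_axis(C, lam, n_iter=200):
    """Hypothetical helper: leading axis of C by soft-thresholded power iteration."""
    v = np.linalg.eigh(C)[1][:, -1]   # ordinary leading eigenvector as a start
    for _ in range(n_iter):
        u = soft_threshold(C @ v, lam)
        norm = np.linalg.norm(u)
        if norm == 0.0:               # penalty so large that nothing survives
            return u
        v = u / norm
    return v

# Made-up 6x6 correlation matrix: three strongly related variables (0.9),
# each weakly related (0.05) to three otherwise unrelated variables.
C = np.eye(6)
C[:3, :3] = 0.9
C[:3, 3:] = 0.05
C[3:, :3] = 0.05
np.fill_diagonal(C, 1.0)

results = {}
for lam in [0.0, 0.05, 0.5, 5.0]:
    v = sparse_axis(C, lam)
    # Store the number of non-zero loadings and the variance along the axis.
    results[lam] = (int(np.count_nonzero(v)), float(v @ C @ v))
    print(f"lam={lam:<4}  non-zero loadings={results[lam][0]}  "
          f"variance along axis={results[lam][1]:.3f}")
```

With this matrix the two smallest penalties leave all six loadings non-zero, 0.5 keeps only the three strongly related variables, and 5.0 shrinks every loading to zero. Too small a penalty gives no sparsity at all, and too large a one discards the axis entirely.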
I would be curious to know which package ends up working better for you since the other package mentioned by @Stephan Kolassa is a year newer and is by Hastie's coauthor, Tibshirani.