Whether sparse PCA is easier to interpret than standard PCA depends on the dataset you are investigating. Here is how I think about it: sometimes one is more interested in the PCA projections (the low-dimensional representation of the data), and sometimes in the principal axes themselves; it is only in the latter case that sparse PCA can offer any benefit for interpretation. Let me give a couple of examples.
For example, I work with neural data (simultaneous recordings of many neurons) and apply PCA and/or related dimensionality reduction techniques to obtain a low-dimensional representation of neural population activity. I might have 1000 neurons (i.e. my data live in 1000-dimensional space) and want to project them onto the three leading principal axes. What these axes are is totally irrelevant to me, and I have no intention of "interpreting" them in any way. What I am interested in is the 3D projection (as the activity depends on time, I get a trajectory in this 3D space). So I am fine with each axis having all 1000 coefficients non-zero.
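For concreteness, here is a minimal sketch of that first use case with scikit-learn; the data are simulated and the shapes and names are my own illustration, not anything from a real recording:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 2000 time points recorded from 1000 neurons.
rng = np.random.default_rng(0)
activity = rng.poisson(lam=2.0, size=(2000, 1000)).astype(float)

# Project onto the three leading principal axes; only this
# low-dimensional trajectory is of interest.
pca = PCA(n_components=3)
trajectory = pca.fit_transform(activity)          # shape (2000, 3)

# Every axis has all 1000 coefficients non-zero, and that is fine,
# because we never look at the axes themselves.
print(trajectory.shape)
print(np.count_nonzero(pca.components_[0]))       # typically 1000
```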
On the other hand, somebody might be working with more "tangible" data, where individual dimensions have an obvious meaning (unlike the individual neurons above): e.g. a dataset of various cars, with dimensions ranging from weight to price. In this case one might actually be interested in the leading principal axes themselves, because one might want to say something like: look, the first principal axis corresponds to the "fanciness" of the car (I am totally making this up). If the loadings are sparse, such interpretations are generally easier to give, because many variables have $0$ coefficients and so are obviously irrelevant for that particular axis. With standard PCA, one usually gets non-zero coefficients for all variables.
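And a sketch of the second use case, contrasting scikit-learn's SparsePCA with standard PCA; the "cars" data here are simulated, so this only shows the mechanics:

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

# Hypothetical car-style data: 200 cars, 10 numeric attributes
# (weight, price, horsepower, ...), already standardized.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

dense = PCA(n_components=2).fit(X)
sparse = SparsePCA(n_components=2, alpha=1.0, random_state=0).fit(X)

# Standard PCA: typically all 10 loadings per axis are non-zero.
# Sparse PCA: many loadings are exactly zero, so each axis involves
# only a few variables and is easier to label.
print(np.count_nonzero(dense.components_, axis=1))
print(np.count_nonzero(sparse.components_, axis=1))
```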
You can find more examples and some discussion of the latter case in the 2006 Sparse PCA paper by Zou et al. I have not seen the difference between the former and the latter case explicitly discussed anywhere, however (even though it probably has been).
There is a paper, PCA on a DataFrame, that seems to try to solve this problem. The technique it uses belongs to a family collectively called Generalized Low Rank Models (PCA and sparse PCA are examples of this family of methods).
If you are familiar with Python or R, you can try the GLRM implementation from the H2O library. It can handle categorical and continuous data together in a single frame.
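If it helps, here is a rough sketch of what that looks like in Python via H2O's H2OGeneralizedLowRankEstimator; the file name and parameter values below are placeholders of mine, so check the H2O documentation for the exact options:

```python
import h2o
from h2o.estimators import H2OGeneralizedLowRankEstimator

h2o.init()

# "cars.csv" is a hypothetical mixed-type dataset (categorical and
# continuous columns together).
frame = h2o.import_file("cars.csv")

# Rank-3 GLRM. With quadratic loss and no regularization this reduces
# to ordinary PCA; L1 regularization on Y pushes loadings to zero,
# giving a sparse-PCA-like decomposition.
glrm = H2OGeneralizedLowRankEstimator(k=3,
                                      loss="Quadratic",
                                      regularization_y="L1",
                                      gamma_y=0.1)
glrm.train(training_frame=frame)
```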
Best Answer
First off, be aware that the term "normalize" is ambiguous within statistical science. You apply it to scaling by $(\text{value} - \text{mean}) / \text{standard deviation}$, which is commonly also described as standardization. But it is also often applied to transformations that produce versions of a variable that are more nearly normal (Gaussian) in distribution. A further use still is scaling to fit within a prescribed range, say $[0, 1]$.
Standardization itself does not affect how close a distribution is to normal: it is merely a linear transformation, so skewness and kurtosis (for example), and more generally all measures of distribution shape, remain as they were.
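Both points are easy to check numerically; a small sketch with simulated data (the lognormal choice is just a convenient skewed example):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
x = rng.lognormal(size=10_000)           # a skewed variable

z = (x - x.mean()) / x.std()             # standardization (sense 1)
u = (x - x.min()) / (x.max() - x.min())  # scaling to [0, 1] (sense 3)
y = np.log(x)                            # a normalizing transform (sense 2)

# Linear rescalings leave the shape untouched; the log does not.
print(skew(x), skew(z), skew(u))         # all essentially equal
print(skew(y))                           # close to 0 for lognormal data
```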
As for principal component analysis (PCA), prior standardization is common, indeed arguably essential, whenever the individual variables are measured using different units of measurement. Conversely, PCA without standardization can make sense so long as all variables are measured in the same units. The difference corresponds to basing PCA on the correlation matrix (prior standardization) and on the covariance matrix (no prior standardization). Without standardization, PCA results are inevitably dominated by the variables with highest variance; if that is desired (or at worst unproblematic), then you will not be troubled.
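That correspondence can be verified directly: standardizing the data and taking covariances gives exactly the correlation matrix. A sketch with simulated data on very different scales:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two variables on wildly different scales.
X = rng.normal(size=(500, 2)) * np.array([1.0, 100.0])

# Covariance of standardized data equals the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
np.testing.assert_allclose(np.cov(Z, rowvar=False, ddof=0),
                           np.corrcoef(X, rowvar=False), atol=1e-10)

# Without standardization, the leading eigenvector of the covariance
# matrix points almost entirely along the high-variance variable.
vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
print(vecs[:, -1])    # roughly (0, 1), up to sign
```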
The other way round: standardizing all variables gives them all, broadly speaking, the same importance; and even that could be wrong, or not what you most want. For example, the variable with the least variance and the variable with the most will end up on the same scale and with equal weight. Only rarely does that match what a researcher most needs, although it can be hard to build in what is needed without subjectivity or circularity. In practice, PCA seems most successful when the input variables have a strong family resemblance, and least successful when the researcher inputs a mishmash of quite different variables, say various social, economic or demographic characteristics of countries or other political units. PCA is not a washing machine; the dirt is not removed, but just redistributed.
If skewness is very high, you have a choice. Often results will be clearer if PCA is applied to transformed variables; for example, the effects of outliers or extreme data points will often be muted when variables are transformed. Conversely, PCA as a transformation technique does not depend on, or assume, that any (let alone all) of the variables fed to it are normally distributed.
In the abstract, it is difficult to advise in detail, but it will often be sensible to apply PCA both to the original data when highly skewed and to transformed data, and then to report either or both sets of results, depending on what is helpful scientifically or substantively.
PCA itself is indifferent to whether variables are transformed in the same way, or indeed to whether some variables are transformed and others are not. Whenever it makes sense, there is some appeal in transforming variables in the same way, but this is perhaps more a question of taste than of technique.
As a simple example, if several variables are all measures of size in some sense, then skewness is very likely. Transforming all variables by taking logarithms (so long as all values are positive) will then often be valuable as a precursor to PCA, but neither analysis should be thought of as "correct"; rather, they give complementary views of the data.
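As an illustration of that complementarity, here is a simulated version of the size example with one extreme case (the numbers are arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Three positive, right-skewed "size" measures; one extreme case.
X = rng.lognormal(mean=0.0, sigma=1.0, size=(300, 3))
X[0] *= 50.0                                   # an extreme data point

raw_scores = PCA(n_components=2).fit_transform(X)
log_scores = PCA(n_components=2).fit_transform(np.log(X))

# How far the extreme case sits from everyone else on the first PC:
print(abs(raw_scores[0, 0]) / abs(raw_scores[1:, 0]).max())  # huge
print(abs(log_scores[0, 0]) / abs(log_scores[1:, 0]).max())  # modest
```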
Note 1: I rather doubt that you "have to" do PCA unless you are committed to some exercise as part of a course of study. It seems very likely that some kind of Poisson modelling would be closer to scientific goals and just as fruitful as PCA, but without detail on those goals that is a matter of speculation.
Note 2: In the case of positive integers, roots and logarithms both have merit as transformations. I note that you state that your data are Poisson distributed without showing any evidence.
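For genuinely Poisson counts, the square root is the classical variance-stabilizing transformation: for large means, the variance on the root scale settles near $1/4$, whatever the mean. A quick simulated check (the means and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
for lam in (1, 5, 25, 100):
    x = rng.poisson(lam, size=100_000)
    # Raw variance grows with the mean; the sqrt-scale variance
    # approaches 1/4 as the mean grows.
    print(lam, x.var(), np.sqrt(x).var())
```

For small counts, $\log(x + 1)$ or the Anscombe transform $2\sqrt{x + 3/8}$ are common refinements.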