First of all, you should do a scatter plot of the projection of your individuals on the first two PCs. If, instead of seeing a single ellipse, they cluster in different groups, you'll find an easy interpretation of your data.
If they fall in a single ellipse, you can interpret the PCA as giving you low-dimensional (approximate) models of your data.
If you decide to keep only the first PC, you consider that the individuals are roughly distributed along one axis (the long axis of the mentioned ellipse), given by this PC. In your case, you can interpret this axis as a "good performer/bad performer" axis. Since all your loadings have similar values, this means you consider that a typical individual has similar scores on all five tests, and that the coordinate of an individual on this axis is approximately proportional to their mean score.
If you decide to keep the first two PCs, you consider that the individuals are distributed in a plane. The first axis is as before; the second axis, orthogonal to the first, captures the differences between people with the same coordinate on the first axis. In your case, this means that among people with similar scores, some are more the muscular type and others more the intellectual type.
The decision to keep one, two, or more PCs in order to give a good description of your data should rely in particular on the eigenvalues associated with the PCs (or on the proportion of explained variance).
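As a quick illustration, here is a minimal sketch with scikit-learn and matplotlib; the random matrix `X` is only a placeholder for your individuals-by-five-tests score matrix:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # placeholder for your individuals x 5 test scores

pca = PCA()
scores = pca.fit_transform(X)      # projections of the individuals on the PCs

# Scatter plot on the first two PCs: look for clusters vs. a single ellipse.
plt.scatter(scores[:, 0], scores[:, 1])
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()

# Proportion of explained variance, to decide how many PCs to keep.
print(pca.explained_variance_ratio_)
```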
If there is no specific reason to inspect how the variables are projected onto each latent variable, why not go with the regression coefficients?
I find biplots useless when there are many latent variables (LVs) in a PLS model and their contributions to the model are somewhat close. A biplot simply ignores all LVs after the second one, which is undesirable in most cases. In addition, drawing, for instance, a bar plot for 10 different loading vectors doesn't sound good either. Since the regression coefficients combine all LVs, inspecting them makes more sense to me.
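For instance, a minimal sketch with scikit-learn (its `PLSRegression` uses NIPALS internally, but it still exposes the combined coefficients; the data here are synthetic placeholders):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))                          # placeholder predictors
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=60)   # placeholder response

pls = PLSRegression(n_components=4).fit(X, y)

# coef_ combines the contributions of all 4 LVs into one value per variable.
for i, c in enumerate(np.ravel(pls.coef_)):
    print(f"x{i}: {c:+.3f}")
```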
If I understand you correctly, the loading correlation can be problematic too. Assume a case where there are 3 variables which are highly correlated but also important for the regression. Then you may get small loading values (regardless of their sign) for each of them, and this may mislead you into thinking they are negligible.
The question about signs is a good one. Generally it is not the sign but the magnitude of a variable's loading that matters most. In one LV, a variable can contribute negatively to the dependent variable, and in another LV positively. Inspecting the total effect is, as I mentioned, possible by looking at the regression coefficients.
BTW, these are all my personal opinions; you may want something different from your data, and you might want to inspect the loadings of each LV. For example, LVs obtained from a NIR spectrum may correspond to specific compounds in a solution, etc.
Edit: According to the comments, the OP needs a clearer answer:
There are two main algorithms for PLS regression: the older, original one is NIPALS, and there is a newer one called SIMPLS, which is faster and provides more interpretable results. I believe your confusion lies in the differences between them; I was once quite confused myself until I went through the original papers.
In NIPALS, a vector $b$ is obtained for each component (latent variable); it holds the regression coefficients between the X scores and the Y scores. Since building a model with NIPALS involves deflation of the X and Y matrices, prediction goes the reverse way, rebuilding the Y block component by component.
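A minimal NumPy sketch of this deflation idea for PLS1 (the function and variable names are mine, not from the paper; in-sample fitted values only, for brevity):

```python
import numpy as np

def nipals_pls1_sketch(X, y, a):
    """Illustrative NIPALS for PLS1: deflate X and y each round and
    rebuild the fitted Y component by component."""
    Xk = np.array(X, dtype=float)
    yk = np.array(y, dtype=float).reshape(-1, 1)
    y_mean = yk.mean()
    Xk -= Xk.mean(axis=0)
    yk -= y_mean

    yhat = np.full_like(yk, y_mean)           # start from the mean of y ...
    for _ in range(a):
        w = Xk.T @ yk                         # weight from the current residuals
        w /= np.linalg.norm(w)
        t = Xk @ w                            # X score of this component
        p = Xk.T @ t / float(t.T @ t)         # X loading
        b = float(yk.T @ t) / float(t.T @ t)  # per-component coefficient b
        yhat += b * t                         # ... and add one component at a time
        Xk -= t @ p.T                         # deflate X
        yk -= b * t                           # deflate y
    return yhat

# (For predictions on new data you would store w, p, and b per component.)
```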
In SIMPLS, the authors figured out that it is possible to come up with a model that is equivalent to NIPALS (at least in the PLS1 case) but without deflation, so the obtained matrices can be applied to X directly. In particular, Y can be predicted with a single vector $b$ of regression coefficients applied directly to X.
PLS1 (the 1 refers to Y having a single column) with SIMPLS:
- m: number of observations
- n: number of variables
- a: number of components (LVs)
$\mathbf T_{(m \times a)} = \mathbf X_{(m \times n)} \mathbf R_{(n \times a)}$, where $\mathbf T$ holds your X scores for $a$ components and $\mathbf R$ the weights for $a$ components.
$\mathbf {\hat Y}_{(m \times 1)} = \mathbf T_{(m \times a)} \mathbf Q'$, where $\mathbf Q_{(1 \times a)}$ holds the Y loadings and $\mathbf Q'$ is its transpose. This is the prediction step.
Therefore you can define the regression coefficients:
$\mathbf B_{(n \times 1)} = \mathbf R_{(n \times a)} \mathbf Q'_{(a \times 1)}$
So the prediction can be done with a single regression vector $\mathbf B$:
$\mathbf {\hat Y_{(m \times 1)}} = \mathbf X_{(m \times n)} \mathbf B_{(n \times 1)} $
As you can see, the size of $\mathbf B$ is independent of how many components you choose. So, to summarize: if you go with SIMPLS, which I highly recommend, the regression coefficients I was talking about are the $\mathbf B$ in the above equation; they can be applied directly to the X matrix, so you can use their values for interpretation without dealing with loadings.
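For concreteness, here is a minimal NumPy sketch of SIMPLS for PLS1 along the lines of de Jong (1993); the function name `simpls_pls1` and the variable names are my own:

```python
import numpy as np

def simpls_pls1(X, y, a):
    """Sketch of SIMPLS for PLS1: no deflation of X, a single B at the end."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n_vars = X.shape[1]

    # Mean-center both blocks; predictions must add the means back.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean

    R = np.zeros((n_vars, a))        # weights, applied to X directly
    q = np.zeros(a)                  # Y loadings (scalars for PLS1)
    V = np.zeros((n_vars, a))        # orthonormal basis of the X loadings

    s = Xc.T @ yc                    # X-y covariance; deflated instead of X
    for i in range(a):
        r = s.copy()
        t = Xc @ r                   # X score
        norm_t = np.linalg.norm(t)
        t, r = t / norm_t, r / norm_t
        p = Xc.T @ t                 # X loading
        q[i] = float(yc.T @ t)       # Y loading
        v = p - V[:, :i] @ (V[:, :i].T @ p)   # orthogonalize p ...
        v /= np.linalg.norm(v)
        s -= v * (v.T @ s)           # ... and deflate the covariance only
        R[:, i], V[:, i] = r.ravel(), v.ravel()

    B = R @ q.reshape(-1, 1)         # B = R Q', one column regardless of a
    return R, q, B, x_mean, y_mean
```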
EDIT 2: I have encountered some R libraries which report regression coefficients for multiple numbers of components. In that case, if the SIMPLS algorithm is used, the regression coefficients for $n$ components are the $n$th column of that matrix. You can test them by multiplying X with that column to see whether you obtain the same prediction results (do not forget to account for mean centering).
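Reusing the `simpls_pls1` sketch above, such a check could look like this (synthetic data, for illustration only):

```python
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X @ rng.normal(size=(10, 1)) + 0.1 * rng.normal(size=(50, 1))

R, q, B, x_mean, y_mean = simpls_pls1(X, y, a=3)
Xc = X - x_mean                                       # account for mean centering!

yhat_scores = (Xc @ R) @ q.reshape(-1, 1) + y_mean    # via scores and Y loadings
yhat_coef = Xc @ B + y_mean                           # via the single vector B
assert np.allclose(yhat_scores, yhat_coef)
```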
Here are the papers for NIPALS and SIMPLS, respectively:
Geladi, Paul, and Bruce R. Kowalski. "Partial least-squares regression: a tutorial." Analytica Chimica Acta 185 (1986): 1-17.
De Jong, Sijmen. "SIMPLS: an alternative approach to partial least squares regression." Chemometrics and Intelligent Laboratory Systems 18, no. 3 (1993): 251-263.
Sorry for the long answer.
Although you can store all the scores in variables, you cannot display the weights for all of them. But as the weights are important for a meaningful interpretation of the components, you could use the generated variables containing the scores to recover the weights.