In order to calculate the Spearman correlation coefficient, the data must first be ranked. However, people do this in different ways. Some rank in increasing order (i.e., the smallest number gets rank 1 and the greatest gets rank $n$), while others do the opposite, giving the highest rank to the smallest number and rank 1 to the greatest. Can you suggest which is the most appropriate way to do this?
Correct ranking for Spearman Correlation
correlation, statistics
Related Solutions
If I recall correctly, the function $\small \tanh^{-1}(r)$, where $r$ is the Pearson correlation coefficient, is approximately normally distributed if the correlated data are themselves normally distributed, so you can compute confidence intervals based on it (the coefficient $r$ has range $\small -1 \ldots 1$, and this range is stretched to $\small -\infty \ldots \infty$ by the $\small \tanh^{-1}$ transformation, also known as the Fisher z-transformation). I've seen this discussed intensively by James Steiger, but that was long ago and I cannot give a reference at the moment (it should also be covered in Wikipedia). HTH anyway.
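As a concrete illustration, here is a minimal R sketch of that construction; the data are invented for illustration, and the standard error $1/\sqrt{n-3}$ is the usual large-sample approximation for the Fisher z-transformation:

```r
# Approximate 95% CI for a Pearson correlation via the Fisher
# z-transformation: z = atanh(r) is approximately normal with
# standard error 1/sqrt(n - 3) for bivariate normal data.
set.seed(1)                          # illustrative data, not from the question
x <- rnorm(50); y <- 0.6 * x + rnorm(50)
r <- cor(x, y); n <- length(x)
z  <- atanh(r)                       # tanh^{-1}(r)
se <- 1 / sqrt(n - 3)
ci_z <- z + c(-1, 1) * qnorm(0.975) * se
tanh(ci_z)                           # back-transform to the r scale
```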
To answer this interesting question correctly, three issues need to be considered. The first concerns whether weighting is appropriate at all. The problem of exploring the relationship between two variables while taking into account a third, weighting variable is common in statistical research. For example, we could be interested in assessing the correlation between age and the value of a certain blood parameter in a sample of subjects where the blood parameter value in some of them represents the average of multiple measurements. In this case we could choose to give more importance to values representing averages than to those representing single measurements, under the hypothesis that they are less affected by within-subject variability and can be considered more "reliable". The size or number of observations is not the only possible weighting variable: we can decide to weight, say, according to time of observation (e.g., if we want to give more importance to recent observations than to old ones because they are more relevant for the present situation), to the standard deviation of values (as correctly noted in the comments) in samples with aggregated data, to the order of preferences when one of the variables is a rank, and so on.
In the context described by the OP, considering the size discrepancies, a weighted analysis is fully appropriate. The utility of this choice is also highlighted by the type of variables considered here (proportions), since their precision is well known to be highly sensitive to small sample sizes. This is a classical problem in power calculations for studies on proportions, and we can visualize it by recalling that the sample size required to estimate a proportion with a specified level of confidence and precision is given by the formula $\displaystyle N=\frac{Z_\alpha^2 p(1-p)}{e^2}$, where $Z_\alpha$ is the value from the standard normal distribution corresponding to our predefined $\alpha$ error (e.g., $Z=1.96$ if we want a 95% CI), $p$ is the expected "true" proportion in the underlying population, and $e$ is the desired level of precision. As a result of this inverse relation, small sample sizes can be associated with very high levels of imprecision. For example, suppose we draw a sample from a population where the true underlying proportion is $50\%$ and observe a proportion $p$. The precision of this observed proportion at $\alpha<0.05$ (i.e., the range within which $p$ falls 95% of the time over repeated samples of that size) is $\pm5\%$ for a sample of $385$ observations, but falls to $\pm10\%$ for a sample of $97$ observations and to $\pm20\%$ (clearly unacceptable) for a sample of $25$ observations. These considerations show that caution is required when handling proportions based on small sample sizes. In our case, this problem is most evident for gems, since half of the tombs have a size $<10$. Under these conditions, weighting is clearly recommended.
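The quoted figures can be checked directly by inverting the formula to $e = Z_\alpha\sqrt{p(1-p)/N}$; a quick sketch in R:

```r
# Precision e = Z * sqrt(p * (1 - p) / N): half-width of a 95% CI
# for an observed proportion, at the worst case p = 0.5.
Z <- qnorm(0.975)                     # 1.96 for a 95% CI
p <- 0.5
N <- c(385, 97, 25)                   # sample sizes quoted in the text
round(Z * sqrt(p * (1 - p) / N), 3)   # approx. 0.05, 0.10, 0.20
```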
The second issue is the choice of the weighting method. As stated above, weighting can be performed according to different variables, the choice of which depends on several factors, including the purpose of the study, the underlying distribution, the type of data aggregation, and so on. In our case, we are interested in a weighting variable that reflects the reliability of the observed proportions. Following the considerations above, and given the marked impact of sample size on the precision of a proportion, the size of each observation (in our case, the number of gems and the number of coins in each tomb) is an appropriate choice. Weighting by standard deviation, which is correctly performed in many cases of aggregated data, is less appropriate in this context, since here we have no aggregated data (and even if we had, we could not assume that the distribution of observed values within the tombs is normal). To quantify the size of each tomb, the geometric mean of the number of gems and the number of coins is the optimal choice and is to be preferred to the arithmetic mean. The geometric mean better reflects the fact that, to be reliable, an observation must have a precise proportion of both gems and coins, so that a balance between the two elements is advantageous for our analysis. To illustrate: if we have a tomb $i$ with $2$ gems and $198$ coins, and another tomb $j$ with $100$ gems and $100$ coins, the overall reliability of the observation $x_i,y_i$ (where $x$ and $y$ are the proportions of gems and coins, respectively) is probably inferior to that of the observation $x_j,y_j$. The geometric mean captures this information, giving a size of $19.9$ in the first case and of $100$ in the second, whereas the arithmetic mean misses it, giving a size of $100$ in both cases.
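The two-tomb example can be reproduced in a couple of lines of R:

```r
# Tomb size as geometric vs. arithmetic mean of the two counts.
gems  <- c(2, 100)            # tombs i and j from the example
coins <- c(198, 100)
round(sqrt(gems * coins), 1)  # geometric mean: 19.9 and 100
(gems + coins) / 2            # arithmetic mean: 100 and 100
```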
The third issue is identifying the most appropriate method to assess correlation. Here, the most important choice is between parametric and nonparametric measures. Several assumptions must be satisfied before applying the classical Pearson correlation, which is the typical parametric measure: 1) variables must be continuous; 2) variables must be approximately normally distributed; 3) outliers (observations that lie at an abnormal distance from the other data) have to be minimized or removed; 4) data have to be homoscedastic (i.e., the variances have to remain approximately similar as we move along the line of fit); 5) a linear relationship must be plausible (this is usually checked by visual inspection of scatterplots). We can use specific tests to check these assumptions, but looking at the data shown in the OP it seems highly unlikely that all of them are adequately satisfied. This suggests that nonparametric measures of correlation are to be preferred in this case.
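Some of these checks can be sketched quickly in R; the data below are illustrative assumptions, and the tests shown are common choices rather than the only ones:

```r
# Quick informal checks of the Pearson assumptions on illustrative data.
set.seed(2)
x <- rexp(30)                        # skewed predictor
y <- x^2 + rnorm(30, 0, 0.5)         # nonlinear relation
plot(x, y)          # assumptions 4-5: eyeball linearity and constant spread
shapiro.test(x)     # assumption 2: approximate normality of x ...
shapiro.test(y)     # ... and of y
boxplot(x, y)       # assumption 3: flag potential outliers
```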
The most widely used nonparametric correlation coefficients are Spearman's R, Kendall's Tau, and Goodman-Kruskal's Gamma. All these methods overcome the problems related to the assumptions of parametric tests, since they only require that individual observations can be ranked into two ordered series. Spearman's R can be interpreted as the Pearson correlation coefficient computed from ranks, so it conveys a similar message in terms of variability accounted for. Kendall's Tau is equivalent to Spearman's R in terms of statistical power, but its results have a different interpretation, since it represents a probability: specifically, it is the difference between the probability that, for any pair of observations ($x_i, y_i$ and $x_j, y_j$), the ranks of the two variables are in the same order (i.e., $x_i>x_j$ and $y_i>y_j$, or $x_i<x_j$ and $y_i<y_j$) and the probability that they are in the opposite order. Goodman-Kruskal's Gamma is closely related to Kendall's Tau, the main difference being its treatment of ties (observations with identical values), which are excluded from its calculation; it is often preferred when the data show several cases of equal values.
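In R, base `cor()` covers the first two; Gamma requires a contributed package (for example, `GoodmanKruskalGamma()` in the DescTools package). A minimal sketch on illustrative data:

```r
# Nonparametric correlation measures on illustrative monotone data.
set.seed(3)
x <- rnorm(40)
y <- exp(x) + rnorm(40, 0, 0.2)     # monotone but nonlinear relation
cor(x, y, method = "spearman")      # Spearman's R
cor(x, y, method = "kendall")       # Kendall's Tau
# Goodman-Kruskal's Gamma, e.g.: DescTools::GoodmanKruskalGamma(x, y)
```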
In summary, an optimal choice for this analysis could be a nonparametric measure (e.g., Spearman's R) weighted by size, where size is calculated as the geometric mean of the number of gems and the number of coins in each tomb. I have not tested whether this analysis, applied to the tomb data, yields a significant correlation, but it certainly represents a very "robust" approach.
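Base R has no weighted Spearman function, but the idea can be sketched as a weighted Pearson correlation of the ranks via `cov.wt()`. The counts and proportions below are invented for illustration (the original tomb data are not reproduced here), and note that this convention weights only the correlation step, leaving the ranks themselves unweighted:

```r
# Weighted "Spearman" correlation: weighted Pearson correlation of ranks.
# All numbers are hypothetical stand-ins for the tomb data.
n_gems  <- c(2, 5, 12, 40, 100, 3, 60, 8)       # gems counted per tomb
n_coins <- c(198, 20, 30, 60, 100, 9, 80, 15)   # coins counted per tomb
x <- c(.50, .20, .25, .45, .60, .33, .55, .38)  # observed proportion (gems)
y <- c(.48, .30, .20, .50, .65, .22, .60, .35)  # observed proportion (coins)
w <- sqrt(n_gems * n_coins)          # size = geometric mean, as argued above
R <- cbind(rank(x), rank(y))         # rank both variables
cov.wt(R, wt = w / sum(w), cor = TRUE)$cor[1, 2]
```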
Best Answer
Fake data simulated in R for purposes of demonstration.
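The original simulation code is not reproduced here; the following is a minimal sketch that generates data of the same character (a positive, monotone, but not entirely linear association). The seed, sample size, and functional form are illustrative assumptions, not the original values:

```r
# Fake data with a monotone but nonlinear (convex) trend.
set.seed(101)                              # hypothetical seed
x <- sort(runif(15, 50, 100))              # 15 x-values
y <- round(20 * x^2 + rnorm(15, 0, 500))   # convex trend, modest noise
plot(x, y)                                 # scatterplot of the association
```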
A scatterplot shows positive, but not entirely linear, association.
Notice that Pearson and Spearman correlation differ. Roughly speaking, Pearson correlation measures the linear component of the association. The Pearson correlation $r = 0.948$ shows substantial, but not perfect, linear association.
By contrast, each increase in $x$ is accompanied by an increase in $y.$ This leads to a Spearman correlation $r_S = 1.$
As you say, the Spearman correlation is based on ranks. Notice that $x$'s and $y$'s have ranks that match exactly. This is another way of saying that each increase in $x$ is accompanied by an increase in $y.$
Notice that rank 1 for the $x$'s corresponds to the minimum $x$-value 58.05, and rank 1 for the $y$'s corresponds to the minimum $y$-value 11,357. Similarly, rank 15 corresponds to the maximum of each variable.
The Spearman correlation can be found by taking the Pearson correlation of the ranks.
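In R, this identity can be verified directly (the data here are illustrative):

```r
# Spearman correlation equals the Pearson correlation of the ranks.
set.seed(4)
x <- sort(runif(15, 50, 100))
y <- x^3 + rnorm(15, 0, 100)        # increasing in x, so ranks agree
cor(x, y, method = "spearman")      # Spearman directly
cor(rank(x), rank(y))               # Pearson on the ranks: same value
```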
The Wikipedia article on Spearman correlation has some nice examples.