By definition, an ordinal scale is a gauge on which the true distances between the notches 1 2 3 4
are unknown. It is as if you are seeing the ruler while under drugs/alcohol. The true distances could be anything. It could be 1  2 3    4
or 1 2    3  4
or whatever. We cannot compute a statistic - such as a correlation - unless we decide on the distances and fix them.
One line of reasoning is as follows. Since our measuring scale, the gauge, is distorted in an unknown monotonic way, we cannot trust the data values. Only the order of their magnitudes is trustworthy. Without straining the brain further, we declare the order itself to be the value. Thus, we replace the observed distribution by the uniform distribution - the ranks. After that, we may compute an association coefficient, say, Pearson $r$. That will be Spearman $\rho$, as we know. Pearson $r$ measures the strength of linear association. Ranking the variables was a trick to linearize that portion of the monotonic relationship which is attributable to the distributions not having been uniform initially. Thus, Spearman $\rho$ is a measure of that monotonicity in the relationship which can be converted to linearity by uniforming the marginal distributions. In the OP question, only one of the two variables is ordinal (the second is continuous). So there is generally no need to rank both variables: we may rank just the ordinal one and then compute $r$.
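The identity "Spearman $\rho$ is Pearson $r$ computed on ranks" can be checked directly; here is a minimal Python sketch (scipy assumed, data made up), where the relationship between the variables is monotonic but nonlinear:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, rankdata

rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = np.exp(x) + rng.normal(scale=0.1, size=100)  # monotonic but nonlinear in x

# Spearman rho of the raw data...
rho, _ = spearmanr(x, y)
# ...equals Pearson r of the rank-transformed data
r_on_ranks, _ = pearsonr(rankdata(x), rankdata(y))
```

The two numbers agree up to floating point, because ranking is exactly the "uniforming" transformation described above.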
Another approach, an alternative to ranking (uniforming), is optimal scaling of the ordinal variable. Optimal scaling is an iterative procedure whose goal is to find such distances on the ordinal scale - i.e., such a monotonic transformation of it - that the linear $r$ between the variables is maximized. While the ranking approach rests on the premise "the true scale corresponds to the data having a uniform distribution", the optimal scaling approach rests on the premise "the true scale corresponds to the data having maximal linear $r$". Optimal scaling can be done in categorical regression (CATREG). However, categorical regression requires that the other input variable be discrete (not necessarily ordinal), so if it is continuous with many unique values it will have to be binned by you, arbitrarily.
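As a rough illustration of the idea - not the CATREG algorithm itself, which alternates least-squares steps under a monotonicity constraint - here is a one-step Python sketch on made-up data. Scoring each ordinal category by the mean of the continuous variable within it is the unconstrained least-squares quantification, and it can only increase linear $r$ relative to treating the codes 1, 2, 3, 4 as equally spaced:

```python
import numpy as np

rng = np.random.default_rng(1)
# ordinal variable with codes 1..4 whose "true" spacings are uneven
true_scores = np.array([0.0, 0.5, 3.0, 10.0])
codes = rng.integers(0, 4, size=300)
y = true_scores[codes] + rng.normal(scale=1.0, size=300)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

r_naive = corr(codes + 1, y)  # codes 1..4 treated as equally spaced (interval)

# one-step quantification: score each category by the mean of y within it
# (CATREG additionally enforces monotonicity and iterates)
scores = np.array([y[codes == k].mean() for k in range(4)])
r_scaled = corr(scores[codes], y)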
There are other approaches as well. But in any case we transform the ordinal scale monotonically "so as to..." (some assumption or some goal), because the ordinal scale is distorted to us in an unknown way. A radically different decision would be to "sober up" first and decide that the scale is either not distorted (i.e., it is interval), or distorted in a known way (it is non-equal-interval), or nominal.
Some asymmetric approaches include ordinal regression of the ordinal variable on the other (interval/continuous) one, or linear regression of the latter on the ordinal one, with a model where the predictor is entered as a polynomial contrast (that is, as $b_1 X + b_2 X^2 + b_3 X^3, \dots$). The weakness of these approaches is that they are asymmetric: one variable is dependent, the other is independent.
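The polynomial-contrast regression can be sketched in a few lines of Python (hypothetical data; with $k$ ordinal levels, powers up to $k-1$ are enough to saturate the category means):

```python
import numpy as np

rng = np.random.default_rng(2)
codes = rng.integers(1, 5, size=200)                 # ordinal predictor, levels 1..4
y = np.log(codes) + rng.normal(scale=0.2, size=200)  # monotone but nonlinear in the codes

def r_squared(X, y):
    # ordinary least squares fit and proportion of variance explained
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

ones = np.ones_like(codes, dtype=float)
X_lin = np.column_stack([ones, codes])                       # predictor entered linearly
X_poly = np.column_stack([ones, codes, codes**2, codes**3])  # b1*X + b2*X^2 + b3*X^3

r2_lin, r2_poly = r_squared(X_lin, y), r_squared(X_poly, y)
```

Because the linear model is nested in the polynomial one, `r2_poly` is never below `r2_lin`; the gap between them reflects the nonlinearity that the extra terms absorb.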
The dunn.test package in R uses a one-sided test, whereas SPSS and GraphPad use two-sided tests. There is no facility in the dunn.test package or its function dunn.test() to change to a two-sided test, but the p-values can be multiplied by 2 if a two-sided test is required.
A two-sided Dunn's test is available from the dunnTest() function in the package FSA (Fisheries Stock Assessment). This package is not available from CRAN, but it can be downloaded by running source("http://www.rforge.net/FSA/InstallFSA.R"). It requires an R version more recent than 3.0.2, and I had trouble installing it until I updated the Rcpp package from CRAN. More information on FSA can be found at https://fishr.wordpress.com/fsa/, and documentation on the dunnTest() function at http://www.rforge.net/doc/packages/FSA/dunnTest.html.
Thanks to Stephan Kolassa for his help in resolving this problem.
Best Answer
Eta is about the proportion of variance explained. If you have an ordinal outcome, you don't have a variance, so I'd say no.
Here's some more explanation. Variance is about how different the scores are. So:
1.1, 1.2, 1.3 has small differences, hence a small variance.
1, 101, 201 has larger differences, hence larger variance.
1, 2, 10001 has even larger differences, and so the variance is even larger.
But on an ordinal measure, we don't know about differences - all we know is the order, so each of those variables goes in the order 1, 2, 3. The classic example is position in a race - people came first, second, third; all we know about the winner is that they were ahead of whoever came second. Were they 0.1 seconds ahead, or 3 hours ahead? We don't know. So the times could be: 10, 11, 12
Or 10, 100, 101
Or 10, 1000, 1001.
We don't have knowledge of differences, so we don't have variance, so we can't have eta-squared.
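The race-times argument can be made concrete in a short Python check using the three hypothetical time sets above: the variances differ wildly, but the ranks - all that an ordinal scale actually gives us - are identical:

```python
import numpy as np
from scipy.stats import rankdata

# same finishing order, very different gaps between finishers
a = np.array([10, 11, 12])
b = np.array([10, 100, 101])
c = np.array([10, 1000, 1001])

variances = [t.var(ddof=1) for t in (a, b, c)]     # grows without bound
ranks = [rankdata(t).tolist() for t in (a, b, c)]  # identical for all three
```

Any variance-based effect size such as eta-squared depends on `variances`, which the ordinal scale cannot pin down; only `ranks` is invariant.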
You should (possibly) use some form of ordinal logistic regression; then you have options for effect sizes based on likelihood ratios and/or classification probabilities.
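As a sketch of the likelihood-ratio route - a hand-rolled proportional-odds fit on simulated data, not a production implementation (statsmodels' OrderedModel or R's MASS::polr would be the usual tools) - McFadden's pseudo-$R^2$ compares the log-likelihood of the full model against an intercepts-only model:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)
x = rng.normal(size=300)
latent = 1.5 * x + rng.logistic(size=300)
y = np.digitize(latent, [-1.0, 1.0])  # ordinal outcome with 3 ordered levels

def negloglik(beta, cuts, x, y):
    # proportional-odds model: P(y <= j) = logistic(cut_j - beta * x)
    cuts = np.sort(cuts)  # keep thresholds ordered
    cum = expit(cuts[:, None] - beta * x[None, :])
    cum = np.vstack([np.zeros_like(x), cum, np.ones_like(x)])
    probs = np.diff(cum, axis=0)[y, np.arange(len(y))]
    return -np.sum(np.log(np.clip(probs, 1e-12, None)))

full = minimize(lambda p: negloglik(p[0], p[1:], x, y),
                [0.0, -1.0, 1.0], method="Nelder-Mead")
null = minimize(lambda p: negloglik(0.0, p, x, y),
                [-1.0, 1.0], method="Nelder-Mead")

# McFadden pseudo-R^2: an effect size built from the likelihood ratio,
# needing no variance of the ordinal outcome
pseudo_r2 = 1 - full.fun / null.fun
```

Nothing here requires distances between the outcome categories, which is exactly why this kind of effect size is legitimate where eta-squared is not.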