A way to improve upon ANOVA with an ordinal predictor is to use dummy codes in penalized regression. Penalized regression takes advantage of the ordering among the response categories in Likert scale data: it reduces overfitting by smoothing differences between the slope coefficients of dummy variables corresponding to adjacent ranks. See Gertheiss and Tutz (2009) for an overview of penalized regression for ordinal predictors; I've discussed penalized regression here a few times before.
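To make the idea concrete, here's a minimal sketch of that kind of penalty: dummy codes for an ordinal predictor combined with a ridge-style penalty on differences between adjacent-level coefficients. This is a simplified version of the general approach, not the exact estimator from the paper; the function name and data layout are my own.

```python
import numpy as np

def ordinal_ridge(x, y, n_levels, lam=1.0):
    """Regress y on an ordinal predictor x (levels 1..n_levels) using
    dummy codes, penalizing squared differences between adjacent-level
    coefficients (a fused-ridge sketch of the smoothing idea).
    """
    x, y = np.asarray(x), np.asarray(y, float)
    n = len(x)
    # Dummy matrix: one column per level above the reference level 1.
    X = np.zeros((n, n_levels - 1))
    for j in range(2, n_levels + 1):
        X[:, j - 2] = (x == j).astype(float)
    X = np.column_stack([np.ones(n), X])   # prepend an intercept column
    # Difference penalty on the level coefficients (not the intercept):
    # shrinks b2 toward 0 (the reference level) and each b_{j+1} toward b_j,
    # which smooths the fitted means across adjacent ranks.
    p = n_levels - 1
    D = np.zeros((p, p + 1))
    D[0, 1] = 1.0
    for i in range(1, p):
        D[i, i] = -1.0
        D[i, i + 1] = 1.0
    # Closed-form penalized least squares solution.
    beta = np.linalg.solve(X.T @ X + lam * D.T @ D, X.T @ y)
    return beta  # [intercept, effect of level 2, ..., effect of level K]
```

With `lam=0` this reduces to ordinary dummy-coded regression; as `lam` grows, adjacent-level effects are pulled together (and ultimately toward the reference level), which is the overfitting control described above.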
Unfortunately, this approach probably won't do well for categories with very few observations, and I don't know of any that would. Inferential power is necessarily limited with samples so unbalanced as to have very few observations in groups of interest. Correcting for familywise error would raise the bar even further out of reach, even if that's where it belongs. Whether it is depends on whether you mean to test one big hypothesis several times on separate but related measures, or whether you want to evaluate each hypothesis test separately; familywise error adjustment isn't necessary for the latter.
If you can't collect more data, you might as well give the test a shot, but give some thought to the degree of evidence you want to see. You probably won't have enough power to distinguish small differences from the null hypothesis at p < .05, so the Neyman–Pearson framework of dichotomizing p values interpretively is probably unrealistic here (more so than usual, that is). There are less polarized ways of understanding p values – one might also call them more equivocal ways, but that's probably more appropriate with relatively weak evidence anyway.
The recommendation to focus on effect size estimation and confidence intervals may help here too, because it is in essence a recommendation to focus on what you can know from your data, even if it's not "enough to reject" a null hypothesis. Plotting your results may help give you a sense of what's really going on too. Don't feel that confirmatory hypothesis testing is your only option unless you have good reason to; you may be able to get some good ideas of hypotheses to test further by just exploring your data, even if you can't really conclude anything very firmly from what you have.
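In that spirit, a percentile bootstrap gives a rough confidence interval for an effect size without leaning on distributional assumptions. The sketch below is illustrative only; the function name and the toy ratings are made up.

```python
import numpy as np

def bootstrap_mean_diff_ci(a, b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the difference in
    mean ratings between two independent groups (a minimal sketch)."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float), np.asarray(b, float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement; record the mean difference.
        diffs[i] = rng.choice(a, a.size).mean() - rng.choice(b, b.size).mean()
    lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return a.mean() - b.mean(), (lo, hi)

# Toy example: 1-5 ratings from two hypothetical groups.
est, (lo, hi) = bootstrap_mean_diff_ci([4, 5, 4, 3, 5, 4], [2, 3, 3, 2, 4, 2])
```

Even when the interval overlaps zero, its width and location tell you what your data can and can't rule out, which is exactly the point of the estimation-focused recommendation.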
One last option to consider is treating your Likert scale data as continuous. This is a big assumption (that you're effectively already making with regard to your Variable 2), so keep it in mind when interpreting anything you do based on that...but it would allow you to compute correlations between each item's ratings and your Variable 2. In this case especially, you'd not want to collapse the dis/agree and strongly dis/agree categories. Also bear in mind that a t-test of a correlation assumes bivariate normality, so you might want to consider alternatives for any hypothesis tests on those effect size estimates as well.
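One such alternative is a permutation test, which gets a p value for an observed correlation without assuming bivariate normality. A rough sketch (the function name is illustrative):

```python
import numpy as np

def perm_test_corr(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation test for a Pearson correlation: shuffle y
    to break the association and see how often a correlation at least
    as extreme as the observed one arises by chance."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    r_obs = np.corrcoef(x, y)[0, 1]
    count = 0
    for _ in range(n_perm):
        r = np.corrcoef(x, rng.permutation(y))[0, 1]
        if abs(r) >= abs(r_obs):
            count += 1
    # Add-one smoothing keeps the p value away from an impossible 0.
    return r_obs, (count + 1) / (n_perm + 1)
```

A rank-based coefficient such as Spearman's rho is another option, though it answers a slightly different question about monotonic rather than linear association.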
Reference
Gertheiss, J., & Tutz, G. (2009). Penalized regression with ordinal predictors. International Statistical Review, 77(3), 345–365. Retrieved from http://epub.ub.uni-muenchen.de/2100/1/tr015.pdf.
Best Answer
I find the mean to be a much more useful indicator of central tendency for Likert items than the median. I have elaborated on my argument in an answer to a question about whether to use the mean or median for Likert items.
A recap of some of these reasons:
- The median can hide real differences: the median of `1 1 3 3 3` is the same as the median of `3 3 3 5 5` (i.e., 3), but the mean reflects the difference.
- The mean is much finer-grained: across, say, 500 responses to a five-point item, the mean can take 500 * 4 + 1 = 2001 different values, while the median can take only a handful.
- Taking the mean of a binary variable scored `0` and `1` to get a proportion is natural; treating a 5-point Likert scale as `1 2 3 4 5` seems almost as natural to me.

If you are persuaded that the mean is the appropriate measure of central tendency, then you would want to structure your hypothesis tests so that they test for differences between means. A paired sample t-test would allow for a pair-wise comparison of means, but there would be issues around the accuracy of p-values given the discrete and non-normal error distribution. Nonetheless, adopting a non-parametric approach is not a solution, because it changes the hypothesis.
I would expect the paired sample t-test to be fairly robust, at least for typical Likert item means that avoid either extreme of the scale, but I don't have any simulation studies on hand.
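As a rough substitute for such a study, a quick simulation can estimate the t-test's type I error rate for discrete five-point data under a true null. The sketch below uses a made-up mid-heavy item distribution and n = 30 pairs; the critical value 2.045 is the two-sided .05 cutoff for a t distribution with 29 degrees of freedom.

```python
import numpy as np

def ttest_type1_rate(n_sims=2000, seed=1):
    """Crude check of the paired t-test's type I error rate: both
    members of each pair are drawn i.i.d. from the same discrete
    5-point distribution, so the null of equal means is true.
    (A sketch only; the item distribution below is made up.)"""
    rng = np.random.default_rng(seed)
    n = 30
    t_crit = 2.045                        # two-sided .05 critical value, df = 29
    probs = [0.1, 0.2, 0.4, 0.2, 0.1]     # a mid-heavy 5-point item
    rejections = 0
    for _ in range(n_sims):
        d = (rng.choice([1, 2, 3, 4, 5], size=n, p=probs)
             - rng.choice([1, 2, 3, 4, 5], size=n, p=probs))
        t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
        if abs(t) > t_crit:
            rejections += 1
    return rejections / n_sims
```

If the test is robust for this kind of data, the returned rate should land near the nominal .05; varying the distribution toward a scale endpoint (e.g., mostly 5s) would show where the approximation starts to strain.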