A way to improve upon ANOVA with an ordinal predictor is to use dummy codes in penalized regression. Penalized regression takes advantage of the ordering among the response categories in Likert scale data: it reduces overfitting by smoothing the differences between slope coefficients for dummy variables corresponding to adjacent ranks. See Gertheiss and Tutz (2009) for an overview of penalized regression for ordinal predictors. I've discussed penalized regression here a few times before.
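To make the penalty concrete, here is a minimal sketch in R on simulated data: dummy-code the ordinal predictor and shrink the differences between coefficients of adjacent categories with a hand-rolled ridge-type penalty. The data, the penalty strength lambda, and all object names are placeholders for illustration only; in practice you would tune lambda (e.g. by cross-validation) or use a package built for this, such as ordPens.

```r
## Minimal sketch of a difference penalty on dummy coefficients for an ordinal
## predictor (the idea in Gertheiss & Tutz, 2009), on simulated data.
## lambda is arbitrary here, not tuned.
set.seed(1)
n <- 120
x <- sample(1:5, n, replace = TRUE)      # Likert-type predictor, 5 ordered levels
y <- 0.4 * x + rnorm(n)                  # continuous outcome

X <- model.matrix(~ factor(x))[, -1]     # dummy codes, level 1 as reference
k <- ncol(X)
D <- diff(rbind(0, diag(k)))             # first differences, treating the reference as 0
lambda <- 10                             # penalty strength

## Ridge penalty on adjacent-category differences via data augmentation:
## stack sqrt(lambda) * D under the design matrix and zeros under y,
## then solve the stacked system by ordinary least squares.
X_aug <- rbind(cbind(1, X), cbind(0, sqrt(lambda) * D))
y_aug <- c(y, rep(0, nrow(D)))
beta  <- qr.solve(X_aug, y_aug)          # intercept first, then smoothed dummy effects
beta
```

The data-augmentation step just re-expresses the difference-penalized problem as ordinary least squares on a stacked system, which keeps the sketch short; as lambda grows, the coefficient profile across adjacent categories is smoothed more strongly, shrinking toward the reference level in the limit.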
Unfortunately, this approach probably won't do well for categories with very few observations, and I don't know of any that would. Inferential power is necessarily limited with samples so unbalanced as to have very few observations in groups of interest. Correcting for familywise error would raise the bar even further out of reach, even if that's where it belongs. Whether it is depends on whether you mean to test one big hypothesis several times on separate but related measures, or whether you want to evaluate each hypothesis test separately; familywise error adjustment isn't necessary for the latter.
If you can't collect more data, might as well give the test a shot, but give some thought to the degree of evidence you want to see. You probably won't have enough power to distinguish small differences from the null hypothesis with p < .05, so using the Neyman–Pearson framework for dichotomizing p values interpretively is probably unrealistic (more so than usual, that is). There are less polarized ways of understanding p values – one might also call them more equivocal ways, but that's probably more appropriate with relatively weak evidence anyway. For more on interpreting p values, see for instance:
The recommendation to focus on effect size estimation and confidence intervals may help here too, because it is in essence a recommendation to focus on what you can learn from your data, even if that's not "enough to reject" a null hypothesis. Plotting your results may also give you a sense of what's really going on. Don't feel that confirmatory hypothesis testing is your only option unless you have good reason to stick to it; you may get some good ideas for hypotheses to test further just by exploring your data, even if you can't conclude anything very firmly from what you have.
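As a quick illustration of that advice, here is a minimal sketch in R on simulated data (rating and var2 are placeholders for one Likert item and your continuous Variable 2): a plot of the raw data by category, plus per-category means with rough t-based 95% confidence intervals, which will be very wide for sparsely populated categories.

```r
## Minimal sketch (simulated data; swap in one of your Likert items for `rating`
## and your continuous Variable 2 for `var2`).
set.seed(2)
rating <- factor(sample(1:5, 80, replace = TRUE,
                        prob = c(.30, .30, .20, .15, .05)))   # unbalanced categories
var2   <- 10 + 0.8 * as.numeric(rating) + rnorm(80, sd = 2)

## Look at the raw data first
boxplot(var2 ~ rating, xlab = "Rating", ylab = "Variable 2")
points(jitter(as.numeric(rating)), var2, pch = 16, col = rgb(0, 0, 0, 0.3))

## Per-category means with t-based 95% confidence intervals
## (expect very wide intervals where a category has few observations)
sapply(split(var2, rating), function(v)
  c(n = length(v), mean = mean(v),
    ci = mean(v) + c(-1, 1) * qt(0.975, length(v) - 1) * sd(v) / sqrt(length(v))))
```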
One last option to consider is treating your Likert scale data as continuous. This is a big assumption (that you're effectively already making with regard to your Variable 2), so keep it in mind when interpreting anything you do based on that...but it would allow you to compute correlations between each item's ratings and your Variable 2. In this case especially, you'd not want to collapse the dis/agree and strongly dis/agree categories. Also bear in mind that a t-test of a correlation assumes bivariate normality, so you might want to consider alternatives for any hypothesis tests on those effect size estimates as well.
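One such alternative is a rank-based correlation, which sidesteps the bivariate-normality assumption. Here is a minimal sketch in R on simulated data (item and var2 are again placeholders), pairing Spearman's rho with a percentile bootstrap confidence interval.

```r
## Minimal sketch (simulated data): Spearman's rank correlation between one
## Likert item and a continuous variable, with a percentile bootstrap CI,
## instead of the t-test of a Pearson correlation (which assumes bivariate
## normality).
set.seed(3)
item <- sample(1:5, 60, replace = TRUE)          # ordinal ratings, kept uncollapsed
var2 <- 10 + 0.6 * item + rnorm(60, sd = 2)

cor.test(item, var2, method = "spearman", exact = FALSE)  # rank-based test

## Percentile bootstrap CI for Spearman's rho
rho_boot <- replicate(2000, {
  i <- sample(seq_along(item), replace = TRUE)
  cor(item[i], var2[i], method = "spearman")
})
quantile(rho_boot, c(0.025, 0.975))
```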
Reference
Gertheiss, J., & Tutz, G. (2009). Penalized regression with ordinal predictors. International Statistical Review, 77(3), 345–365. Retrieved from http://epub.ub.uni-muenchen.de/2100/1/tr015.pdf.
There are two routes you can take: (1) just use the sums of scores, or (2) use an Item Response Theory (IRT) based method. Using sums of raw scores is very common in the social sciences, but many psychometricians do not consider it a sound approach: if you sum up the different questions from the questionnaire, you assume that every answer provides the same amount of information, and in real life that is not true. Your data actually provide information both on the "abilities" of your respondents and on the precision of your questions, so you can use both sources of information to gain a deeper understanding of your questionnaire and your respondents.

This is a pretty wide topic, so you can check different resources on it, e.g. here, here or in this book. IRT lets you use your data to obtain estimates of the latent features measured by the questionnaire on a continuous $N(0, 1)$ scale, which also makes further analysis easier. It is mostly used in educational research, so don't get discouraged that most examples in the books and articles are about measuring student abilities: the method can be used to analyze any kind of test or questionnaire data and obtain latent profiles of the respondents.
There are many statistical packages for IRT; in R, for example, you can use mirt or ltm.
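To give a flavour of the IRT route, here is a minimal sketch with mirt on crudely simulated data, fitting a one-factor graded response model; the simulation and object names are placeholders, and a real analysis would include model and item fit checks that are skipped here.

```r
## Minimal sketch: graded response model with mirt on simulated Likert-type items.
## install.packages("mirt")  # if needed
library(mirt)

set.seed(1)
n     <- 300
theta <- rnorm(n)                                      # latent trait
items <- replicate(5, cut(theta + rnorm(n, sd = 0.8),  # 5 items, 4 ordered categories
                          breaks = c(-Inf, -1, 0, 1, Inf),
                          labels = FALSE))

fit    <- mirt(as.data.frame(items), model = 1,
               itemtype = "graded", verbose = FALSE)   # one-factor graded response model
scores <- fscores(fit)                                 # EAP estimates of the latent trait
head(scores)
```

fscores() returns estimates of each respondent's position on the latent trait, roughly on the $N(0, 1)$ scale mentioned above, which you can then carry into further analyses.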
Best Answer
Data are (almost) never normal. Whether that's an issue depends on which forms of deviation from normality the procedure you want to use is sensitive to (and how sensitive), and on how non-normal the data are and in what way (strictly speaking, we're talking about the distribution the sample was drawn from rather than the sample itself).
Where there's much doubt about the potential impact, try to avoid assuming things you don't need to.
They're discrete and bounded, normal distributions are not.
"They're discrete and bounded, normal distributions are not." is 100% of the literature you should need ... it's simple mathematical fact, instantly apparent from looking at the definition of the normal density and a Likert scale that anyone could check for themselves. Would you really quote literature to support a plain statement of fact like "3 is not an even number"?
What is assumed normal in SEM, and does that imply marginal normality of every variable? (My understanding is that the errors might be assumed MVN, but that doesn't imply marginal normality of all variables, so it wouldn't automatically suggest transforming variables that don't look normal on their own.)
Note that "normalize" carries a more common particular meaning (or actually a couple) distinct from "transform to normal distributions" (see the tag wiki for normalization for a brief explanation)