Non-parametric tests are likely to be less powerful than parametric tests and thus require a larger sample size. This is annoying because if you had a large sample size, sample means would be approximately normally distributed by the central limit theorem, and you thus wouldn't need non-parametric tests.
Look at generalized linear models, of which ordinary least squares regression and Poisson regression are special cases. I've never found a text that explains this particularly well; try talking to someone about it.
Look at non-parametric methods if you feel like it, but I have a hunch that they won't help you much in this case unless you're using ordinal data or a large set of very bizarrely distributed data.
The data were collected using Likert scales in a questionnaire.
From the moment you decided to use Likert scales, the question of whether or not the data actually come from a normal distribution was settled (in the negative). It's pointless to test (and answer with a chance of error) a question to which you already know the answer with certainty: a large enough sample would always lead to rejection by a suitable test, but you can tell this is the case with no data at all. Your data are not from a normal distribution; that was already certain.
[However, it's also not a useful question to answer; a better question to answer is not 'are the data from a normal distribution?' (are they ever?) but 'how much impact does it have on my inference?', a question not answered by hypothesis tests.]
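As an illustrative sketch (simulated data, not from the question): with discrete 5-point Likert-style responses and a moderately large sample, a normality test rejects essentially every time, which tells you nothing the measurement scale didn't already.

```python
# Illustrative sketch: a normality test on discrete Likert-style data.
# The discreteness alone guarantees rejection at any reasonable sample size.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
likert = rng.choice([1, 2, 3, 4, 5], size=2000,
                    p=[0.1, 0.2, 0.4, 0.2, 0.1])  # roughly symmetric responses

stat, p = stats.shapiro(likert)
print(p)  # minuscule: rejection was certain before collecting any data
```

The rejection here carries no information about the question that matters (how much the discreteness affects your inference); it only confirms what the scale already told you.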
So now I am unsure whether or not I can use a parametric test
That depends on the parametric test. Parametric doesn't necessarily imply "normal"; you may be able to make some other distributional assumption that will be consistent enough with your situation that you would be content with the impact of whatever deviation from assumptions you have.
or whether I need to use non-parametric tests.
Beware - nonparametric tests also have assumptions, and in some cases may be somewhat sensitive to them.
Many nonparametric tests assume continuous data, for example, and if you don't account for heavy discreteness you may get tests with quite different properties from their nominal ones. Some assume symmetry. In addition, suitability of particular tests may depend on the precise hypothesis you're interested in - you may need some additional assumptions (or perhaps a somewhat different nonparametric procedure) to get a test of your actual hypothesis.
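To illustrate the discreteness point (simulated data, an assumption for the sketch): 5-point Likert responses produce massive ties, so rank-based software must fall back on tie-corrected approximations rather than the exact null distribution those tests' nominal properties are based on. scipy's `mannwhitneyu`, for example, uses the asymptotic method whenever ties are present.

```python
# Illustrative sketch: a rank-based test on heavily tied Likert-style data.
# With only five distinct values, ties are pervasive, and the exact
# (continuous-data) null distribution no longer applies as-is.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.choice([1, 2, 3, 4, 5], size=30, p=[0.1, 0.2, 0.4, 0.2, 0.1])
group_b = rng.choice([1, 2, 3, 4, 5], size=30, p=[0.05, 0.15, 0.3, 0.3, 0.2])

# 'auto' selects the tie-corrected asymptotic method here because of ties
res = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')
print(res.statistic, res.pvalue)
```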
I have found books stating that if you have a small n, you should always use non-parametric tests.
For very small $n$, that's not necessarily useful advice, since you may have no useful significance levels available to you. At larger (but still small) $n$, in cases where the assumptions of a suitable nonparametric procedure are tenable, it sometimes makes sense to avoid making parametric assumptions to which your inferences may be sensitive (though there's sometimes the possibility of choosing different, less sensitive procedures).
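The "no useful significance levels" point can be made concrete with a small sketch: with only 3 observations per group, even complete separation of the two samples cannot reach a two-sided 5% level with the Wilcoxon-Mann-Whitney test, because the smallest attainable two-sided p-value is $2/\binom{6}{3} = 0.10$.

```python
# Illustrative sketch: the most extreme possible arrangement with n = 3
# per group still cannot get below a two-sided p-value of 0.10.
from scipy import stats

x = [1, 2, 3]   # every x below every y: complete separation
y = [4, 5, 6]
res = stats.mannwhitneyu(x, y, alternative='two-sided', method='exact')
print(res.pvalue)  # 0.1 -- significance at the 5% level is unattainable
```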
However I have also found citations
Which citations? What did they actually say?
stating that the choice between parametric and non-parametric tests depends on the level of your data (Likert can be seen as nominal)
No. A single item intended as part of a Likert scale is at least ordinal. If you have constructed a Likert scale by adding a number of such questions you have already assumed it was interval - by assuming things like '5'+'2' = '4'+'3' (which must be the case if you're able to add the scores and treat every '7' as the same), every component item had to have been interval. If they're interval, their sum certainly is.
so I should use parametric tests.
I don't see how "use a parametric test" follows from that.
You say very little about what kind of hypotheses you have (what are you trying to find out?); more might be said in those circumstances.
Best Answer
The raw data aren't assumed to be normally distributed in two-way ANOVA.
(Something is assumed to be normal, but it's not the data. At least, not unconditionally. What did you check for normality, and how?)
What the transformation has done is make your comparison no longer a comparison of means.
On the other hand, ANOVA isn't particularly sensitive to mild non-normality, and the larger the samples, the more non-normality it can tolerate.
You have no basis on which to assert that the transformed variable is normal. It might look normal, but that doesn't mean it is. (On the other hand, you can tolerate approximate normality, so this error isn't of much consequence.)
Possibly either, possibly neither. What are the required null and alternative hypotheses? What assumptions are you prepared to make?
The nonparametric test is not a test of equality of means, for example, unless you add some assumptions (such as a location-shift alternative).
If the variances are close to equal, the t-test is reasonably robust to mild/moderate non-normality. (And if you use the Welch approximation, the t-test deals pretty well with unequal variances.)
Aside from those two, and a t-test after a transformation, another possibility is to perform a permutation test.
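A minimal sketch of that last option, on simulated data (an assumption for illustration): a permutation test for a difference in group means, which replaces the normality assumption with the randomization distribution of the statistic under the null.

```python
# Illustrative sketch: a two-sample permutation test for a difference in
# means, on simulated Likert-style data.
import numpy as np

rng = np.random.default_rng(3)
a = rng.choice([1, 2, 3, 4, 5], size=25)
b = rng.choice([1, 2, 3, 4, 5], size=25)

observed = a.mean() - b.mean()
pooled = np.concatenate([a, b])

n_perm = 10000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)          # reshuffle group labels
    diff = perm[:len(a)].mean() - perm[len(a):].mean()
    if abs(diff) >= abs(observed):
        count += 1

p_value = (count + 1) / (n_perm + 1)        # add-one avoids p = 0
print(p_value)
```

(Recent scipy versions also provide `scipy.stats.permutation_test`, which packages this logic.)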
You really need to give more detail about the specific hypotheses you wish to consider.
As a piece of general advice, either the Welch t-test or the Wilcoxon-Mann-Whitney might be reasonable, but there's presently not enough information to suggest leaning toward one or the other, or indeed something else.
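For completeness, a sketch of how those two candidates would be run in practice (simulated data, purely illustrative); which one is appropriate still depends on the hypothesis of interest, as discussed above.

```python
# Illustrative sketch: Welch's t-test (no equal-variance assumption)
# alongside the Wilcoxon-Mann-Whitney test, on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
a = rng.normal(0.0, 1.0, size=40)
b = rng.normal(0.5, 2.0, size=40)

welch = stats.ttest_ind(a, b, equal_var=False)       # Welch's t-test
wmw = stats.mannwhitneyu(a, b, alternative='two-sided')
print(welch.pvalue, wmw.pvalue)
```

Note that the two tests address different null hypotheses (equality of means versus a rank-based comparison), so they are not interchangeable answers to the same question.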