Solved – Correlation with very different sample sizes


I am new to statistics (self-taught, since I need it for my thesis) and I have a question that I haven't been able to find an answer to yet. For my thesis I want to find out whether there is any relationship between how students and their teachers answer a Likert-scale questionnaire (several items forming a scale). I want to see whether there are any similarities, or whether teachers influence their students' thinking in certain areas. I have questionnaires from 3 teachers and their 160 students. The questionnaire I used is based on an existing one, but it has never been analysed in this way. What calculation or correlational analysis would best answer my research question?

I am very thankful for any tips or answers!

Best Answer

What I would do:

  1. Group the students by teacher and, within each group, check for correlations among the students' answers to each question.
  2. Check whether the students' answers correlate with their teacher's answers.
    • You could take the average of the students' answers to each question (or their sample median, or 25% quantile, ...) and compare it with their teacher's answer. This gives you two datasets per class: one for the teacher and one (averaged) for the students. These you can easily correlate.
    • If you want to take it one step further, you could weight the calculated correlations (from above) by the variance of the students' answers.
  3. Check whether the answers also correlate with those of the other teachers' students -- a certain topic might simply be popular, independently of the teacher.
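
The steps above could be sketched roughly as follows. This is only an illustration, not a definitive analysis: the teacher labels, the five-item questionnaire and all the answer values are invented for the example, and the helper names (`pearson`, `class_item_means`, `teacher_class_correlations`, `between_class_correlations`) are hypothetical. It uses a plain Pearson correlation on per-class item means; for ordinal Likert data a rank correlation might be a better choice.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented example data: answers to 5 items on a 1-5 Likert scale.
teacher_answers = {
    "T1": [5, 4, 2, 1, 3],
    "T2": [2, 3, 4, 5, 1],
    "T3": [3, 3, 5, 2, 4],
}
# One row per student, grouped by the student's teacher (step 1).
student_answers = {
    "T1": [[5, 4, 1, 2, 3], [4, 4, 2, 1, 3], [5, 5, 2, 2, 4]],
    "T2": [[1, 3, 4, 4, 2], [2, 2, 5, 5, 1], [3, 3, 4, 5, 1]],
    "T3": [[3, 2, 5, 1, 4], [4, 3, 4, 2, 5], [2, 4, 5, 2, 3]],
}


def class_item_means(rows):
    """Average the students' answers item by item (one mean per question)."""
    return [mean(column) for column in zip(*rows)]


def teacher_class_correlations(teachers, students):
    """Step 2: correlate each teacher's answers with the class item means."""
    return {
        t: pearson(teachers[t], class_item_means(students[t]))
        for t in teachers
    }


def between_class_correlations(students):
    """Step 3: correlate the item means of every pair of classes."""
    means = {t: class_item_means(rows) for t, rows in students.items()}
    return {
        (a, b): pearson(means[a], means[b])
        for a, b in combinations(sorted(means), 2)
    }


if __name__ == "__main__":
    for t, r in teacher_class_correlations(teacher_answers, student_answers).items():
        print(f"{t} vs. own class: r = {r:.2f}")
    for pair, r in between_class_correlations(student_answers).items():
        print(f"classes {pair}: r = {r:.2f}")
```

If the correlations between different classes are about as high as those between a teacher and their own class, that would point to the topic rather than the teacher driving the answers.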

With your dataset it will be impossible to ensure that your conclusions are valid for the population of teachers, because you can't assume that the three teachers are representative of the whole population. However, if you have more than 20 questions per sheet, it might be possible to evaluate whether there is a correlation between the answers of these particular teachers and their students.
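
As a rough guide to why the number of questions matters: a correlation between a teacher's answers and the class averages is computed across the questions, so the number of questions is the sample size for any significance test. The sketch below uses the standard t-test for a Pearson correlation. The symbols n (number of questions) and r (observed correlation) are notation introduced here, and the test assumes approximately normal, independent observations, which Likert items on the same scale only roughly satisfy.

```latex
% Test statistic for H_0: \rho = 0, with n questions and observed correlation r
t = r \sqrt{\frac{n - 2}{1 - r^{2}}} \;\sim\; t_{n-2}

% Example: n = 20, two-sided 5% level, critical value t_{18} \approx 2.101
|r|_{\mathrm{crit}} = \frac{2.101}{\sqrt{2.101^{2} + 18}} \approx 0.44
```

Under these assumptions, with 20 questions a correlation would need to be above roughly 0.44 in absolute value to be distinguishable from zero at the 5% level; with fewer questions the threshold rises quickly.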

EDIT: In my experience, one should not take all students' and teachers' answers on evaluation sheets too seriously. Questions are open to interpretation, and the answers depend strongly on the person's current mood. Therefore, it might be wise to evaluate only those questions which provoked extreme answers (bad and good) in a second "interpretation" of the data.