Hypothesis Testing – Testing Whether a Subsample of a Population Has a Different Mean Than the Population Mean

hypothesis testing, mean, probability, sample, t-test

I have a short question regarding testing whether multiple subsample means are different from the population mean. To be specific, I have the full population of the energy consumption per square meter of $N$ houses, where $N = N_1 + N_2 + N_3 + N_4 + N_5$. The categories $1,\dots,5$ denote the house category, independent of the insulation properties of the houses. The general question I am interested in is: does the mean energy consumption in any category differ from the mean energy consumption of the full population?

I know from this previous question that the usual way to test whether the mean of a subsample (in my case a category) differs from the full sample mean is to subtract the subsample data points from the full sample and conduct a t-test.

As far as I understand it, this procedure is generalizable, i.e. if I want to test the difference in means of, say, $N_x$ to $N$, where $x \in \{1,\dots,5\}$ denotes a category, I just take $\{N\} \setminus \{N_x\}$ and conduct the t-test.

So far so good, the points I am interested in are:

  1. Does it make a difference to have the full population instead of a sample?
  2. Is the explained procedure appropriate/the right one to compare category means to the full population mean?
  3. The data in the respective categories come from the population distribution, so when performing the `t.test` in R, would I use `var.equal = TRUE` as an additional argument?
  4. Regarding the normality assumption: do I need to test whether the distribution of every category $x \in \{1,\dots,5\}$ and of $\{N\} \setminus \{N_x\}$, which denotes the set difference of the population and the category, is significantly different from a normal distribution?

Thank you.

If anything is unclear, please let me know and I will try to edit the question to clarify.

Best Answer

You have the full population, so you are in the enviable position of just being able to look at the numbers.

If the mean consumption for the whole population is $70$ and the mean consumption for some subpopulation is $70$, then they are the same. If that subpopulation has a mean other than $70$, then they are different.

Because you have the full population, it's really that easy. Inferential ideas like p-values and confidence intervals are for when we have incomplete information (a sample) and need to guess something about the full information (the population).
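As a minimal sketch of "just looking at the numbers": the snippet below is written in Python rather than R, and the consumption values, the `consumption` dictionary, and the helper `category_mean_differences` are all made up for illustration. It computes the population mean over all houses and then the plain difference between each category mean and that population mean, with no test involved.

```python
from statistics import fmean

# Hypothetical data: energy consumption per square meter, keyed by house category.
# These numbers are invented for illustration and are not from the question.
consumption = {
    1: [68.0, 72.0, 70.0],
    2: [75.0, 77.0],
    3: [64.0, 66.0, 65.0, 65.0],
    4: [70.0, 70.0],
    5: [80.0, 60.0, 73.0],
}


def category_mean_differences(groups):
    """Return the population mean and each category mean minus the population mean."""
    # The population is simply every house from every category pooled together.
    all_values = [value for values in groups.values() for value in values]
    population_mean = fmean(all_values)
    differences = {
        category: fmean(values) - population_mean
        for category, values in groups.items()
    }
    return population_mean, differences


population_mean, differences = category_mean_differences(consumption)
print(f"population mean: {population_mean:.3f}")
for category, diff in differences.items():
    print(f"category {category}: mean differs from the population mean by {diff:+.3f}")
```

With complete data, these differences are exact quantities rather than estimates, so the only remaining question is whether a given difference is large enough to matter in practice.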
