Normal Distribution – Determining the Distribution of Additional Sample $x_{n+1}$ Given Sample Mean and Variance

chi-squared-distribution, mean, normal distribution, t-distribution

Suppose I draw $n$ samples $x_1, \ldots, x_n$ of a random variable $x$, which is normally distributed with unknown mean $\mu$ and unknown variance $\sigma^2$. From those samples, I compute a sample mean $\bar x$ and a sample variance $s^2$. I wish to compute the distribution of $x_{n+1} \mid \bar x, s^2$, i.e. the distribution of one additional sample $x_{n+1}$, given my measured $\bar x$ and $s^2$.

How should I proceed? This is probably a bad way of proceeding with loads of issues, but here is my thought process so far:

  1. Start with the fact that $\frac{\bar{x} - \mu}{s/\sqrt{n}}$ is $t$-distributed, and then use that to find a likelihood of $\mu$ given $\bar x$.
  2. Use the fact that $\frac{x_{n+1} - \mu}{\sigma} \sim \mathcal N(0,1)$.
  3. Use the fact that $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$.
  4. Combine these facts to get a distribution of $x_{n+1}$.
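
The chi-squared fact in step 3 can be sanity-checked numerically. The following is only a minimal sketch using Python's standard library; the function names and the parameter values ($n=6$, $\mu=1$, $\sigma=2$, number of trials, seed) are arbitrary choices for illustration, not part of the question. It compares the simulated mean and variance of $(n-1)s^2/\sigma^2$ with the $\chi^2_{n-1}$ values $n-1$ and $2(n-1)$.

```python
import random
import statistics


def scaled_sample_variance(n, mu, sigma, rng):
    """Draw n normal samples and return (n - 1) * s^2 / sigma^2."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    # statistics.variance uses the n - 1 denominator, matching s^2 in the text.
    return (n - 1) * statistics.variance(xs) / sigma**2


def check_chi_squared_moments(n=6, mu=1.0, sigma=2.0, trials=20000, seed=0):
    """Return the simulated mean and variance of (n - 1) * s^2 / sigma^2."""
    rng = random.Random(seed)
    values = [scaled_sample_variance(n, mu, sigma, rng) for _ in range(trials)]
    return statistics.fmean(values), statistics.variance(values)


mean_q, var_q = check_chi_squared_moments()
# A chi-squared variable with n - 1 = 5 degrees of freedom has mean 5 and variance 10,
# so both printed numbers should land near those values.
print(mean_q, var_q)
```

Matching the first two moments does not prove the distributional claim; it only shows the simulation is consistent with it.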

An alternative approach I thought of goes as follows:

  1. $x_{n+1}\sim N(\mu, \sigma^2)$
  2. $\bar x \sim N(\mu, \frac{\sigma^2}{n})$
  3. $(x_{n+1} - \bar x) \sim N(0, \sigma^2 + \frac{\sigma^2}{n})$
  4. Find the distribution of $\sigma^2 \mid s^2$ using the fact that $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$
  5. Integrate $N(0, \sigma^2 + \frac{\sigma^2}{n})$ over all possible $\sigma^2$ weighted by the distribution of $\sigma^2 \mid s^2$
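
Step 3 of this alternative approach, that $x_{n+1} - \bar x$ has mean $0$ and variance $\sigma^2 + \sigma^2/n$, can be checked by simulation. This is a rough sketch with the standard library only; the helper name `simulate_differences` and the values $n=5$, $\mu=3$, $\sigma=2$ are assumptions made for the example.

```python
import random
import statistics


def simulate_differences(n=5, mu=3.0, sigma=2.0, trials=20000, seed=1):
    """Simulate x_{n+1} - x_bar for repeated samples of size n."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        x_next = rng.gauss(mu, sigma)  # the additional, independent draw
        diffs.append(x_next - statistics.fmean(xs))
    return diffs


diffs = simulate_differences()
empirical_mean = statistics.fmean(diffs)
empirical_var = statistics.variance(diffs)
theoretical_var = 2.0**2 * (1 + 1 / 5)  # sigma^2 * (1 + 1/n) = 4.8

# The empirical mean should be near 0 and the empirical variance near 4.8.
print(empirical_mean, empirical_var, theoretical_var)
```

Note that $\mu$ drops out of the difference, which is why this quantity is a convenient starting point when $\mu$ is unknown.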

Best Answer

The distribution of $x_{n+1}$ conditional on $\overline x=(x_1 +\cdots +x_n)/n$ and $s^2 = \big((x_1-\overline x)^2+\cdots+(x_n-\overline x)^2\big)/(n-1)$ is simply $\operatorname N(\mu,\sigma^2)$ for the fixed (unknown) $\mu$ and $\sigma^2,$ since the observations $x_1,\ldots,x_n,x_{n+1}$ are independent. Thus I suspect that what you actually want is the distribution of $(x_{n+1}-\overline x)/s,$ upon which you can base a prediction interval whose endpoints have the form $\overline x\pm k\cdot s,$ where the multiplier $k$ is chosen so as to get a desired probability that $x_{n+1}$ is in the interval.

Notice that $x_{n+1}-\overline x\sim\operatorname N\left(0,\sigma^2\left( 1 + \frac 1 n \right)\right),$ so $(x_{n+1}-\overline x)\big/\big(\sigma\sqrt{1+\frac 1 n}\big)\sim\operatorname N(0,1).$ Since $s^2$ is independent of $\overline x$ and of $x_{n+1},$ and $(n-1)s^2/\sigma^2\sim\chi^2_{n-1},$ dividing by $s/\sigma$ cancels the unknown $\sigma$ and you have $$ \frac{x_{n+1}-\overline x}{s\sqrt{1+\frac 1 n}} \sim t_{n-1}. $$ Thus if you choose $c$ so that $\Pr(-c<t_{n-1}<+c)= 1-\alpha,$ then you have $$ \Pr\left(x_{n+1} \text{ is between } \overline x \pm c\cdot s\sqrt{1+\tfrac 1 n} \right) = 1-\alpha. $$ This is a prediction interval for $x_{n+1}.$
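
As an illustration, here is a small sketch of a prediction interval built from the pivot $(x_{n+1}-\overline x)\big/\big(s\sqrt{1+\frac 1 n}\big)\sim t_{n-1},$ together with a Monte Carlo check of its coverage. It uses only the Python standard library, so the critical value is not computed: the constant `T_CRIT_DF9_95 = 2.262` is the usual table value for a two-sided 95% interval with 9 degrees of freedom (i.e. $n=10$), and the function names and simulation settings are assumptions made for this example.

```python
import math
import random
import statistics

# Two-sided 95% critical value of the t distribution with 9 degrees of freedom
# (standard table value, hard-coded here; valid only for samples of size n = 10).
T_CRIT_DF9_95 = 2.262


def prediction_interval(sample, c):
    """Return (low, high) = x_bar -/+ c * s * sqrt(1 + 1/n) for the given sample."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # square root of the (n - 1)-denominator variance
    half_width = c * s * math.sqrt(1 + 1 / n)
    return x_bar - half_width, x_bar + half_width


def coverage(n=10, mu=0.0, sigma=1.0, c=T_CRIT_DF9_95, trials=20000, seed=2):
    """Estimate how often a fresh draw x_{n+1} falls inside the interval."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = prediction_interval(sample, c)
        hits += low < rng.gauss(mu, sigma) < high
    return hits / trials


estimated_coverage = coverage()
print(estimated_coverage)  # should be close to the nominal 0.95
```

For other sample sizes or confidence levels, `c` would have to be replaced by the matching $t_{n-1}$ quantile, for example from a table or a statistics library.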
