Why Simulate?
Quick research into the problem will reveal that the optimal strategy is to observe, but not hire, the first $\frac{n}{e}$ applicants, and then hire the first candidate who is better than every applicant seen so far. For example, if there are ten candidates, each with a unique ranking, you might observe the following sequence.
$$2 \quad 3 \quad 6 \quad 5 \quad 4 \quad 8 \quad 9 \quad 10 \quad 1 \quad 7$$
We observe that among the first $\lfloor \frac{10}{e} \rfloor = 3$ candidates, the best score was a 6. So, we check the fourth candidate and observe a 5, so we don't hire. Then we observe a 4, so we don't hire. Then we observe an 8, which beats the 6, so we hire that candidate and halt the interview process. In this case, we didn't select the optimal candidate (the 10), but we were close.
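This stopping rule is easy to express in code; here is a minimal R sketch (the function name hire is invented for this example):

```r
# Skip the first k candidates, then hire the first one who beats
# the best score seen among those k. Returns the hired candidate's
# score, or NA if no later candidate beats the initial best.
hire <- function(x, k) {
  init_best <- max(x[1:k])
  for (i in (k + 1):length(x)) {
    if (x[i] > init_best) return(x[i])
  }
  NA
}

hire(c(2, 3, 6, 5, 4, 8, 9, 10, 1, 7), k = 3)  # hires the candidate scoring 8
```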
But what if we don't know this algorithm and want to determine which $k$ is best? It isn't immediately intuitive that skipping $k \approx \frac{n}{e}$ of the $n$ candidates is optimal, so simulating can assist us in finding the $k$ for which our probability of picking the best candidate is maximized. In the general case, simulations aid intuition. If we can guess an optimal solution, or even prove one mathematically, then simulation helps to verify our results.
How would you simulate?
This is perhaps my favorite part. I will use R to write this simulation and walk through the steps involved.
We will start with a single simulation for arbitrary $k$, where we want to create 1000 candidates each with distinct rankings.
x <- sample(1:1000, 1000)
Now that we have our sample, we want to find the best candidate among the first $k$ candidates, which can be done with
init_best <- max(x[1:k])
Next, we loop through the remaining candidates until we find one who is better than the best of the first $k$ candidates. Note that we only simulate $k$ up to $n-1$, because skipping every candidate always results in no hire.
candidate_score <- -1  # -1 signals that no candidate was hired
for (i in (k+1):1000) {
  if (x[i] > init_best) {
    candidate_score <- x[i]
    candidate_num <- i
    break
  }
}
So, from here, we have recorded the candidate's score. We can quickly verify whether this candidate is the best by checking if their ranking is the maximum. Since we have 1000 candidates, we do this with
if (candidate_score == 1000)
  success <- success + 1
That is, we record that we successfully chose the best candidate by incrementing a counter. Using these pieces, we can write a few loops to run this simulation many times, as shown below
sims <- 10000
p <- numeric(999)  # p[k] = estimated probability of success when skipping k
p[1] <- 0
for (k in 2:999) {
  success <- 0
  for (n in 1:sims) {
    x <- sample(1:1000, 1000)
    init_best <- max(x[1:k])
    candidate_score <- -1  # -1 signals that no candidate was hired
    for (i in (k+1):1000) {
      if (x[i] > init_best) {
        candidate_score <- x[i]
        break
      }
    }
    if (candidate_score == 1000)
      success <- success + 1
  }
  p[k] <- success/sims
  cat(k, "complete\n")
}
plot(1:999, p, type = "l", xlab = "Candidates Skipped",
     ylab = "Probability of Selecting Best Candidate",
     main = "The Secretary Problem")
So, we create a collection of probabilities so that later on we can plot these values. We also initialize candidate_score at -1 before each inner loop to signal the case where we end up not hiring anyone. After running this simulation 10000 times for each $k$, the results are as follows
Within this simulation, we find that the $k$ values that maximize $p$ are
$k = 332$ or $k = 374$ with $p = .3789$.
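Reading those maximizers off is one line of R: the maximizing $k$ values are simply the indices attaining max(p). A minimal sketch, with a made-up toy vector standing in for the simulated probabilities:

```r
# Toy stand-in for the simulated probability vector p; in the real
# simulation p has 999 entries, one per number of skipped candidates.
p <- c(0.10, 0.37, 0.37, 0.25)

# All k attaining the maximum (ties are possible, as in the text):
best_k <- which(p == max(p, na.rm = TRUE))
best_k  # 2 and 3 here
```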
We know that the asymptotic optimal values are $k_n^* \approx \dfrac{n}{e}$ and $p_n^* \approx \dfrac{1}{e},$ and these simulated results are fairly close to those values, which in this case are
$$k_{1000} \approx 368 \quad \quad p_{1000} \approx .368$$
So the simulation approximately confirms the asymptotic values.
It is often useful to simulate the power of a test. However, your question is unclear. [The figure does not match the distributions you say you simulate. The null and alternative hypotheses are unclear. You do not say whether
you are doing a one-sample or a two-sample t test. You do not indicate the sample size(s).]
Suppose you have random samples of sizes $n_1 = n_2 = 20$ from
the null distribution $\mathsf{Norm}(0,1)$ and the
specific alternative distribution $\mathsf{Norm}(1,1).$ Also, suppose
you will do a pooled 2-sample t test at level $\alpha = 0.05$
and wish to know the power of the test (probability of rejecting).
Here is one such 2-sample t test in R.
set.seed(1234)
x = rnorm(20, 0, 1)
y = rnorm(20, 1, 1)
t.test(x,y, var.eq=T)$p.val # P-value only
[1] 0.02451879
t.test(x,y, var.eq=T) # complete output
Two Sample t-test
data: x and y
t = -2.3421, df = 38, p-value = 0.02452
alternative hypothesis:
true difference in means is not equal to 0
95 percent confidence interval:
-1.25582408 -0.09136416
sample estimates:
mean of x mean of y
-0.2506641 0.4229301
In this one example, the null hypothesis $H_0: \mu_x=\mu_y$ is rejected
in favor of the two-sided alternative $H_a: \mu_x\ne\mu_y$
at the 5% level because the P-value $0.02452 < 0.05 = 5\%.$
Now we show a simulation of 100,000 such pooled 2-sample t tests. [A simulation with only 1000 iterations would give only a
very approximate answer.]
set.seed(2021)
pv = replicate(10^5, t.test(rnorm(20,0,1),
rnorm(20,1,1), var.eq=T)$p.val)
mean(pv <= 0.05) # approx power of test at 5% level
[1] 0.86787
summary(pv)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000000 0.0003599 0.0029373 0.0284559 0.0180662 0.9986458
2*sd(pv <= 0.05)/sqrt(10^5)
[1] 0.002141708 # approx 95% margin of simulation error
So the simulated power of the test at the 5% level
is $0.8679 \pm 0.0021,$ or about 87%.
Notes: (1) The numerical vector pv contains $10^5$ P-values. The logical vector pv <= 0.05 contains $10^5$ TRUEs and FALSEs. Its mean is the proportion of TRUEs. The last line of the R code gives a Wald 95% confidence interval for that proportion.
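That margin can also be computed directly from the binomial standard error, without calling sd(); a quick sketch reusing the simulated power from above:

```r
# Wald 95% margin of error for a simulated proportion phat based on
# n independent simulated tests: 2 * sqrt(phat * (1 - phat) / n).
phat <- 0.86787  # simulated power from above
n <- 10^5        # number of simulated tests
me <- 2 * sqrt(phat * (1 - phat) / n)
me  # roughly 0.0021
```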
(2) For this specific test one can use an exact
formula involving the non-central t distribution
to find an exact power value. However, the simulation
method shown here works in cases where no exact analytic
formula is available. (I would use simulation to
get the power of a Welch 2-sample t test, which does not assume equal variances. I have never seen an
exact formula for that test.)
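For the pooled test in this answer, that exact calculation is available in base R through power.t.test, which uses the noncentral t distribution internally; the arguments below simply mirror the setup above, as a check against the simulated value:

```r
# Exact power of a pooled two-sample t test via the noncentral t:
# n = 20 per group, true mean difference delta = 1, common sd = 1.
power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05,
             type = "two.sample", alternative = "two.sided")$power
# close to the simulated 0.8679
```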
(3) With only the first 1000 iterations, we get power $0.857\pm 0.022,$ which might be good enough.
set.seed(2021)
pv = replicate(1000, t.test(rnorm(20,0,1),
rnorm(20,1,1), var.eq=T)$p.val)
mean(pv <= 0.05)
[1] 0.857
2*sd(pv <= 0.05)/sqrt(1000)
[1] 0.02215163
Addendum on simulating significance levels, per Comments at end. You can use simulation to check that a pooled t test designed for a nominal 5% level truly has significance level 5%. Why might you need to check? Maybe you doubt the software is programmed correctly. Maybe you doubt the assumptions are met (random normal data? equal population variances?).
Here is a simulation to check that the pooled t test truly has the 5% level (its rejection rate when both populations have the same normal distribution):
set.seed(2021)
pv = replicate(10^5, t.test(rnorm(20,0,1),
rnorm(20,0,1), var.eq=T)$p.val)
mean(pv <= 0.05)
[1] 0.04961
2*sd(pv <= 0.05)/sqrt(10^5)
[1] 0.001373307
Significance level is $0.0496\pm 0.0014$ (so very near 5%).
However, if population variances are not equal, then the pooled 2-sample t test does not have the 'advertised' 5% rejection rate when population means are equal $(H_0:\mu_x=\mu_y).$ This can lead to 'false discoveries', especially if sample sizes are unequal and the smaller sample is from the population with the larger variance.
pv = replicate(10^5, t.test(rnorm(40,0,1),
rnorm(15,0,2), var.eq=T)$p.val)
mean(pv <= 0.05)
[1] 0.14371
The actual significance level is about 14%, not 5%.
So we might reject when population means are truly equal, if population variances are not equal. That is why it is recommended to use the Welch t test if there is any doubt at all about the equal-variances assumption. In R you can get the Welch 2-sample t test, which does not assume equal population variances, by using the default 2-sample t test (omit the parameter 'var.eq=T').
The simulation below shows that a Welch t test at the nominal 5% level gives a true significance level near 5% $(0.0515\pm 0.0014),$ even when population variances differ (by a 1:4 ratio).
set.seed(2021)
pv.w = replicate(10^5, t.test(rnorm(40,0,1),
rnorm(15,0,2))$p.val)
mean(pv.w <= 0.05)
[1] 0.0515
2*sd(pv.w <= 0.05)/sqrt(10^5)
[1] 0.00139783
Best Answer
The function sim() draws random numbers from a bivariate normal distribution with a specified vector of means and variance-covariance matrix in order to construct confidence intervals around the parameter estimates. The vector of means and the covariance matrix are determined by the ML estimates of the model parameters; therefore, these are simulations of parameter values. Recall that the ML estimates of the parameters are just the most likely values. The true value might not actually be at the maximum of the likelihood function, given the data, because of sampling error; this is the motivation for finding confidence sets: to define a region with a high probability of containing the true value.
To understand this topic more fully, you should familiarize yourself with probability density functions in general, with the normal distribution, and then with the bivariate normal specifically. The latter extends the normal distribution from the real line to the real plane, and so is useful when two random variables can vary together.
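Since sim() is described as drawing from a normal centered at the ML estimates with the estimated variance-covariance matrix, the same idea can be sketched by hand with MASS::mvrnorm; the model and data below are invented purely for illustration:

```r
# Hand-rolled version of the parameter simulation described above:
# draw parameter vectors from a bivariate normal centered at the ML
# estimates, with the estimated variance-covariance matrix.
library(MASS)  # for mvrnorm

set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)  # invented data: true intercept 2, slope 3
fit <- lm(y ~ x)

# 1000 simulated (intercept, slope) pairs:
draws <- mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))

# Percentile intervals from the draws play the role of confidence
# intervals around the parameter estimates.
apply(draws, 2, quantile, probs = c(0.025, 0.975))
```

The 2.5% and 97.5% quantiles of the draws form simulation-based 95% intervals, in the same spirit as the confidence sets discussed above.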