edit – more information about what the code given should represent
The following pseudocode outlines the problem as I have it:
for each random seed s in S
    randomise the data
    for k in 1 to 5
        create test / training data
        fit the model to the training data
        generate a score
Therefore I will have $S \times 5$ individual accuracy scores. My final score is the
average of these, and I would like to know its standard deviation.
original post
The following code represents my problem:
# S is the total number of random seeds to use
S = 3
# the size of each category, so the original data will have 2n rows
n = 100
# number of "folds" to use
K = 5

# sample data
set.seed(2019)
original_data = data.frame(
  x = c(rnorm(n, 0.457, 0.01), rnorm(n, 0.508, 0.11)),
  y = c(rep(0, n), rep(1, n))
)

# data frame to store the results
results = NULL
iteration = 1

for(s in 1:S){
  set.seed(s)
  rnd = sample(1:(2*n))
  # get randomised data
  td = original_data[rnd,]
  for(k in 1:K){
    # get test and training data
    # (note: the same 140/60 split is reused for every k)
    trainset = td[1:140,]
    testset = td[-(1:140),]
    # fit model
    m = glm(y ~ x, data = trainset, family = "binomial")
    # get probabilities and predicted values
    model_probabilities = predict(m, newdata = testset, type = "response")
    model_predictions = 1 * (model_probabilities >= 0.5)
    # store results
    results = rbind(results, data.frame(
      seed = s, k = k, iteration = iteration,
      probability = model_probabilities,
      prediction = model_predictions,
      observed = testset$y
    ))
    iteration = iteration + 1
  }
}

# table of predicted and observed values
t = table(results$prediction, results$observed)
# convert into percentages
t = 100 * round(prop.table(t), 3)
# accuracy = sum of the diagonal (both classes predicted correctly)
accuracy = t[1,1] + t[2,2]
accuracy
With the output:
> accuracy
[1] 51.1
> dim(results)
[1] 900 6
I want to know how to calculate the standard deviation for this accuracy measure.
edit – choice of $n$
Still interested in an answer to this question; I'm not sure whether additional information is required.
Initially I thought that I should just use
$$
\sqrt{
\frac{p(1-p)}{n}
}
$$
where $n$ is the number of rows in the test set.
This doesn't seem to take into account that the accuracy score is averaged across many iterations, and I can't find literature covering this case.
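For reference, the naive formula above is easy to evaluate directly; here is a small sketch, where `p` is taken from the reported accuracy (51.1%, as a proportion) and `n_test` is the 60-row test set used in the code above:

```r
# naive binomial standard error for a single test set;
# this treats all predictions as one sample and ignores
# the averaging over seeds and folds
p = 0.511       # reported accuracy, as a proportion
n_test = 60     # rows in one test set
sqrt(p * (1 - p) / n_test)
```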
Best Answer
Your procedure is overly complicated; just use the bootstrap. With the bootstrap you would randomly sample, with replacement, $n$ observations out of your dataset of size $n$. At each iteration you would repeat the whole procedure, including fitting your model, making predictions, and calculating the accuracy. You would repeat this many times (hundreds or more) and then simply calculate the standard deviation of the estimated accuracies.
If you used samples smaller than $n$, the estimate would not reflect the actual variability of the data; it would overestimate the standard deviation (smaller samples vary more). If you use a small number of iterations of the algorithm, your estimate of the standard deviation will itself not be precise.
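A minimal sketch of this procedure, assuming the `original_data` frame from the question (columns `x` and `y`, 200 rows); the replicate count `B` and the choice to evaluate each fit on its out-of-bag rows (those not drawn into the bootstrap sample) are illustrative choices, not part of the original answer:

```r
set.seed(2019)
B = 500                          # number of bootstrap replicates
n_total = nrow(original_data)

boot_accuracy = replicate(B, {
  # resample rows with replacement, same size as the original data
  idx = sample(n_total, n_total, replace = TRUE)
  boot_data = original_data[idx, ]
  # refit the model on the bootstrap sample
  m = glm(y ~ x, data = boot_data, family = "binomial")
  # evaluate on the out-of-bag rows
  oob = original_data[-unique(idx), ]
  pred = 1 * (predict(m, newdata = oob, type = "response") >= 0.5)
  mean(pred == oob$y)
})

mean(boot_accuracy)   # point estimate of the accuracy
sd(boot_accuracy)     # its standard deviation
```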