This is a problem I have recently come across and have been trying to solve it mathematically.
- Suppose there is a pond with 100 fish
- Each day, there is a:
- 5% chance that population of the pond increases by 5% of its current population
- 5% chance that the population of the pond decreases by 5% of its current population
- 90% chance that the population of the pond stays the same
In the past, I have come up with methods to estimate the pond population using recursive formulas (e.g. https://math.stackexchange.com/questions/4727991/expected-score-in-a-coin-flipping-game ) as well as simulation based techniques to show how the pond population is expected to change (R Code):
set.seed(123)
n_days <- 1000
pond_population <- rep(0, n_days)
pond_population[1] <- 100
for (i in 2:n_days) {
prob <- runif(1)
if (prob <= 0.05) {
pond_population[i] <- pond_population[i-1] + round(pond_population[i-1] * 0.05)
} else if (prob > 0.05 && prob <= 0.10) {
pond_population[i] <- pond_population[i-1] - round(pond_population[i-1] * 0.05)
} else {
pond_population[i] <- pond_population[i-1]
}
}
plot(pond_population, type = "l", main = "Pond Population Over Time", xlab = "Day", ylab = "Population")
I thought of an extension to this problem: Suppose each day you catch 10 distinct fish from this pond, tag these fish, and then put them back into the pond. Naturally, it is possible that some days you will catch fish that you previously caught in the past – and it is also possible that some of the fish you caught in the past will die. After you have finished fishing on the 1000th day – what percent of the current fish pond population will be known to you?
I am not sure how to create a mathematical formula which can estimate the expected number of fish you have seen on the 1000th day. I also don't know how to create a mathematical formula which can estimate the probability of observing X% of the fish on the 1000th day.
Recently, I was shown (https://stackoverflow.com/a/76701995/13203841) how to use a simulation based approach to estimate the percentage of the fish pond seen on the 1000th day:
library(tidyverse)
set.seed(123)
fish_pond_sim <- function(pop=100, days=1000, fished_per_day=10) {
fish <- tibble(
id = 1:pop,
seen = FALSE,
dead = FALSE
)
results <- tibble(
population = pop,
day = 1,
seen = 0,
dead = 0,
seen_and_alive = 0
)
living <- pop
for (i in 2:days) {
prob <- runif(1)
five_percent <- round(length(living) * 0.05)
if (prob <= 0.05) {
five_pct_sample <- sample(living, five_percent, replace = FALSE)
fish[fish$id %in% five_pct_sample,]$dead <- TRUE
} else if (prob > 0.05 && prob <= 0.10) {
fish <- fish %>% add_row(
id = max(fish$id):(max(fish$id) + five_percent),
seen = FALSE,
dead = FALSE
)
}
fished_sample <- sample(living, fished_per_day, replace = FALSE)
fish[fish$id %in% fished_sample,]$seen <- TRUE
living <- fish[!fish$dead,]$id
results <- results %>% add_row(
population = length(living),
day = i,
seen = sum(fish$seen),
dead = sum(fish$dead),
seen_and_alive = sum(fish$seen & !fish$dead)
)
}
return(results)
}
result <- fish_pond_sim(1000, 1000)
result %>%
slice_tail() %>%
pull(seen_and_alive_percentage) # 84.00424
result %>%
ggplot(aes(x = day)) +
geom_line(aes(y = population, color = "Population")) +
geom_line(aes(y = seen_and_alive, color = "Seen and Alive")) +
theme_bw()
But I was wondering if someone could help me figure out a mathematical formula to estimate these probabilities?
Thanks!
Best Answer
I am procrastinating from my PhD so I'm typing my comment out. I think a Wright Fisher type model might be a better ansatz for the fish population. (This is the most basic model in population genetics, see for example here.)
The model for the population:
We now want to keep track of the seen fish. Denote the number of seen fish at time $k$ by $X(k)\in [0,N]$. Going from time $k$ to $k+1$ changes the $X(k)$ in two ways:
It would not be hard to compute the exact probabilities here, but they are bulky expressions involving lots of binomial coefficients. It is more instructive to think about the large population limit, this means $N$ being large. The approximate distribution of the two steps above are as follows:
This yields the following update law in the large $N$ limit: $$X(k+1) \approx Bin(X(k), 1/(1+p)) + Bin(\ell, 1-X(k)/N)$$
This allows you to derive the asymptotic proportion $x=\lim_{k\to \infty} E[X(k)/N]$ of seen fish in expectation as this should not change under the update law. Hence $$x = x*\frac{1}{1+p}+\frac{\ell}{N}(1-x).$$ Solve this for $x$, $$x= \frac{\ell/N}{p/(1+p) + \ell/N}.$$
Your set up is $N=100$, $\ell =10$ and $p=0.05$ in which $N$ should be large enough for this approximation. In this case you would have $$x = \frac{10}{21} \approx 48\%.$$