If You Go Fishing Everyday – What Is The Probability You Know X% Of The Pond

probabilityrecurrence-relations

This is a problem I have recently come across and have been trying to solve it mathematically.

  • Suppose there is a pond with 100 fish
  • Each day, there is a:
    • 5% chance that population of the pond increases by 5% of its current population
    • 5% chance that the population of the pond decreases by 5% of its current population
    • 90% chance that the population of the pond stays the same

In the past, I have come up with methods to estimate the pond population using recursive formulas (e.g. https://math.stackexchange.com/questions/4727991/expected-score-in-a-coin-flipping-game ) as well as simulation based techniques to show how the pond population is expected to change (R Code):

set.seed(123)
n_days <- 1000
pond_population <- rep(0, n_days)
pond_population[1] <- 100


for (i in 2:n_days) {
    prob <- runif(1)
    if (prob <= 0.05) {
        pond_population[i] <- pond_population[i-1] + round(pond_population[i-1] * 0.05)
    } else if (prob > 0.05 && prob <= 0.10) {
        pond_population[i] <- pond_population[i-1] - round(pond_population[i-1] * 0.05)
    } else {
        pond_population[i] <- pond_population[i-1]
    }
}
 


plot(pond_population, type = "l", main = "Pond Population Over Time", xlab = "Day", ylab = "Population")

enter image description here

I thought of an extension to this problem: Suppose each day you catch 10 distinct fish from this pond, tag these fish, and then put them back into the pond. Naturally, it is possible that some days you will catch fish that you previously caught in the past – and it is also possible that some of the fish you caught in the past will die. After you have finished fishing on the 1000th day – what percent of the current fish pond population will be known to you?

I am not sure how to create a mathematical formula which can estimate the expected number of fish you have seen on the 1000th day. I also don't know how to create a mathematical formula which can estimate the probability of observing X% of the fish on the 1000th day.

Recently, I was shown (https://stackoverflow.com/a/76701995/13203841) how to use a simulation based approach to estimate the percentage of the fish pond seen on the 1000th day:

library(tidyverse)

set.seed(123)

fish_pond_sim <- function(pop=100, days=1000, fished_per_day=10) {
    fish <- tibble(
            id = 1:pop,
            seen = FALSE,
            dead = FALSE
            )

    results <- tibble(
        population = pop,
        day = 1,
        seen = 0,
        dead = 0,
        seen_and_alive = 0
    )
    living <- pop
  for (i in 2:days) {
    prob <- runif(1)
    five_percent <- round(length(living) * 0.05)
    if (prob <= 0.05) {
      five_pct_sample <- sample(living, five_percent, replace = FALSE)
      fish[fish$id %in% five_pct_sample,]$dead <- TRUE
    } else if (prob > 0.05 && prob <= 0.10) {
      fish <- fish %>% add_row(
        id = max(fish$id):(max(fish$id) + five_percent), 
        seen = FALSE,
        dead = FALSE
      )
    }

    fished_sample <- sample(living, fished_per_day, replace = FALSE)
    fish[fish$id %in% fished_sample,]$seen <- TRUE
    living <- fish[!fish$dead,]$id
    results <- results %>% add_row(
      population = length(living),
      day = i,
      seen = sum(fish$seen),
      dead = sum(fish$dead),
      seen_and_alive = sum(fish$seen & !fish$dead)
    )
  }
    return(results)
}

result <- fish_pond_sim(1000, 1000) 

result %>%
    slice_tail() %>%
    pull(seen_and_alive_percentage) # 84.00424

result %>% 
    ggplot(aes(x = day)) +
    geom_line(aes(y = population, color = "Population")) +
    geom_line(aes(y = seen_and_alive, color = "Seen and Alive")) + 
    theme_bw()

enter image description here

But I was wondering if someone could help me figure out a mathematical formula to estimate these probabilities?

Thanks!

Best Answer

I am procrastinating from my PhD so I'm typing my comment out. I think a Wright Fisher type model might be a better ansatz for the fish population. (This is the most basic model in population genetics, see for example here.)

The model for the population:

  • There are always $N$ fish.
  • Reproduction: In each time step each fish reproduce into two with probability $p>0$. This produces $M\geq N$ fish.
  • Selection: From these $M$ fish we pick $N$ uniformly at random. (The idea here is that the resources of the environment can only support a population of size $N$).

We now want to keep track of the seen fish. Denote the number of seen fish at time $k$ by $X(k)\in [0,N]$. Going from time $k$ to $k+1$ changes the $X(k)$ in two ways:

  1. Only a random subset of the seen fish might survive the selection step.
  2. We do another round of fishing, say we inspect $\ell$ fish, some of which might not yet have been seen.

It would not be hard to compute the exact probabilities here, but they are bulky expressions involving lots of binomial coefficients. It is more instructive to think about the large population limit, this means $N$ being large. The approximate distribution of the two steps above are as follows:

  1. For $N$ large, by the law of large numbers, we have $M\approx (1+p)N$. This means that we can approximate the selection step by killing each fish independently with probability $N/M \approx 1/(1+p)$. This means of the $X(k)$ seen fish, a binomial number survives, with parameters $X(k)$ and $1/(1+p)$.
  2. Assuming $N \gg \ell$, instead of fishing out $\ell$ fish at once, we can fish $\ell$ times while putting the fish back. Disregarding the selective step, there are $N-X(k)$ fish that have not been seen, so looking at a uniform fish means that we see an unseen fish with probability $(N-X(k))/N = 1-X(k)/N$. We do this $\ell$ times, this yields another binomial, this time with parameters $\ell$ and $1-X(k)/N$.

This yields the following update law in the large $N$ limit: $$X(k+1) \approx Bin(X(k), 1/(1+p)) + Bin(\ell, 1-X(k)/N)$$

This allows you to derive the asymptotic proportion $x=\lim_{k\to \infty} E[X(k)/N]$ of seen fish in expectation as this should not change under the update law. Hence $$x = x*\frac{1}{1+p}+\frac{\ell}{N}(1-x).$$ Solve this for $x$, $$x= \frac{\ell/N}{p/(1+p) + \ell/N}.$$

Your set up is $N=100$, $\ell =10$ and $p=0.05$ in which $N$ should be large enough for this approximation. In this case you would have $$x = \frac{10}{21} \approx 48\%.$$

Related Question