Estimating Catalan numbers using Monte Carlo method

catalan-numberscombinatoricscomputer sciencemonte carlorandom walk

This question regards the classical problem of estimating Catalan numbers by performing a random walk on a grid of $n\times n$ squares. I will dectribe the problem for those who are not familiar with it, but you may also skip this section to the Monte Carlo code which is the reason I started this thread.


We start at the lower left corner of the grid with initial step to the right (marked by a right arrow). We may only take steps rightwards or upwards. We will only consider paths that end in the upper right corner and that stay below the main diagnoal, see figure below for some examples when the grid is $4\times 4$, where the dashed line is the main diagonal.

enter image description here

The problem we would like to solve is to find out what the size of the set that contains all such paths (i.e. initial right steps in lower left corner, travel by only rightwards and upwards steps, do not cross the main diagonal and ends in upper right corner) is. Let us denote this set by $S_{n}$ and let us also denote the set that allows paths to cross the main diagonal by $T_{n}$. Then it clearly holds that $S_{n}\subset T_{n}$. As prepatory work in solving this problem, let us compute the size of $T_{n}$, i.e. $\vert T_{n}\vert$. If we think about it in the right manner it will not be so hard to compute: we know that a random walk in $T_{n}$ must consist of $n$ right-steps and $n$ up-steps, thus $2n$ steps in total, one of which (the initial step) is fixed to a right step. Thus, if we imagine the $2n-1$ remaining steps as blanks to be filled in in different ways by either $n$ up-steps or $n-1$ right-steps we realize that this number of ways is given by the binomial coefficient

$$
\begin{align}
\vert T_{n}\vert = {2n-1\choose n} = {2n-1\choose n-1}.
\end{align}
$$

Finally, the Monte Carlo estimator to the Catalan number $c_{n}=\vert S_{n}\vert$ can be formulated: let $X^{i}$ for $i=1,\dots,N$ by i.i.d. be $p(x)=\text{Unif}(T_{n})$, i.e. random walks in $T_{n}$ with uniform probabilites. Then

$$
\begin{align}
&\vert S_{n}\vert = \sum_{x\in T_{n}}\mathbf{1}_{S_{n}}(x) = \sum_{x\in T_{n}}\frac{1}{p(x)}\mathbf{1}_{S_{n}}(x)p(x) = \mathbf{E}(\mathbf{1}_{S_{n}}(x)/p(x)) \implies \\[3mm]
&c_{n}^{N} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{p(x)}\mathbf{1}_{S_{n}}(X^{i}) = \frac{\vert T_{n}\vert}{N}\sum_{i=1}^{N}\mathbf{1}_{S_{n}}(X^{i}).
\end{align}
$$


Now, what I will do is simulate $N$ random walks in $T_{n}$ and then associate every one of those walks that also are in $S_{n}$ with the number $1$ (the other walks will be associated with $0$) and in the end aggregate all the $1$:s to get the sum in the above expression. Here is my code:

N = 10000;
n = 4;
S = 0;      % Number of paths that belong to Sn
for i = 1:N
    X = path(n);
    % If the i:th element of the cumulative sum of the path X is a number
    % that is < 0, then the path belongs to Sn
    P = cumsum(X);
    if ~any(P < 0)
        S = S + 1;
    end
end

cn = 1/(n+1)*nchoosek(2*n,n);
Tn = nchoosek(2*n-1,n);

tau = 1/N*S*Tn;
display(tau)

disc = abs(cn-tau);

function output = path(n)
steps = 2*n;
dir = zeros(1,steps);

% -1 rep. upwards and 1 rep. rightwards
dir(1) = 1;     % Initial step
tot0 = 1;       % Total (current) number of 1:s in path
tot1 = 0;       % Total (current) number of negative 1:s in path
for i = 2:steps
    r = rand;
    if r < 0.5
        if tot0 < n
            dir(i) = 1;     % Step rightwards
            tot0 = tot0 + 1;
        else
            dir(i) = -1;     % Step upwards
            tot1 = tot1 + 1;
        end
    else
        if tot1 < n        
            dir(i) = -1;     % Step upwards
            tot1 = tot1 + 1;
        else
            dir(i) = 1;     % Step rightwards
            tot0 = tot0 + 1;
        end
    end
end
output = dir;
end

I am wondering why this code does not give a arbitrarily good estimate when I increase $N$, it seems like me estimation is always a factor $2$ of the Catalan number it tries to estimate. What I am doing wrong in my code? Or have I derived the Monte Carlo scheme in the worng way?

Best Answer

You have to generate a random anagram of the word UUUURRR (U=Up, R=Right), that is just a permutation of the letters. To do so, you can apply any of the methods described, for example, here.