"Could you explain why Brian divides $20!$ in a more intuitive way"
The way I teach this is what is sometimes called "The Shepherd's Principle" or also called "Division by symmetry." Imagine if you will that you are a shepherd in the field with your flock of sheep. These sheep are all healthy and incredibly wooly, so much so that their heads are obscured and their bodies all appear as one large mass. You wish to count how many sheep are present. Although it is difficult to count how many bodies there are and can not see how many heads there are, the wool only covers the main part of their bodies. You can still clearly count how many legs are present. If you were to kneel down and count how many legs you can see, you can then divide the final result by $4$ to get the total number of sheep.
In much the same way, in combinatorics when trying to count objects or arrangements and such... if you have a method to count but it overcounts, so long as you were consistent with how much you have overcounted each scenario you can correct the count by dividing the result by the number of times you counted each so as to have the overall effect be that after dividing it is as though you counted each object only once apiece.
Here, in Brian's answer, if we were to have gone with an initial count of $100!$ having sent the first five people to the first table in that order, the second five people to the second table in that order and so on, this will have overcounted. First, it is as though we overcounted for each possible labeling of the tables. Dividing by $20!$ fixes this. That alone was not enough of a fix however. Next, it is as though we overcounted for each possible rotation of the first table. Dividing by $5$ fixes this. Similarly, for the second table and third table and so on... for a total of dividing by $5^{20}$.
Here is another answer which avoids this "division by symmetry."
Take our $100$ people. Let us assume without loss of generality that they all have different names and let us order them alphabetically. This can be done in exactly one way.
Now... take the first person alphabetically and let them take a seat at one of the tables in any of the chairs. It does not matter to us which table and it does not matter to us which chair. Once they have picked their seat, then pick someone from the remaining $99$ people to sit to that person's right. Then pick someone from the remaining $98$ people to sit to that person's right, and again, and again. That fills our first table.
Next, there are $95$ remaining people and one of those people comes first alphabetically from all who remain. Let them pick a seat arbitrarily from a table that remains. Once that happens, pick the person to sit to their right, and then again and again, etc...
Continue in this fashion until all seats are filled. This process gives us an answer of:
$$99\times 98\times 97\times 96\times 94\times 93\times 92\times 91\times 89\times \cdots \times 2\times 1$$
This expression is a bit difficult to work with, so let us make it cleaner by simultaneously multiplying and dividing by those missing terms
$$\dfrac{100!}{100\times 95\times 90\times 85\times \cdots \times 5}$$
This denominator, let us simplify that as well by factoring out a common factor of $5$ from each
$$\dfrac{100!}{5^{20}20!}$$
This is exactly the same answer as Brian's, just arrived to in a different manner having avoided the "division by symmetry" style arguments.
Best Answer
There is a big error in the last steps of the shorter proof, and a mild gap.
First, the motivation.
Intuitively, this is a question about the so-called taxicab distance. The taxicab distance between $(a,b)$ and $(x,y)$ is $$|x-a|+|y-b|.$$
$|a|+|b|,$ $|a-1|+|b-1|$ and $|a+1|+|b+1|$ are the distances between $(a,b)$ and $(1,1),$ $(0,0),$ and $(-1,-1),$ respectively. And $|a-b|$ is the shortest distance from $(a,b)$ to the line $y=x.$
So everything about this seems to be about the taxicab distances between $(a,b)$ and points $(x,x).$
So it is natural to define the distance from $(a,b)$ to $(x,x)$:
$$f(x)=|a-x|+|b-x|$$
and try to learn about this function.
The big error in the given proof is the last two lines. The statement $b\leq-1\leq a\leq 1$ is wrong.
It should be $$b\leq-1<1\leq a,\tag1$$ and thus $a-b=|a-b|\geq 2.$
The reason for $(1)$ is the skipped step.
$f(x)$ is strictly decreasing when $x<b$ and strictly increasing when $x>a,$ and so the only way three distinct values $u,v,w$ have $f(u)=f(v)=f(w)$ is for the three values to be in $[b,a].$
But we know $f(0)=f(-1)=f(1),$ so $0,1,-1\in[b,a].$
The "shape" of $f$ is the key. We want to show that $f^{-1}(d)$ has at most two points for any $d>|a-b|,$ and for that you need to know not just that $f$ is constant on $[b,a],$ but that the functions is strictly decreasing on $(-\infty,b)$ and strictly increasing on $(a,+\infty).$
So we get a stronger result:
Picking $a>b$ is possible because the question is symmetric. You could theoretically prove it in two cases, $a>b$ and $b>a,$ but the second case would be exactly the same, with $a,b$ reversed. So we often do only the first case if it is obvious the second case is exactly the same.
Mathematicians will often say things like "Without loss of generality, we can assume $a>b.$" This usually means that we are proving for some subset of cases, but we can easily get the general case back from this subset.