You don't need to account for future rolls, because of the monotonicity of the games: as the game progresses, the potential gains don't increase and the risks grow. Moreover, in every state there is at least one strategy that doesn't decrease your expected score, namely stopping the game. That is why you can draw a graph of all states in the game, assign to each node the expected immediate gain of rolling again, and be sure that a node with a negative immediate gain cannot lead to a node with a positive immediate gain. So your strategy is simply to end the game as soon as the immediate gain is negative (you can't lose a little at first and then gain more in future rolls).
First game
The game has only one variable as its state — the number of points $p$ — so a strategy $S$ can depend only on $p$. Next, if there is a strategy $S_X$ that tells you to keep playing when you have $X$ points, $S_X(X)=1$, then the strategy $$\tilde S_X(p) = \begin{cases}1 & p\le X \\ S_X(p) & p>X\end{cases}$$ will give you at least the same number of points on average. If $S_Y(Y)=0$, then the same holds for:
$$
\tilde S_Y(p) = \begin{cases}S_Y(p) & p< Y \\ 0 & p\ge Y\end{cases}
$$
That means that the optimal strategy is determined by a single critical number $P$: $S(p)=1$ if $p\le P$ and $S(p)=0$ if $p>P$. The question is how to find $P$. Let's compare two strategies, with $P_1=P$ and $P_2=P+1$. They tell you to do the same thing except at the moment when you have $p=P+1$ points: the first strategy tells you to stop, while the second tells you to roll the die one more time and then stop (because you will gain at least 1 point, so $p+1\ge P+2>P+1$).
The expectation of the first strategy at that point is $E_1=P+1$ points. The expectation of the second is $E_2=5(P+1)/6 + (1+2+3+4+5)/6=(5P+20)/6$. So if $P<14$, then $E_1<E_2$; if $P>14$, then $E_1>E_2$; and if $P=14$, then $E_1=E_2$. Thus there are two optimal pure strategies, $P=14$ and $P=15$ (at a score of exactly $15$, rolling once more and stopping are equally good), and everything else is worse.
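If you want to double-check this numerically, the game is small enough to brute-force. Here is a minimal sketch in Python, assuming the rules used in the formulas above (each roll adds $1$–$5$ points with probability $1/6$ each, and with probability $1/6$ you lose everything):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(P, p=0):
    """Expected final score of the threshold strategy
    'keep rolling while the current score p satisfies p <= P'."""
    if p > P:                       # the strategy says stop: bank p points
        return p
    # one more roll: with prob 1/6 you lose everything (contributes 0),
    # otherwise the score grows by 1..5, each with prob 1/6
    return sum(value(P, p + g) for g in range(1, 6)) / 6

for P in range(12, 18):
    print(P, round(value(P), 4))
```

The printed expectations should peak, with equal values, at $P=14$ and $P=15$.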
Second game
It's a different story, because now the state depends on which numbers $S_k=\{i_1,\ldots, i_k\}$ were rolled before. (But we don't need to add the dependence on $p$ explicitly, since $p=\sum S_k = i_1+\ldots+i_k$.)
First thought: if the strategy tells you to stop when the set $S_k$ has been rolled, then you should also stop with any set $\tilde S_k$ whose sum $\tilde p = \sum\tilde S_k$ exceeds $p = \sum S_k$. This is because you risk more, $\frac k6\tilde p > \frac k6 p$, but stand to gain less, $\frac{21-\tilde p}{6} < \frac{21-p}{6}$.
That is why we only need to check the minimum-sum set for each $k$ to see at which $k$ you should always stop. Analysis of $S_3=\{1,2,3\}$ shows that you should always stop at $k=3$, and then your analysis of two throws is perfectly valid.
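A minimal sketch of that check in Python (assuming the rules of the question: repeating a face you have already rolled loses everything, otherwise the new face value is added to your score):

```python
def immediate_gain(rolled):
    """Expected immediate gain of one more roll, given the set of
    distinct faces rolled so far."""
    k, p = len(rolled), sum(rolled)
    new_faces = [f for f in range(1, 7) if f not in rolled]
    # with prob k/6 you repeat a face and lose the current score p;
    # each new face f comes up with prob 1/6 and adds f points
    return sum(new_faces) / 6 - k / 6 * p

for k in range(1, 6):
    best_set = tuple(range(1, k + 1))   # the minimum-sum set {1, ..., k}
    print(k, best_set, immediate_gain(best_set))
```

Already for $\{1,2,3\}$ the gain comes out at $-0.5$, and by the argument above every other set with $k\ge 3$ is worse.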
You are both wrong:
You are wrong because the two rolling procedures do not produce the same probability distribution.
Your DM is wrong, because this has nothing to do with the Monty Hall (three doors) problem.
Tackling these in reverse order: in the Monty Hall problem, the host of the game knows where the goats and the prize are. He cannot reveal the prize, so when he reveals a goat, you gain extra information. That is, there is a dependence between the events in the sequence that leads to an outcome (winning or losing the game).
In the dice scenario, the die rolls are independent. There is no difference between rolling three dice and rerolling the lowest, and rolling four dice and throwing away the lowest of the first three. You don't get any extra information after rolling three dice, as the fourth roll is independent of the previous three rolls.
Therefore your DM's explanation is incorrect.
Your error is, perhaps, a bit more subtle. Since the question asks whether the probability distributions are different, and not for the probability distributions themselves, let's consider a much simpler case:
Procedure 1: Flip a coin (a d2, if you will) twice—say tails is $0$ and heads is $1$. Reflip the lowest result. Equivalently, flip three coins, then throw away the lowest result among the first two tosses.
Procedure 2: Flip a coin three times, and throw away the lowest result.
In either case, you flip the coin $3$ times, leading to $2^3=8$ different events, each corresponding to a sequence of flips. For example, possible events are
$$ HHT, \qquad TTT, \qquad\text{and}\qquad HTH. $$
With respect to either procedure, these $8$ events correspond to just three outcomes: you get a sum of $0$, $1$, or $2$ (the total number of heads which "count"). However, the probabilities of these outcomes are not the same. I have chosen this example to be small and tractable, so we can actually show every possible event and outcome, as summarized below:
$$\begin{array}{c|c|c}
\text{Event} & \text{Proc 1} & \text{Proc 2} \\\hline
TTT & 0 & 0 \\
TTH & 1 & 1 \\
THT & 1 & 1 \\
THH & 2 & 2 \\
HTT & 1 & 1 \\
HTH & 2 & 2 \\
\color{red}{HHT} & \color{red}{1} & \color{red}{2} \\
HHH & 2 & 2
\end{array}$$
Notice that these two procedures give very similar results, but that Procedure 2 is, on average, slightly better. The difference occurs in the line which I have colored red—under Procedure 1, you get two heads, but then replace one of those heads with tails on your next toss. Under Procedure 2, you get to keep both heads, so you are better off.
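If you want to verify the table mechanically, a few lines of Python (my own sketch, encoding tails as $0$ and heads as $1$ as above) enumerate all eight events:

```python
from itertools import product

def proc1(a, b, c):
    # flip two coins, then reflip the lower one: the third flip c replaces it
    return max(a, b) + c

def proc2(a, b, c):
    # flip three coins and throw away the lowest
    return a + b + c - min(a, b, c)

for flips in product((0, 1), repeat=3):
    print(flips, proc1(*flips), proc2(*flips))
```

The only event on which the two procedures disagree is $(1, 1, 0)$, i.e. $HHT$, matching the red row.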
The question regarding dice follows a similar pattern. Rolling three dice, rerolling the lowest, then rerolling the lowest again (or, equivalently, rolling five dice, then dropping the lowest two from the first four) is akin to Procedure 1. Rolling five dice and dropping the lowest two is akin to Procedure 2. Procedure 2 is always going to win out—heuristically, this is because you never replace a high roll with a lower roll.
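To get a feel for the size of the gap in the dice version, here is a quick Monte Carlo sketch (Python; the sample size and function names are my own choices):

```python
import random

def reroll_lowest_twice():
    # roll 3d6, then twice replace the current lowest die with a fresh roll
    dice = [random.randint(1, 6) for _ in range(3)]
    for _ in range(2):
        dice.remove(min(dice))
        dice.append(random.randint(1, 6))
    return sum(dice)

def roll_5_drop_2():
    # roll 5d6 and keep the highest three
    dice = sorted(random.randint(1, 6) for _ in range(5))
    return sum(dice[2:])

N = 100_000
print("Procedure 1:", sum(reroll_lowest_twice() for _ in range(N)) / N)
print("Procedure 2:", sum(roll_5_drop_2() for _ in range(N)) / N)
```

Procedure 2's average should come out visibly higher, for exactly the reason described above: it never throws away a high roll.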
Best Answer
Your analysis only takes into account the possible increase in your score from one more throw (call this the 'immediate' increase) rather than from potentially multiple further throws (the 'long-term' increase). As such, your analysis could give the wrong advice and tell you to stop when the expected immediate increase is less than the expected loss from losing everything, even though the expected long-term increase could actually outweigh that loss.
So, instead of considering what you should do after rolling $1$ die, and then $2$, etc., it is better to first find the point where there is no long-term increase anymore, and then work your way back.
Fortunately, this end point is easy to establish: you should never go beyond three rolls. Even if you have only scored the minimum sum so far, which would be $1+2+3=6$, you have a probability of $\frac{1}{2}$ of losing everything, and to beat that you'd need at least two more throws without losing everything, which has a chance of far less than $\frac{1}{2}$.
OK, but should you ever go beyond two rolls? Given that we have now established that you should never go beyond three rolls, any gain at this point can only come from the immediate increase.
As such, your formula for having rolled two dice actually turns out to be correct: if after two rolls you have a score of $x$, then the expected gain is $\frac{2}{3}\cdot\frac{21-x}{4} - \frac{1}{3}\cdot x$, which has its break-even point exactly at $x=7$. That means that if after two rolls you have a score below $7$ you should roll again, if the score is above $7$ you should stop, and if your score is exactly $7$ it doesn't matter what you do.
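Plugging in the possible two-roll scores (two distinct faces can only sum to $3$ through $11$) makes the sign change at $x=7$ visible; a tiny Python check:

```python
# expected gain of a third roll when the first two (distinct) faces sum to x
for x in range(3, 12):
    gain = (2 / 3) * (21 - x) / 4 - (1 / 3) * x
    print(x, round(gain, 2))
```

The gain is positive for $x<7$, zero at $x=7$, and negative for $x>7$.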
Finally, then, we can move to the situation where you have rolled just $1$ die. Here there actually is some potential long-term gain to be had from rolling again, which your formula does not take into account; but since the expected immediate gain already outweighs the expected loss for all possible scores, I agree with you that you should indeed always roll again.
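For completeness, the whole game is small enough to solve exactly by backward induction over the sets of faces rolled so far. A sketch in Python (the function name and state encoding are my own):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(rolled):
    """Optimal expected score, given the frozenset of faces rolled so far."""
    p = sum(rolled)
    # rolling again: with prob len(rolled)/6 you repeat a face and score 0;
    # otherwise the new face f joins the set and play continues optimally
    roll = sum(value(rolled | {f}) for f in range(1, 7) if f not in rolled) / 6
    return max(p, roll)          # stop and bank p, or roll again

for f in range(1, 7):
    state = frozenset([f])
    print(f, round(value(state), 3), value(state) > f)
```

Every line should print `True` in the last column: after a single roll, continuing always beats stopping, in line with the conclusion above.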