Ok, I think I worked this out. The key insight, oddly enough, was that $\operatorname{Diff}(M)$ is first countable. That helps immensely, because now instead of using the "preimages of open sets are open" definition of continuity, I can use convergent sequences. This simplified things enough for me to understand what was going on.
[To see it is first countable, it's pretty easy to cover $M$ with a countable basis of precompact charts $\{U_i\}$, and then to check that the open sets
$$ N_{1/m}^r(f; \overline{U_i},U_j,U_k) $$
form a countable subbasis around $f$, where all the subscripts and superscripts are positive integers.]
Now if $f_n\xrightarrow{\operatorname{Diff}}f$, then in terms of the topology described in the question, this means that for any tuple $(\epsilon, r, K, U, V)$, there is a corresponding $N$ such that for $n\ge N$,
1. $f_n(K)\subset V$
2. $\lVert f_n^{(i)}-f^{(i)}\rVert_K<\epsilon$ for $0\le i\le r$
Since the compact-open topology is coarser than this topology, convergence in the latter implies convergence in the former. So we can get rid of (1) and instead say that $f_n\xrightarrow{\operatorname{Diff}}f$ is equivalent to
3. $f_n\xrightarrow{M}f$
4. $f_n^{(i)}\xrightarrow{K}f^{(i)}$ for all $i$ and all valid $K$
[I'm writing $a\xrightarrow{L} b$ to mean uniform convergence on the compact set $L$. I'm writing $a\xrightarrow{\operatorname{Diff}}b$ to mean convergence in the topology on $\operatorname{Diff}(M)$.]
Finally, since we already know inversion and composition are continuous in the compact-open topology, we can focus on (4).
Now for inversion, suppose $f_n\xrightarrow{\operatorname{Diff}}f$. Then using the fact that $f^{-1}\circ f(x)=x$ and applying the chain rule repeatedly, we see that we can write
$$ f^{-(r)}\circ f(x)=c_r(f^{(1)}(x), f^{(2)}(x), \ldots, f^{(r)}(x))$$
where $c_r:\mathbb{R}^+\times\mathbb{R}^{r-1}\rightarrow\mathbb{R}$ is continuous. By choosing $n$ large enough, we can restrict the domain of $c_r$ [essentially by (1) and/or (3) above], and we can then assume $c_r$ is uniformly continuous. Then (4) plus the above equation implies
$$ f_n^{-(r)}\circ f_n\xrightarrow{K}f^{-(r)}\circ f$$
Finally, (3) implies we also have
$$ f_n^{-(r)}\circ f_n\xrightarrow{K}f_n^{-(r)}\circ f$$
and the fact that $f$ is a diffeomorphism shows we really have
$$ f_n^{-(r)}\xrightarrow{K}f^{-(r)}$$
which is enough for (4), and thus inversion is continuous.
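As a sanity check, here are the first two cases of the formula for $f^{-(r)}\circ f$, in the one-dimensional setting that the domain of $c_r$ suggests (that setting, and $f^{(1)}>0$, are assumptions made for this sketch). Differentiating $f^{-1}\circ f(x)=x$ once and twice gives

$$ f^{-(1)}\circ f(x)\cdot f^{(1)}(x)=1,\qquad f^{-(2)}\circ f(x)\cdot \big(f^{(1)}(x)\big)^2+f^{-(1)}\circ f(x)\cdot f^{(2)}(x)=0, $$

so that

$$ c_1(a_1)=\frac{1}{a_1},\qquad c_2(a_1,a_2)=-\frac{a_2}{a_1^3}. $$

Both are continuous on $a_1>0$ but not uniformly continuous near $a_1=0$, which is why the domain has to be restricted to a compact set bounded away from $a_1=0$ before invoking uniform continuity.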
For composition, again the chain rule applied to $g\circ f$ gives an equation like
$$ (g\circ f)^{(r)}(x) = d_r(g^{(1)}\circ f(x), \ldots, g^{(r)}\circ f(x), f^{(1)}(x), \ldots, f^{(r)}(x))$$
where we can assume $d_r$ is uniformly continuous.
Then (4) applied to $g$ shows we have
$$ (g_n\circ f)^{(r)}\xrightarrow{K}(g\circ f)^{(r)} $$
and (4) applied to $f$ then gives
$$ (g_n\circ f_n)^{(r)}\xrightarrow{K}(g\circ f)^{(r)} $$
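For concreteness, the first two cases of the chain-rule formula, written as a sketch in one variable (an assumption made for illustration), are

$$ (g\circ f)^{(1)} = \big(g^{(1)}\circ f\big)\, f^{(1)},\qquad (g\circ f)^{(2)} = \big(g^{(2)}\circ f\big)\,\big(f^{(1)}\big)^2 + \big(g^{(1)}\circ f\big)\, f^{(2)}, $$

so $d_1(b_1,a_1)=b_1a_1$ and $d_2(b_1,b_2,a_1,a_2)=b_2a_1^2+b_1a_2$. These are polynomials, hence uniformly continuous once their arguments are confined to a bounded set, which should hold on a compact $K$ for large $n$ by uniform convergence of the derivatives.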
The real insight here is that the "convergent sequence" definition of continuity greatly simplifies the notation and presentation; after that, everything comes down to the chain rule.
As Daniel Fischer and Conifold pointed out, the proof fails for unbounded subsets of $GL(\mathcal{H})$, since one needs to bound $\Vert U_i \Vert$.
I thought one could deal with that via uniform boundedness as follows:
Since $U_i x \rightarrow Ux$, we have pointwise convergence and thus pointwise boundedness, i.e. $\forall x \in X: \sup_{i \in I} \Vert U_i x \Vert < \infty$. Thus, by the uniform boundedness principle, $\sup_{i \in I} \Vert U_i \Vert < \infty$.
However, as opposed to a convergent sequence, a convergent net in a normed space need not be norm bounded. A simple example is given in the answer here.
Remark: note that for separable $\mathcal{H}$, bounded subsets of $GL(\mathcal{H})$ are metrizable. In this case, the argument works, since in order to show continuity it would have been enough to consider sequences.
Best Answer
Let $g_n(x)= x + (1/n,0)$. This converges pointwise to the identity map.
Let $f_n$ be a homeomorphism that maps the point $(1/n,0)$ to $(1/n,1)$ and is the identity outside of $U_n:=B_{2^{-n}}(\{1/n\}\times[0,1])$. Maps of this kind can be built by multiplying the vector field $(0,1)$ by an appropriate function that is supported in $U_n$ and equal to $1$ on the segment $\{1/n\}\times[0,1]$, and then taking the time-$1$ flow.
Then for any $x\in\Bbb R^2$ there is an $N$ so that $x\notin U_n$ for $n>N$, so $f_n(x)=x$ for $n>N$ and hence $f_n$ converges pointwise to the identity map.
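To spell out one way to pick $N$ (a sketch, assuming $B_{2^{-n}}(\cdot)$ denotes the open $2^{-n}$-neighborhood): write $x=(a,b)$, so that the distance from $x$ to the segment $\{1/n\}\times[0,1]$ is at least $|a-1/n|$. Then

$$ a\le 0 \;\Rightarrow\; |a-1/n|\ge \tfrac1n>2^{-n}\ \text{for all } n\ge 1,\qquad a>0,\ n\ge \tfrac2a \;\Rightarrow\; |a-1/n|\ge \tfrac a2\ge \tfrac1n>2^{-n}. $$

In either case $x\notin U_n$ for all sufficiently large $n$.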
But $f_n(g_n((0,0)))=(1/n,1)$ converges to $(0,1)$ and not $(0,0)$.
Basically, what's happening is that $f_n$ always pushes some point close to the origin far away, but as far as pointwise convergence can tell, the set on which this happens is being "shuffled out of existence". The function $g_n$ is chosen so that the origin is mapped to this nearby point.