The most usual way to show that a topology $A$ is stronger than a topology $B$ is to show that every net that converges in $A$ also converges in $B$ (to the same limit). You show that two topologies are equal by showing that each is stronger than the other.
Since all the topologies you want to consider are linear, it is enough to consider convergence at $0$.
Here, suppose that $T_j\to 0$ in norm, i.e. $\|T_j\|\to0$. For any $x\in H$,
$$
\|T_jx\|+\|T_j^*x\|\leq\|T_j\|\,\|x\|+\|T_j^*\|\,\|x\|=2\|T_j\|\,\|x\|\to0;
$$
so norm convergence implies strong* convergence.
From $\|T_jx\|\leq\|T_jx\|+\|T_j^*x\|$ we get that strong* convergence implies strong convergence.
If $T_j\to0$ strongly, this means that $T_jx\to0$ for all $x\in H$. Then
$$
|\langle T_jx,y\rangle|\leq\|T_jx\|\,\|y\|\to0,
$$
so strong convergence implies weak convergence.
So far we have shown the "weaker than" implications. Now we need to see that they are strict.
Fix an orthonormal basis $\{e_j\}$ of $H$ and let $E_j$ be the projection onto the span of $e_j$ (i.e. $E_jx=\langle x,e_j\rangle e_j$). Then for any $x=\sum_j\alpha_je_j$,
$$
\|E_jx\|+\|E_j^*x\|=2\|E_jx\|=2|\alpha_j|\to0,
$$
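so $E_j\to0$ in the strong* topology. But $\|E_j\|=1$ for all $j$, so $\{E_j\}$ does not converge to $0$ in norm; since a norm limit would also be the strong* limit, $\{E_j\}$ does not converge in norm at all.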
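As a numerical illustration of this example, here is a minimal sketch. It assumes the hypothetical test vector with $\alpha_k=1/k$, truncated to finitely many coordinates; the names `alpha`, `strong_star_seminorm_E` and `operator_norm_E` are made up for the illustration and are not part of the argument.

```python
import numpy as np

# Hypothetical test vector x = sum_k alpha_k e_k with alpha_k = 1/k,
# truncated to N coordinates (an assumption made only for illustration).
N = 1000
alpha = 1.0 / np.arange(1, N + 1)


def strong_star_seminorm_E(j, alpha):
    """Return ||E_j x|| + ||E_j^* x|| for the rank-one projection E_j.

    E_j is self-adjoint and E_j x = alpha_j e_j, so the value is 2*|alpha_j|.
    The index j is 1-based to match the text.
    """
    return 2.0 * abs(alpha[j - 1])


def operator_norm_E(j, size):
    """Operator norm of E_j acting on the first `size` coordinates."""
    E = np.zeros((size, size))
    E[j - 1, j - 1] = 1.0
    return np.linalg.norm(E, 2)  # largest singular value


# The strong* seminorm at x shrinks with j ...
values = [strong_star_seminorm_E(j, alpha) for j in (1, 10, 100, 1000)]
# ... while the operator norm stays equal to 1.
norms = [operator_norm_E(j, 50) for j in (1, 10, 50)]

print(values)
print(norms)
```

The seminorm values decrease like $2/j$ for this particular $x$, whereas the operator norms do not move, which is the gap between strong* and norm convergence.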
Now consider the left-shift operator $T$ given by $Te_1=0$, $Te_j=e_{j-1}$ for $j>1$ (we don't need to assume $H$ separable here, just use a well-ordering of the index set). If you are familiar with it, $T$ is the adjoint of the unilateral shift. Put $T_n=T^n$, $n\in\mathbb N$. Then
$$
\|T_nx\|^2=\sum_{k>n}|\alpha_k|^2\to0\ \ \text{ as }n\to\infty;
$$
so $T_n\to0$ in the strong topology. But $T^*$ is an isometry, so
$$
\|T_nx\|+\|T_n^*x\|\geq\|T_n^*x\|=\|x\|
$$
for all $n$, so $\{T_n\}$ does not converge in the strong* topology.
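The contrast between $T^n$ and its adjoint can be checked numerically on coefficient sequences. This is only a sketch under assumptions: $x$ is the hypothetical vector with $\alpha_k=1/k$ truncated to `N` terms, and `left_shift_power` / `right_shift_power` are illustrative helper names.

```python
import numpy as np

# Hypothetical x with coefficients alpha_k = 1/k, truncated to N terms.
N = 2000
alpha = 1.0 / np.arange(1, N + 1)


def left_shift_power(alpha, n):
    """Coefficients of T^n x: the first n coefficients are dropped."""
    return alpha[n:]


def right_shift_power(alpha, n):
    """Coefficients of (T^*)^n x: n zeros are prepended."""
    return np.concatenate([np.zeros(n), alpha])


steps = (0, 10, 100, 1000)
norm_x = np.linalg.norm(alpha)

# ||T^n x|| is the norm of the tail of the coefficient sequence.
left_norms = [np.linalg.norm(left_shift_power(alpha, n)) for n in steps]
# ||(T^*)^n x|| never changes, because T^* is an isometry.
adjoint_norms = [np.linalg.norm(right_shift_power(alpha, n)) for n in steps]

print(left_norms)
print(adjoint_norms, norm_x)
```

The first list decays toward $0$ while the second stays at $\|x\|$, so the strong* seminorm $\|T_nx\|+\|T_n^*x\|$ cannot tend to $0$.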
Finally, we need a net that converges weakly but not strongly. Here we can use the unilateral shift $S$ ($T^*$ above). If $x=\sum_k\alpha_ke_k$, $y=\sum_j\beta_je_j$, then
$$
|\langle S^nx,y\rangle|=\left|\sum_k\alpha_k\overline{\beta_{k+n}}\right|\leq\sum_k|\alpha_k|\,|\beta_{k+n}|\leq\left(\sum_k|\alpha_k|^2\right)^{1/2}\,\left(\sum_k|\beta_{k+n}|^2\right)^{1/2}\to0
$$
(the second factor goes to zero as $n\to\infty$ because it is the tail of the convergent series $\sum_j|\beta_j|^2$). So $S^n\to0$ weakly but not strongly: $S$ is an isometry, so $\|S^nx\|=\|x\|$ for all $n$.
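The estimate for $\langle S^nx,y\rangle$ can also be checked numerically. The sketch below assumes the hypothetical choice $\alpha_k=\beta_k=1/k$, truncated to `N` terms; `inner_Sn` and `cauchy_schwarz_bound` are illustrative names, not anything from the argument itself.

```python
import numpy as np

# Hypothetical vectors x and y with coefficients 1/k, truncated to N terms.
N = 4000
k = np.arange(1, N + 1)
alpha = 1.0 / k
beta = 1.0 / k


def inner_Sn(alpha, beta, n):
    """<S^n x, y> = sum_k alpha_k * conj(beta_{k+n}) over the truncated range."""
    # np.vdot conjugates its first argument.
    return np.vdot(beta[n:], alpha[: len(alpha) - n])


def cauchy_schwarz_bound(alpha, beta, n):
    """||x|| times the norm of the tail (beta_{k+n})_k."""
    return np.linalg.norm(alpha) * np.linalg.norm(beta[n:])


steps = (0, 10, 100, 1000)
inners = [abs(inner_Sn(alpha, beta, n)) for n in steps]
bounds = [cauchy_schwarz_bound(alpha, beta, n) for n in steps]
# S^n prepends n zeros, so the norm of S^n x is unchanged.
shifted_norms = [np.linalg.norm(np.concatenate([np.zeros(n), alpha])) for n in steps]

print(inners)
print(bounds)
print(shifted_norms)
```

The inner products and their Cauchy-Schwarz bounds shrink with $n$, while $\|S^nx\|$ stays fixed, matching weak convergence without strong convergence.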
I'm going to consider a part of your Question 1, namely:
What is the reason most people don't bother talking about the actual topology and seem satisfied with sequences, although the topology is not sequential?
I think that there are (at least) two reasons for this. The first is technical:
- The topology is not easy to define and it is not easy to manipulate (here "not easy" means "not easy for an introductory course", for example a course with focus in the applications to PDE).
The second is more relevant:
- The topology doesn't matter for the basic properties of distributions (probably, for the topics of the said introductory courses in which the said topology is not defined).
Sounds unsatisfactory, right? I agree, so let me explain. These are the words (not literally) of Laurent Schwartz, who created the theory of distributions. In fact, Schwartz said the following about the time when he started working with test functions:
I was unable to put a topology on $\mathcal{D}$, but only what I called a pseudo-topology, i.e. a sequence $(\phi_n)$ converges to $0$ in $\mathcal{D}$ if the $(\phi_n)$ and all their derivatives converge uniformly to $0$, keeping all their supports in a fixed compact set. I only found an adequate topology much later, in Nancy in 1946. But it doesn't matter for the main properties. ([1], p. 229-230).
This quote teaches us the following:
- Historically, the notion of convergence of sequences in $\mathcal{D}$ came before the topology of $\mathcal{D}$.
As a consequence, it is natural to begin the study of distribution theory with the notion of convergence (instead of starting with the actual topology).
In addition, the quote draws our attention to the following fact:
- There are problems that you can solve in the context of distributions without invoking a topology for $\mathcal{D}$. For some purposes, the usual notion of convergence (which Schwartz called a pseudo-topology) is enough.
For example, the fact that the distributional derivative "preserves convergence of sequences" (in $\mathcal{D}'$) is a result that can be obtained, and applied to differential equations, without the actual topology of $\mathcal{D}$.
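To make this concrete, here is the short computation behind that fact, written in one variable for simplicity. It assumes the usual sequential definition of convergence in $\mathcal{D}'$, namely $u_n\to u$ when $\langle u_n,\phi\rangle\to\langle u,\phi\rangle$ for every $\phi\in\mathcal{D}$, and the usual definition $\langle u',\phi\rangle=-\langle u,\phi'\rangle$ of the distributional derivative. Since $\phi'\in\mathcal{D}$ whenever $\phi\in\mathcal{D}$,
$$
\langle u_n',\phi\rangle=-\langle u_n,\phi'\rangle\longrightarrow-\langle u,\phi'\rangle=\langle u',\phi\rangle\quad\text{for every }\phi\in\mathcal{D},
$$
so $u_n'\to u'$ in $\mathcal{D}'$. Only the pairing with test functions is used; no topology on $\mathcal{D}$ enters the argument.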
Remark: Sometimes this result is called "continuity" of the distributional derivative, even in contexts where convergence in $\mathcal{D}'$ is introduced in the same way as convergence in $\mathcal{D}$: the explicit form of the convergence is given, but no topology is defined. However, it is indeed possible to put a topology on $\mathcal{D}'$ (one that yields the said notion of convergence in $\mathcal{D}'$) without putting a topology on $\mathcal{D}$. With respect to this topology on $\mathcal{D}'$ the distributional derivative is indeed "continuous" (and thus preserves convergence of sequences). To give a reference for this remark, let me quote what Schwartz said in his treatise:
Nous définissons ainsi sur $\mathcal{D}'$ une topologie (qui, remarquons-le encore, ne nécessite pas la connaissance de la topologie de $\mathcal{D}$, mais seulement de ses ensembles bornés). ([2], p. 71)
Of course, as the quote suggests, Schwartz could define boundedness in $\mathcal{D}$ even in the absence of a topology:
I did not have a topology on $\mathcal{D}$, but what I called a pseudo-topology [...]. I could speak without difficulty of a bounded subset of $\mathcal{D}$ [...]. $\mathcal{D}$ was more or less one of the spaces I had studied deeply during that short period [summer of 1943], always with the slight difficulty of the pseudo-topology, which nevertheless did not stop me. ([1], p. 231)
In short, all of this supports the claim that it is possible to do (and Schwartz certainly did) many things with distributions without appealing to the topology of $\mathcal{D}$, using only the notion of convergence. In my opinion this justifies the second reason above as a fundamental answer to your "why". Maybe we could just say that people avoid talking about the topology (in some contexts) because doing so is an efficient strategy (in the contexts where it is avoided). The point is that the topology was created to reproduce a pre-existing notion of convergence and to allow a deeper development of the theory. The notion of convergence is not a mere simplification meant to avoid a complicated topology whose origin is a mystery; of course the topology is complicated, and people make it seem mysterious (for lack of an explanation), but the notion of convergence is the cause of the topology and not the converse. Maybe you will agree that, from this point of view, the fact that in some contexts "people don't bother talking about the actual topology" becomes natural and acceptable.
Addendum (details on the creation of the topology). What was the advantage of defining a topology on $\mathcal{D}$? It was to make it possible to apply the known theorems on topological vector spaces, like the Hahn-Banach theorem. That last sentence seems vague and sounds like a cliché, right? But it is the truth; it is essentially what Schwartz said:
In Grenoble, I gave an exact definition of the real topology corresponding to the pseudo-topology on $\mathcal{D}$, which later, in 1946, Dieudonne and I took to calling an inductive limit topology. The pseudo-topology is not enough; in order to apply the Hahn-Banach theorem and to study the subspaces of $\mathcal{D}$, you need to work with a real topology. ([1], p. 238)
I carefully defined the neighborhoods of the origin in $\mathcal{D}$, then gave the characteristic property which was precisely that of being an inductive limit, without giving it a name. I only did this for the particular object $\mathcal{D}$, without daring to introduce a general category of objects. Mathematical discovery often takes place in this way. One hesitates to introduce a new class of objects because one needs only one particular one, and one hesitates even more before naming it. It's only later, when the same procedure has to be repeated, that one introduces a class and a name, and then mathematics takes a step forwards. Other inductive limits were introduced, then the theory of sheaves used them massively and homological algebra showed the symmetry of inductive and projective limits. ([1], p. 283)
[1] A Mathematician Grappling with His Century by Laurent Schwartz.
[2] Théorie des distributions by Laurent Schwartz.
Best Answer
Certainly there exist stronger topologies on distributions, but as a practical matter the weak-* definition is the one that is interesting, and I assume that was the direction of your question. The usual norm topology isn't available on $\mathcal D(U)$, and, per Tim's comment, the space does not carry any other norm topology either.
$\mathcal D(U)$ is a pretty strict space to be in and to converge in, so it isn't very demanding to be a distribution. The hard work is all put on the test functions, so to speak. Although there is a certain amount of interesting things you can do with distributions, practically distributions are a stepping stone for getting to more interesting spaces, such as using their differentiability properties to define Sobolev spaces.