Given a random variable $X$, a location-scale transformation of $X$ is a new random variable $Y=aX+b$, where $a$ and $b$ are constants with $a>0$.
The location-scale transformation $aX+b$ horizontally scales the distribution of $X$ by the factor $a$, and then shifts the resulting distribution by $b$ along the real line $\mathbb{R}$.
- In an intuitive sense, the expected value $\mathbb{E}[X]$ of a random variable is the center of mass of the distribution of $X$. Shifting the distribution of $X$ by $b$ shifts the center of mass by $b$, and scaling the distribution of $X$ by a factor $a$ scales the center of mass by $a$. In other words, $$\mathbb{E}[aX+b]=a\,\mathbb{E}[X]+b$$
- Similarly, the variance of $X$ is a measure of the horizontal spread of the distribution of $X$, but $\text{Var}[X]$ is defined in terms of squared distance. Thus scaling the distribution of $X$ by a factor $a$ scales $\text{Var}[X]$ by the factor $a^2$, while shifting the distribution by any amount does not affect the spread at all, only the center of mass. In other words, $$\text{Var}[aX+b]=\text{Var}[aX]=a^2\,\text{Var}[X]$$ (Both identities are checked numerically in the sketch below.)
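If you want to see these two identities in action, here is a minimal Python sketch (the distribution and the constants $a=2$, $b=3$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=1_000_000)  # any distribution works here

a, b = 2.0, 3.0      # example constants with a > 0
y = a * x + b        # the location-scale transformation

print(np.mean(y), a * np.mean(x) + b)   # E[aX+b]   vs  a*E[X]+b
print(np.var(y), a**2 * np.var(x))      # Var[aX+b] vs  a^2*Var[X]
```

The two numbers on each line agree up to sampling noise, which is exactly the content of the two formulas above.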
Now, here is a hint for your problem: $Y=\dfrac{X-\mu}{\sigma}=\dfrac{1}{\sigma}X-\dfrac{\mu}{\sigma}$, which can be written as $aX+b$. Find $a$ and $b$, and then use the location-scale transformation.
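As a sanity check once you have worked it out (a small simulation sketch, with made-up values $\mu=10$, $\sigma=4$), the transformed variable should end up with mean approximately $0$ and standard deviation approximately $1$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=4.0, size=1_000_000)

mu, sigma = x.mean(), x.std()
y = (x - mu) / sigma      # the transformation from the hint

print(y.mean(), y.std())  # approximately 0 and 1
```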
Think of the difference like any other statistic that you are collecting. These differences are just some values that you have recorded. You calculate their mean and standard deviation to understand how they are spread (for example, in relation to 0) in a unit-independent fashion.
The usefulness of the SD is in its popularity - if you tell me your mean and SD, I have a better understanding of the data than if you tell me the results of a TOST, which I would first have to look up.
Also, I'm not sure how the difference and its SD relate to a correlation coefficient (I assume that you refer to the correlation between the two variables for which you also calculate the pairwise differences). These are two very different things. You can have no correlation but a significant mean difference, or vice versa, or both, or neither.
By the way, do you mean the standard deviation of the mean difference or standard deviation of the difference?
Update
OK, so what is the difference between the SD of the difference and the SD of the mean difference?
The former tells you something about how the measurements are spread; it is an estimator of the SD of the difference in the population. That is, when you take a single measurement on machine A and on machine B, how much will the difference $A-B$ vary around its mean?
The latter tells you something about how well you were able to estimate the mean difference between the machines. This is why the "standard deviation of the mean" is often referred to as the "standard error of the mean". It depends on how many measurements you have performed: since you divide by $\sqrt{n}$, the more measurements you have, the smaller the SD of the mean difference will be.
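To make the distinction concrete, here is a small Python sketch (the paired measurements are made up for illustration):

```python
import numpy as np

# hypothetical paired measurements from machines A and B
a = np.array([10.2,  9.8, 10.5, 10.1,  9.9, 10.3])
b = np.array([10.0, 10.1, 10.2,  9.8, 10.0, 10.4])

d = a - b                         # pairwise differences
n = len(d)

sd_diff = d.std(ddof=1)           # SD of the difference: spread of individual differences
sem_diff = sd_diff / np.sqrt(n)   # SD of the mean difference (standard error of the mean)

print(d.mean(), sd_diff, sem_diff)
```

Collect more pairs and `sd_diff` will stabilize around the population value, while `sem_diff` will keep shrinking toward zero.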
SD of the difference will answer the question "how much does the discrepancy between A and B vary (in reality) between measurements"?
SD of the mean difference will answer the question "how confident are you about the mean difference you have measured"? (Then again, I think confidence intervals would be more appropriate.)
So depending on the context of your work, the latter might be more relevant for the reader. "Oh" - so the reviewer thinks - "they found that the difference between A and B is x. Are they sure about that? What is the SD of the mean difference?"
There is also a second reason to include this value. You see, if reporting a certain statistic in a certain field is common, it is unwise not to report it, because omitting it raises the question in the reviewer's mind of whether you are hiding something. But you are free to comment on the usefulness of this value.
Best Answer
I'll go with the cliché example - coin flipping. Note that I'm abandoning rigor and some important assumptions in this example, but that's just fine.
Let's say I have a regular coin - that is, once I flip it, it has a 50% chance of landing heads and a 50% chance of landing tails.
So if I flip it 10 times, I'd expect 5 tails and 5 heads. But I could very well get 6 heads and 4 tails. Or 7 heads and 3 tails.
But wait a second - why would I expect 5 tails and 5 heads? Maybe it's obvious - because each flip has a 50% chance of landing heads, so $50\% \times 10 = 5$. In other words, the expected value of my coin-flipping exercise is 5 heads (and therefore 5 tails).
Let's make the example more interesting now. Let's flip the coin 100 times. But check it - nothing changes in terms of my expected value. I still expect half of the tosses to be heads - i.e. 50 heads.
But in reality I might not get 50 heads. Let's say I got 45 heads. Is that far from my expected value of 50? Should I be surprised by that result? Would you be? Probably not. If I told you that I got only 20 heads, then you might think something's up. Why do you think that is?
That's sort of the intuitive notion of variance. How likely is it for our results to deviate from the expected value? Some things (like coin flips) have a pretty good chance of deviating from their expected value. Other things don't.
We can put a number on this - the variance. For mathematical convenience and interpretability, we often take the square root of this number, which brings us back to the original units. That's the standard deviation.
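For the coin example, the numbers are easy to simulate (a minimal sketch; the theoretical values come from the binomial formulas $np$ and $\sqrt{np(1-p)}$):

```python
import numpy as np

rng = np.random.default_rng(42)
p, n_flips, n_trials = 0.5, 100, 100_000

# number of heads in each of 100,000 runs of 100 flips
heads = rng.binomial(n_flips, p, size=n_trials)

print(heads.mean())   # ~50: the expected value, n*p
print(heads.std())    # ~5:  the standard deviation, sqrt(n*p*(1-p))
```

With a standard deviation of about 5, getting 45 heads is only one SD below the expected 50 - unremarkable - while 20 heads is six SDs below, which is why it makes you suspect the coin.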
The definitions you refer to above are more technically accurate and have direct mathematical formulations, hence terms like "probability-weighted" and "random variable".
But if you understand the coin flipping example, then you'll understand the spirit of the terms.