- Whether you should impute both the pre- and post-scores or the difference score depends on how you analyze the pre-post difference. Be aware that there are legitimate limitations to analyses of difference scores (see Edwards, 1994, for a nice review), and a regression approach in which you analyze the residual for post-scores after controlling for pre-scores might be better. In that case, you would want to impute the pre- and post-scores, since those are the variables that will be in your analytic model. However, if you're intent on analyzing difference scores, impute the difference scores, since it's unlikely you will want to manually compute difference scores across all your imputed data sets. In other words, whatever variables you are using in your actual analytic model are the variables you should use in your imputation model.
- Again, I would impute with the transformed variable, since that is what is used in your analytic model.
- Adding variables to the imputation model will increase the computational demands of the imputation process, BUT, if you have the time, more information is always better. Variables with complete data could potentially be very useful auxiliary variables for explaining MAR missingness. If using all your variables makes the imputation model too computationally demanding (i.e., if you have a big data set), create a dummy variable for each case's missingness on each incomplete variable, and see which complete variables predict those missingness dummies in logistic models--then include those particular complete variables in your imputation model (see the sketch after this list).
- I wouldn't report the original (i.e., list-wise deleted) analyses. If your missingness mechanism is MAR, then MI is not only going to give you increased power, but it will also give you more accurate estimates (Enders, 2010). Thus, the significant effect with MI might be non-significant with list-wise deletion because that analysis is underpowered, biased, or both.
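For concreteness, here is a minimal sketch of how points 1 and 3 might look in practice. It assumes Python with pandas and statsmodels, a hypothetical file `scores.csv`, hypothetical column names `pre` and `post`, and at least one fully observed numeric covariate; statsmodels' MICE is just one implementation among several.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation import mice

df = pd.read_csv("scores.csv")  # hypothetical file with missing values

# Point 3: screen the complete variables as auxiliary predictors of missingness.
# Assumes the complete columns are numeric.
complete = [c for c in df.columns if df[c].notna().all()]
aux = set()
for col in df.columns[df.isna().any()]:
    miss = df[col].isna().astype(int)  # 1 = missing, 0 = observed
    fit = sm.Logit(miss, sm.add_constant(df[complete])).fit(disp=0)
    # keep complete variables that significantly predict missingness
    aux |= set(fit.pvalues.drop("const")[lambda p: p < 0.05].index)

# Point 1: impute the variables that enter the analytic model (pre and post
# scores, not a hand-computed difference), plus the selected auxiliaries.
imp = mice.MICEData(df[["pre", "post", *sorted(aux)]])
results = mice.MICE("post ~ pre", sm.OLS, imp).fit(n_burnin=10, n_imputations=20)
print(results.summary())  # estimates pooled across the imputed data sets
```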
References
Edwards, J. R. (1994). Regression analysis as an alternative to difference scores. Journal of Management, 20, 683-689.
Enders, C. K. (2010). Applied Missing Data Analysis. New York, NY: Guilford Press.
I believe it should be the same.
Short answer: z-score standardization is a linear transformation and as such won't change the ratio that's the basis of the t-test.
Long:
The basic formula for the independent two-sample t-test with equal group sizes $n$ is:
$$
t = \frac{\bar{X}_{1} - \bar{X}_{2}}{s_{p}\times\sqrt{\frac{2}{n}}}
$$
If you apply z-score standardization but do not otherwise change the data, it is obvious that $\sqrt{\frac{2}{n}}$ is unchanged. So we just need to make sure that the ratio of the numerator to the denominator is unchanged too.
Let's start with the denominator. The pooled standard deviation $s_p$ is:
$$
s_{p} = \sqrt{\frac{1}{2}\times(\sigma_{x_1}^{2}+\sigma_{x_2}^{2})}
$$
where $\sigma_{x_1}^{2}$ is the sample variance of group 1:
$$
\sigma_{x_{1}}^{2} = \frac{\sum_{i=1}^{n}{(x_{i}-\bar{X_{1}})^{2}}}{{n-1}}
$$
Again, we can assume that $n$ hasn't changed. How much has the sum changed due to standardization? For that, let's look at the z-score formula:
$$
z_{i} = \frac{x_{i}-\bar{x}}{\sigma}, \qquad \text{where } \sigma = \sqrt{\frac{\sum_{i=1}^{n}{(x_{i}-\bar{x})^{2}}}{n-1}}
$$
That's a transformation we apply to every element in our initial dataset, using the same $\bar{x}$ and $\sigma$ (computed from the combined sample) for every element.
The critical parts are $x_i - \bar{X}_1$ from the variance formula and $\bar{X}_1 - \bar{X}_2$ from the t-stat formula, as $n$ is unchanged. What we need to show essentially - to prove that the t statistic is the same - is that the numerator and the denominator shrink by the same factor. Because the z-score is a single linear map applied to every observation, any difference between two values is simply divided by $\sigma$:
$$
z_{i} - z_{j} = \frac{(x_{i}-\bar{x})-(x_{j}-\bar{x})}{\sigma} = \frac{x_{i}-x_{j}}{\sigma}
$$
and the same holds for differences of means and for deviations from a group mean (see proof below) - the z-score doesn't change the relative distances between the values and the mean, it just expresses those distances in $\sigma$ units. Even though the actual values change, their relative positions to each other don't. That's kind of the point of standardization - keep the distances, but lose the original level.
So back to the original t-statistic:
$$
t = \frac{\bar{X}_{1} - \bar{X}_{2}}{s_{p}\times\sqrt{\frac{2}{n}}}
$$
As every difference is divided by $\sigma$, the numerator $\bar{Z}_1 - \bar{Z}_2$ will differ from $\bar{X}_1 - \bar{X}_2$ by a factor of $\frac{1}{\sigma}$; but the pooled standard deviation (built from the deviations $x_i - \bar{X}_1$) shrinks by exactly the same factor, so once we take the ratio we end up with the same result.
Proof:
The group means transform as
$$
\bar{Z}_{1} = \frac{1}{n}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sigma}} = \frac{\bar{X}_{1}-\bar{x}}{\sigma}
$$
and likewise for $\bar{Z}_{2}$, so the numerator becomes
$$
\bar{Z}_{1}-\bar{Z}_{2} = \frac{(\bar{X}_{1}-\bar{x})-(\bar{X}_{2}-\bar{x})}{\sigma} = \frac{\bar{X}_{1}-\bar{X}_{2}}{\sigma}
$$
For the denominator, each deviation from a group mean transforms as
$$
z_{i}-\bar{Z}_{1} = \frac{x_{i}-\bar{X}_{1}}{\sigma}
$$
so each group variance is divided by $\sigma^{2}$, and the pooled standard deviation by $\sigma$:
$$
s_{p}^{(z)} = \frac{s_{p}}{\sigma}
$$
Substituting into the t-statistic, the two factors of $\frac{1}{\sigma}$ cancel:
$$
t^{(z)} = \frac{(\bar{X}_{1}-\bar{X}_{2})/\sigma}{\frac{s_{p}}{\sigma}\times\sqrt{\frac{2}{n}}} = \frac{\bar{X}_{1}-\bar{X}_{2}}{s_{p}\times\sqrt{\frac{2}{n}}} = t
$$
Thus the t statistic is unchanged.
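If you'd rather convince yourself numerically, here is a quick check (a minimal sketch assuming NumPy and SciPy; the data are simulated):

```python
# Quick numerical check: standardizing the combined sample with one common
# mean and SD leaves the two-sample t statistic unchanged.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x1 = rng.normal(10.0, 2.0, size=30)   # group 1
x2 = rng.normal(11.5, 2.0, size=30)   # group 2, same n

pooled = np.concatenate([x1, x2])
m, s = pooled.mean(), pooled.std(ddof=1)   # common mean and SD
z1, z2 = (x1 - m) / s, (x2 - m) / s        # same linear map for both groups

t_raw = stats.ttest_ind(x1, x2).statistic
t_std = stats.ttest_ind(z1, z2).statistic
print(t_raw, t_std)   # identical up to floating-point rounding
```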
Best Answer
Yes, this looks fine. You are welcome to pre-process your data any way you see fit. Your selections look uncontroversial, but even if they were controversial, there is no formal reason why you can't do it. Linear transformations of raw data are ubiquitous (z-score, etc.); non-linear transformations of raw data are common ($\log(x), \sqrt{x}, x^2$); wildly non-linear transformations of raw data are acts of desperation and will usually prove to be useless, but they still aren't "wrong." Anyway, you aren't anywhere near that domain with your suggestions. :)
No, you do not need to transform $(b+c)/2$ to a z-score. Nor is there any reason you should be forbidden from doing so. The transformation is linear, so it creates a shift and a scaling along that axis, and your regression coefficients $\beta_0$ and $\beta_{(b+c)/2}$ will respond to this transformation, but the overall quality of your regression will not change. To emphasize this point, I might add that you could replace $(b+c)/2$ with $\log((b+c)/2)$. This is clearly a different model, so I would expect a different $R^2$ -- maybe better! maybe worse! -- but again, no one is stopping you from doing that, either. See point 1 (and the sketch below).
The only change in interpretation would be the words you use to describe the regression coefficients: "One unit change in the [value | z-score] of $(b+c)/2$ would change the response variable by $[\beta_{(b+c)/2} ~|~ \beta_{Z_{(b+c)/2}}]$, all other variables being held constant."
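To make point 2 concrete, here is a small sketch (assuming statsmodels; the data are simulated and the coefficient values are arbitrary): z-scoring the $(b+c)/2$ predictor rescales its coefficient and the intercept, but leaves $R^2$ and the fitted values untouched.

```python
# Sketch: z-scoring the (b + c)/2 predictor changes the coefficients but not
# the quality of the fit. Variable names mirror the question; data simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
a, b, c = rng.normal(size=(3, n))
y = 1.0 + 0.5 * a + 2.0 * (b + c) / 2 + rng.normal(scale=0.3, size=n)

bc = (b + c) / 2
bc_z = (bc - bc.mean()) / bc.std(ddof=1)   # z-score of the averaged predictor

fit_raw = sm.OLS(y, sm.add_constant(np.column_stack([a, bc]))).fit()
fit_std = sm.OLS(y, sm.add_constant(np.column_stack([a, bc_z]))).fit()

print(fit_raw.rsquared, fit_std.rsquared)  # identical
print(fit_raw.params, fit_std.params)      # slope on (b+c)/2 scaled by its SD
```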