From the comments, you're confident that you're in a MAR or MCAR situation. Then multiple imputation is at least reasonable. So how much missingness is tractable? Think of it this way:
Multiple imputation makes all your model parameter estimates less certain. How much less depends on how accurately your imputation model can predict the missing data, on the amount of missing data that needs imputing, and on the number of imputations you use.
How much is 'too much' missingness therefore depends on how much added variance/uncertainty you are willing to put up with. A useful quantity for you might be the relative efficiency ($RE$) of an MI analysis. This depends on the 'fraction of missing information' (not the simple rate of missingness), usually called $\lambda$, and the number of imputations, usually called $m$, as $RE \approx 1/(1+\lambda/m)$.
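To get a feel for the numbers, here is a minimal sketch that evaluates the approximation $RE \approx 1/(1+\lambda/m)$ for a few values. The function name `relative_efficiency` and the particular values of $\lambda$ and $m$ are just illustrative choices, not something taken from the MI FAQ.

```python
def relative_efficiency(lam: float, m: int) -> float:
    """Approximate relative efficiency of using m imputations.

    lam is the fraction of missing information (between 0 and 1),
    not the simple rate of missingness.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1.0 / (1.0 + lam / m)


# Illustrative grid: rows are fractions of missing information,
# columns are numbers of imputations.
for lam in (0.1, 0.3, 0.5, 0.9):
    cells = "  ".join(f"m={m}: {relative_efficiency(lam, m):.3f}" for m in (3, 5, 20))
    print(f"lambda={lam:.1f}  {cells}")
```

With these inputs, even a fraction of missing information of 0.5 costs less than 10% efficiency at $m = 5$, which is why the efficiency loss from a modest number of imputations is often considered acceptable.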
Rather than reproduce the definitions of missing information and related quantities here, I suggest simply reading the MI FAQ, which puts things very clearly. From there you'll know whether you want to tackle the original sources (Rubin and others).
Practically speaking you should probably just try an imputation analysis and see how it works out.
Your linear regression can't predict on the missing data if it doesn't have a predictor. So your value is not imputed.
Although it does involve regressions, Multivariate Imputation by Chained Equations (MICE) is a bit different from your linear regression approach. In a nutshell, the missing values are first tentatively filled in, which makes every variable usable as a predictor, and then each variable's missing values are imputed again, iteratively. To understand what the algorithm does, I would suggest looking at the pseudocode in Azur, M. J., Stuart, E. A., Frangakis, C., & Leaf, P. J. (2011). Multiple imputation by chained equations: What is it and how does it work? International Journal of Methods in Psychiatric Research, 20, 40-49.
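The fill-then-iterate idea can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not the algorithm as implemented in any package: it assumes exactly two numeric variables, uses simple linear regression, and adds residual noise without also drawing the regression coefficients (which a proper implementation would do). All names here (`fit_line`, `chained_imputation`) are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b, residual_sd)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    dof = max(len(xs) - 2, 1)
    return a, b, (sum(r * r for r in resid) / dof) ** 0.5


def chained_imputation(rows, n_iter=10, seed=0):
    """Return one completed copy of rows ([x, y] pairs, None = missing)."""
    rng = random.Random(seed)
    missing = [[v is None for v in row] for row in rows]
    # Step 1: tentatively fill every missing value with its column mean,
    # so that each variable can serve as a predictor.
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(2)]
    data = [[col_means[j] if r[j] is None else r[j] for j in range(2)] for r in rows]
    # Step 2: cycle through the variables, re-imputing each one from the other.
    for _ in range(n_iter):
        for target in range(2):
            predictor = 1 - target
            observed = [i for i in range(len(data)) if not missing[i][target]]
            a, b, sd = fit_line([data[i][predictor] for i in observed],
                                [data[i][target] for i in observed])
            for i in range(len(data)):
                if missing[i][target]:
                    data[i][target] = a + b * data[i][predictor] + rng.gauss(0.0, sd)
    return data


# Made-up toy data: a missing y in rows 2 and 6, a missing x in row 4.
rows = [[1.0, 2.1], [2.0, None], [3.0, 6.2], [None, 8.1], [5.0, 9.9], [6.0, None]]
completed_sets = [chained_imputation(rows, seed=s) for s in range(5)]
print(completed_sets[0])
```

Running the chain with several seeds gives several completed data sets, which is the "multiple" in multiple imputation. Note that a row can be imputed even when its predictor is itself missing, because the predictor holds a tentative value from the previous step.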
Best Answer
The following are my thoughts on the subject (as per your questions):
I think that the answer to this is: it depends. Some models (or, more accurately, the software that implements them) can handle missing data automatically, either through algorithms that handle missing data directly or by embedding multiple imputation or similar methods in the modeling software (usually as functions, e.g. in R). Therefore, you need to read the software's documentation carefully to see what missing data handling features it offers.
Another important point in finding the correct or optimal answer is determining (testing assumptions about) the nature/mode of missingness. I'm talking about MCAR, MAR and MNAR. For more details on this and, in general, for a comprehensive overview of the topic as well as approaches, methods and software for missing data handling, see the excellent paper by Horton and Kleinman (2007).
I have not seen any common thresholds for this. Your example above is an extreme case and does not represent most real data sets. Moreover, even a small level of missingness (say, a few percent) in each of many variables might produce substantial overall missingness in the model: "... missingness of just a few percent on each of a number of covariates may lead to a large number of observations with some missing information" (Horton & Kleinman, 2007, p. 79).
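A quick back-of-the-envelope calculation shows how fast this compounds. The sketch below assumes, purely for illustration, that every covariate has the same missingness rate and that missingness is independent across covariates. Real data rarely satisfy this exactly, and the function name is hypothetical.

```python
def expected_complete_fraction(per_variable_missing_rate: float, n_variables: int) -> float:
    """Expected share of fully observed rows, ASSUMING each variable is
    missing independently of the others with the same probability."""
    return (1.0 - per_variable_missing_rate) ** n_variables


# Illustrative values: 5% missing on each of 10 covariates.
complete = expected_complete_fraction(0.05, 10)
print(f"complete rows: {complete:.3f}, rows with something missing: {1 - complete:.3f}")
```

Under these assumptions, 5% missingness on each of 10 covariates leaves only about 60% of the rows complete, so a complete-case analysis would discard roughly 40% of the observations.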
As far as I know, in most cases it is OK to use categorical variables as is, assuming, of course, that the software you're using supports that. Most software does support categorical variables directly; see the paper by Horton and Kleinman (2007) for details. There may be situations where converting them would be beneficial, but I'm not aware of any at the moment.
To the best of my knowledge, this is not true. Both Amelia and mice provide functionality for aggregating the imputed results and even performing some types of statistical analysis. An even more integrated process can be found in the R-based Zelig software, which supports various statistical models and has embedded support for missing data handling (via the Amelia package).
NOTE: Keep in mind that Amelia, in addition to the traditional MAR assumption, also assumes that the data you're trying to process are multivariate normal. If that is not the case, other options should be considered, such as mice or the corresponding Hmisc functionality.
References
Horton, N. J., & Kleinman, K. P. (2007). Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models. The American Statistician, 61(1), 79–90. doi:10.1198/000313007X172556 Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839993