Hot-deck imputation is one of many methods for imputing missing data (others include cold-deck imputation, i.e. using information from external sources, unconditional or conditional mean imputation, predictions from a model, random draws from an assumed distribution, and more; see Gelman and Hill (2006) or Little (1992) for accessible reviews).
The idea behind the hot-deck method is simple: if a case in a multivariate dataset has a missing value, you replace it with a non-missing value taken from another case. The donor case may be chosen at random, or by picking a case that is similar to the one with missing data.
If you are dealing with discrete variables, you can measure the similarity between cases using affinity scores, defined as
$$ \alpha_{ij} = \frac{k - q_i - z_{ij}}{k - q_i} $$
where $k$ is the number of variables, $q_i$ is the number of missing values in the $i$-th case, and $z_{ij}$ is the number of variables for which the potential donor $j$ and the recipient $i$ have different values.
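The affinity score is straightforward to compute. A minimal sketch (my own illustration, using `np.nan` to mark missing values and counting mismatches only over the recipient's observed variables):

```python
import numpy as np

def affinity(recipient, donor):
    """Affinity score between a recipient (with missing values as np.nan)
    and a fully observed donor; both are 1-D arrays of category codes."""
    k = recipient.size                       # k: number of variables
    observed = ~np.isnan(recipient)          # variables observed in the recipient
    q = k - observed.sum()                   # q_i: missing values in the recipient
    z = np.sum(recipient[observed] != donor[observed])  # z_ij: observed mismatches
    return (k - q - z) / (k - q)

# Recipient is missing variable 3; it agrees with the donor on 2 of its 3 observed variables.
r = np.array([1.0, 0.0, np.nan, 2.0])
d = np.array([1.0, 1.0, 0.0, 2.0])
print(affinity(r, d))  # (4 - 1 - 1) / (4 - 1) = 2/3
```

A score of 1 means the donor agrees with the recipient on every observed variable; scanning all complete cases and keeping those with the highest score gives the candidate donor pool.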
For continuous variables, or discrete variables with many categories, you need to decide how to measure the distance between vectors of values, since you cannot expect exact matches. The simplest approach is to match with some tolerance $\varepsilon_p$:
$$ \alpha_{ij} = \frac{k - q_i - \sum_{p=1}^k \left[ |x_{ip} - x_{jp}| > \varepsilon_p \right] }{k - q_i} $$
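The tolerance-based variant only changes the mismatch count: two values are treated as equal when they differ by at most $\varepsilon_p$. A sketch (again my own illustration, summing the indicator over the recipient's observed variables):

```python
import numpy as np

def affinity_tol(recipient, donor, eps):
    """Tolerance-based affinity: values of variable p are treated as matching
    when they differ by at most eps[p]."""
    k = recipient.size
    observed = ~np.isnan(recipient)
    q = k - observed.sum()
    # Iverson bracket [|x_ip - x_jp| > eps_p], summed over observed variables
    mismatches = np.sum(np.abs(recipient[observed] - donor[observed]) > eps[observed])
    return (k - q - mismatches) / (k - q)

r = np.array([1.2, np.nan, 5.0])
d = np.array([1.3, 2.0, 7.5])
eps = np.array([0.2, 0.2, 1.0])
print(affinity_tol(r, d, eps))  # matches on var 1 (0.1 <= 0.2), not on var 3 -> (3-1-1)/(3-1) = 0.5
```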
Other choices include many popular distance metrics (the choice is problem-specific!), as mentioned briefly by Cranmer and Gill (2013). Moreover, to deal with missing values when calculating distances, you can first use mean imputation to fill them in and then calculate full-data distances.
Given a distance metric, the donor can be chosen as the single case at the smallest distance, or at random from a group of best donors.
In many cases you won't have a single best match, or it wouldn't be reasonable to stick to a single best match, so a wiser idea is multiple hot-deck imputation: replicate your dataset in several copies and, in each of them, randomly assign values from different "best" donors. In the end you calculate your statistic of interest on all of the datasets and take the arithmetic mean of those estimates as your final estimate.
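The whole procedure can be sketched in a few lines (a toy illustration of my own; here "best donor" means the complete case with the highest fraction of agreeing observed values, which ranks donors the same way as the affinity score above):

```python
import numpy as np

rng = np.random.default_rng(0)

def hot_deck_once(X, rng):
    """One random hot-deck pass: each incomplete row gets its missing entries
    from a donor drawn uniformly among the complete rows with the highest
    fraction of agreeing observed values."""
    X = X.copy()
    complete = ~np.isnan(X).any(axis=1)
    donors = X[complete]
    for i in np.flatnonzero(~complete):
        obs = ~np.isnan(X[i])
        scores = (donors[:, obs] == X[i, obs]).mean(axis=1)
        best = np.flatnonzero(scores == scores.max())
        donor = donors[rng.choice(best)]        # break ties at random
        X[i, ~obs] = donor[~obs]
    return X

# Multiple hot-deck imputation: build M imputed copies, average the statistic.
X = np.array([[1., 0., 3.],
              [1., 0., np.nan],
              [1., 1., 4.],
              [0., 0., 3.]])
M = 20
estimates = [hot_deck_once(X, rng)[:, 2].mean() for _ in range(M)]
print(np.mean(estimates))  # 3.25 here, since row 2 has a unique best donor
```

With real data the copies would differ (different tie-breaks draw different donors), and averaging the per-copy estimates gives the final estimate.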
Another question to ask yourself is whether you are going to re-use donors if they match multiple missing-data cases. As described by Joenssen and Bankhofer (2012, p. 75):
Under some situations, donor limitation leads to better parameter estimations. Splitting the data into a low amount of imputation classes leads to better estimation of variance and quartile distance for quantitative and ordinal variables, respectively. For low amounts of objects per imputation class the variance of quantitative variables is estimated better with a donor limitation, while binary variables with many objects per imputation class also profit from a donor limit. This is also the case for data matrixes with high amounts of missingness. Estimation of location, such as mean and median are not influenced by limiting donor usage.
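A donor limit of this kind is easy to add to the basic procedure. A toy sketch (the `max_uses` parameter and the agreement-based ranking are my own illustration, not taken from the paper):

```python
import numpy as np

def hot_deck_limited(X, max_uses=1, rng=None):
    """Hot-deck pass where each complete row may serve as a donor at most
    `max_uses` times. Assumes enough donors exist for all incomplete rows."""
    rng = rng if rng is not None else np.random.default_rng()
    X = X.copy()
    complete = np.flatnonzero(~np.isnan(X).any(axis=1))
    uses = {j: 0 for j in complete}
    for i in np.flatnonzero(np.isnan(X).any(axis=1)):
        obs = ~np.isnan(X[i])
        # candidates: complete rows not yet exhausted, ranked by observed agreement
        cands = [j for j in complete if uses[j] < max_uses]
        scores = np.array([(X[j, obs] == X[i, obs]).mean() for j in cands])
        j = cands[int(rng.choice(np.flatnonzero(scores == scores.max())))]
        uses[j] += 1
        X[i, ~obs] = X[j, ~obs]
    return X

X = np.array([[1., 2.],
              [1., np.nan],
              [1., np.nan],
              [0., 5.]])
out = hot_deck_limited(X, max_uses=1)
# Row 1 takes the best donor (row 0); row 2 must fall back to row 3,
# because row 0 is already exhausted.
```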
You can also check Reilly (1993) for a theoretical discussion of multiple hot-deck imputation.
Cranmer, S. J., & Gill, J. (2013). We have to be discrete about this: A non-parametric imputation technique for missing categorical data. British Journal of Political Science, 43(2), 425-449.
Little, R. J. (1992). Regression with missing X's: a review. Journal of the American Statistical Association, 87(420), 1227-1237.
Gelman, A. and Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.
Joenssen, D. W., & Bankhofer, U. (2012, July). Hot deck methods for imputing missing data. In International Workshop on Machine Learning and Data Mining in Pattern Recognition (pp. 63-75). Springer Berlin Heidelberg.
Reilly, M. (1993). Data analysis using hot deck multiple imputation. The Statistician, 307-313.
See Kontopantelis et al. (2017), who describe the proper way to handle this situation. You should definitely retain the DV (dependent variable) in the imputation model and use it to impute the predictors, and you should use the predictors to impute the values of the DV. What the paper demonstrates is that it doesn't really matter whether you retain or discard the individuals who originally had missing values for the DV. To me, it's preferable to retain them to keep your sample size larger.
Kontopantelis, E., White, I. R., Sperrin, M., & Buchan, I. (2017). Outcome-sensitive multiple imputation: A simulation study. BMC Medical Research Methodology, 17(1). https://doi.org/10.1186/s12874-016-0281-5
Hot deck is often a good way to obtain sensible imputations, as it produces imputations that are draws from the observed data. However, filling in a single value for the missing data produces standard errors and P values that are too low. For correct statistical inference you could use multiple imputation, and it is easy to apply hot-deck imputation in combination with multiple imputation. The most popular technique for doing this is known as predictive mean matching, and it has been implemented on a variety of platforms.
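A single predictive-mean-matching pass can be sketched in a few lines (a minimal numpy illustration of my own, not a library implementation; for real analyses use an existing implementation such as R's mice package, where PMM is the default method for numeric variables):

```python
import numpy as np

rng = np.random.default_rng(42)

def pmm_impute(y, X, k=5, rng=rng):
    """One predictive-mean-matching pass (a sketch, not a full multiple-imputation
    procedure): regress y on X using the complete cases, predict for everyone,
    and for each missing y draw the observed value of one of the k donors
    whose predicted means are closest."""
    miss = np.isnan(y)
    Xd = np.column_stack([np.ones(len(y)), X])       # add intercept
    beta, *_ = np.linalg.lstsq(Xd[~miss], y[~miss], rcond=None)
    pred = Xd @ beta
    y = y.copy()
    for i in np.flatnonzero(miss):
        d = np.abs(pred[~miss] - pred[i])            # distance in predicted means
        donors = np.argsort(d)[:k]                   # k closest complete cases
        y[i] = y[~miss][rng.choice(donors)]          # draw an observed value
    return y

# Toy data: y roughly 2*x, with two values missing.
x = np.arange(10.0)
y = 2 * x + rng.normal(0, 0.1, 10)
y[[3, 7]] = np.nan
print(pmm_impute(y, x, k=3))
```

Because each imputed value is an actually observed value (a hot-deck draw), repeating this pass within a proper multiple-imputation procedure keeps the imputations realistic while restoring valid standard errors.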