R GLM – How to Handle NAs in GLM Without Removing Rows in R

generalized linear model, missing data, r

I want to fit my first glm, but some of the variables I use contain NAs. I can't find clear information on how to clean these columns so that they can be used in glm.
Is it possible to handle this in one step when fitting the model with the glm function, or is it necessary to do it in advance for every column?
And what is the best way to do this without removing entire rows from the data frame? na.omit() removes all incomplete cases from a data object, but I only want to ignore these NAs when modelling.

I found this one useful for integer variables:

df %>% mutate_all(~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x))  

But what about categorical variables that are coded as numbers, like 1 = male and 2 = female?

I think this is really a beginner's issue and not that exciting for most of you, but hopefully someone can help me.

Thanks in advance!

Best Answer

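Missing values (NA) in variables are quite tricky to handle. A regression model cannot be fitted to NA values, so with the default settings glm drops every row that has an NA in any variable used in the model (na.action = na.omit). This happens only while fitting; the data frame itself is not changed. If you want those rows to contribute to the model, you have to handle the NAs before fitting the glm.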
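To illustrate how glm treats NAs at fitting time, here is a minimal sketch in base R. The data frame, its column names (y, age, sex) and all values are made up for illustration; they are not from the question. It also shows one way to treat a 1/2-coded variable as a category rather than a number.

```r
# Toy data, invented for this sketch: binary outcome y, numeric age,
# and a sex column coded 1 = male, 2 = female.
df <- data.frame(
  y   = c(0, 1, 0, 1, 0, 1, 1, 1, 0, 0),
  age = c(23, 35, NA, 41, 52, 29, NA, 38, 47, 31),
  sex = c(1, 2, 2, NA, 1, 2, 1, 1, 2, 2)
)

# Treat the 1/2 code as a category, not as a number.
df$sex <- factor(df$sex, levels = c(1, 2), labels = c("male", "female"))

# na.omit is the factory default; it is spelled out here for clarity.
fit <- glm(y ~ age + sex, data = df, family = binomial, na.action = na.omit)

nrow(df)   # 10: the data frame keeps all of its rows
nobs(fit)  # 7: only the complete rows were used in the fit

# na.exclude fits the same model but pads fitted values and residuals
# with NA, so they line up with the rows of df.
fit_ex <- glm(y ~ age + sex, data = df, family = binomial, na.action = na.exclude)
length(fitted(fit_ex))  # 10, with NA in the rows that were dropped
```

In other words, nothing has to be removed from the data frame by hand; the rows with NAs are simply not used for estimation.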

There are a lot of approaches. The easiest ones are omitting the rows with NAs (what glm does by default) and single imputation, where each NA is replaced by one value such as the most frequent value, the value from the previous row, or the median/mean. More complex approaches use the other variables to predict the missing values. Multiple imputation goes a step further: it creates several imputed datasets, fits the model on each, and pools the results.
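As a rough sketch of single imputation in base R, the following replaces NAs in a numeric column with the median and NAs in a categorical column with the most frequent value. The helpers impute_median and impute_mode are hypothetical names written for this example, not functions from any package, and the data are invented.

```r
# Hypothetical helper: replace NAs in a numeric vector with its median.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Hypothetical helper: replace NAs with the most frequent observed value.
# Works for factors as well as for plain vectors; ties go to the value
# that appears first in the data.
impute_mode <- function(x) {
  observed <- x[!is.na(x)]
  values <- unique(observed)
  most_common <- values[which.max(tabulate(match(observed, values)))]
  x[is.na(x)] <- most_common
  x
}

# Toy data, invented for this sketch.
df <- data.frame(
  age = c(23, 35, NA, 41, 52),
  sex = factor(c("male", "female", "female", NA, "female"))
)

# Impute into a copy so the original data frame stays untouched.
df_imputed <- df
df_imputed$age <- impute_median(df$age)
df_imputed$sex <- impute_mode(df$sex)

df_imputed
```

Filling in a single value like this keeps every row, but it treats the imputed values as if they had been observed, so the model's standard errors tend to come out too small.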

The right solution always depends on the context of the data. Be aware that missing data can introduce error or bias into your results whichever method you choose, so it is important to try to reduce this bias in the glm model.

For a bit more information, have a look at this post: https://www.kdnuggets.com/2017/09/missing-data-imputation-using-r.html
