MATLAB: How does regress deal with NaN

Tags: MATLAB, nan, regress, regression

Hello. I have a question about how the regress function deals with NaN. I know that it treats NaNs as missing values and ignores them, but I am wondering more specifically how this is done. Suppose I have a dataset containing a number of variables, say 4, with about 50 points for each. Does regress remove a row for all of the variables when only one of them is missing a value, thus keeping the columns the same length, or does it somehow keep all the information that is in the dataset?
I hope I managed to make my question clear. It was a little hard for me to formulate.

Best Answer

Type this in the Command Window:
open regress
If you scroll down to around line 65 (the exact line may differ depending on your version of MATLAB), you'll see how regress deals with NaNs:
% Remove missing values, if any
wasnan = (isnan(y) | any(isnan(X),2));
havenans = any(wasnan);
if havenans
    y(wasnan) = [];
    X(wasnan,:) = [];
    n = length(y);
end
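You can see that regress removes an entire row of X whenever one or more entries in that row are NaN or the corresponding value of y is NaN, and it removes the matching element of y as well. So yes: a row is dropped for all variables even if only one of them is missing a value, which keeps X and y the same length. This is the standard way to handle missing values in a least-squares fit. If you do not know the value of one of the predictors, that observation cannot be used in the fit, so the entire observation has to be thrown away.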
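If you want to check this behavior yourself, here is a minimal sketch that fits the same data twice: once letting regress handle the NaNs, and once after dropping the incomplete rows by hand with the same mask as in the excerpt above. The data are synthetic and the variable names (bAuto, bManual, nUsed, maxDiff) are made up for illustration; it assumes the Statistics and Machine Learning Toolbox is installed.

```matlab
% Synthetic data: 50 observations, an intercept column plus 3 predictors.
rng(0);                              % fixed seed so the run is repeatable
n = 50;
X = [ones(n,1), randn(n,3)];
y = X * [1; 2; -1; 0.5] + 0.1*randn(n,1);

% Put NaNs in three different rows, in different variables.
X(5,2)  = NaN;
X(17,4) = NaN;
y(30)   = NaN;

% Fit 1: let regress deal with the NaNs internally.
bAuto = regress(y, X);

% Fit 2: drop the incomplete rows by hand, using the same mask as regress.m.
wasnan  = isnan(y) | any(isnan(X), 2);
bManual = regress(y(~wasnan), X(~wasnan,:));

% Three rows are incomplete, so both fits use the remaining 47 rows.
nUsed   = sum(~wasnan);
maxDiff = max(abs(bAuto - bManual));
fprintf('rows used: %d, max coefficient difference: %g\n', nUsed, maxDiff);
```

The two coefficient vectors should agree, which confirms that a single missing value costs you the whole observation, not just that one entry.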