MATLAB: How does regress deal with NaN


Hello. I have a question about how the regress function deals with NaN. I know that it treats them as missing values and ignores them, but I am wondering more specifically how this is done. Say I have a dataset containing a number of variables, for example 4, with about 50 points for each. If only one variable is missing a value in a given row, does regress remove that row for all the variables, so that the columns stay the same length, or does it somehow keep all the information that is in the dataset?

I hope I managed to make my question clear. It was a little hard for me to formulate.

Best Answer

Type this in the Command Window:

open regress

If you scroll down to about line 65 (the exact line depends on your MATLAB version), you'll see how regress deals with NaNs:

% Remove missing values, if any
wasnan = (isnan(y) | any(isnan(X),2));
havenans = any(wasnan);
if havenans
   y(wasnan) = [];
   X(wasnan,:) = [];
   n = length(y);
end

You can see that regress removes an entire row of X, along with the matching element of y, if one or more entries in that row are NaN or if the corresponding response y is NaN. The remaining rows are all complete, so every column keeps the same length. This is the correct way to handle missing values here: if you do not know the value of one of the predictors, you have to throw away the entire observation.
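To see this behaviour for yourself, here is a minimal sketch that fits the same data twice: once with the NaNs left in, and once after removing the incomplete rows by hand with the same mask that regress builds. The data, the coefficient values and the variable names (bWithNaN, bClean, keep) are made up for illustration; the layout of 50 points and 4 variables mirrors the question.

```matlab
% Made-up, deterministic data: 50 observations, intercept + 3 predictors
n  = 50;
x1 = (1:n)';
X  = [ones(n,1) x1 sin(x1) cos(x1)];
y  = X*[1; 2; -1; 0.5] + 0.01*sin(3*x1);

X(5,2) = NaN;     % one predictor missing in row 5
y(12)  = NaN;     % response missing in row 12

% Fit with the NaNs left in; regress drops rows 5 and 12 internally
bWithNaN = regress(y, X);

% Remove the incomplete rows by hand, using the same mask logic
keep   = ~(isnan(y) | any(isnan(X),2));
bClean = regress(y(keep), X(keep,:));

nUsed   = nnz(keep);                      % complete observations left
maxDiff = max(abs(bWithNaN - bClean));    % difference between the two fits
```

Both fits use only the complete rows, so the two coefficient vectors should agree.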

Related Solutions

MATLAB: Correlation between two row matrices

Written that way, each value of "a" is correlated with each value of "b", because corr treats every column as a separate variable. Applying the correlation formula to two single numbers gives NaN. To compute the correlation correctly, transpose the input vectors so that each one is a column:

result  = corr(a', b');
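As a quick check of the transposed call, the sketch below uses two made-up row vectors and compares the result with corrcoef, which accepts vectors in either orientation. The values of a and b are only illustrative.

```matlab
% Two made-up row vectors of equal length
a = [1 2 3 4 5];
b = [2 4 5 4 6];

% Column inputs: each vector is one variable with 5 observations
r = corr(a.', b.');      % scalar correlation coefficient

% Cross-check with corrcoef, which returns a 2-by-2 matrix
c = corrcoef(a, b);
rCheck = c(1,2);
```

For real-valued data a' and a.' give the same result; a.' is the plain (non-conjugate) transpose.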

MATLAB: Removing NaN in Linear Regression Problem. Error in line 66.

You don't pass the table to regress but the variables to be used in the regression -- then you won't run into the issue inside regress.

And you DEFINITELY DO NOT WANT TO BE MUCKING INSIDE THE SUPPLIED REGRESS FUNCTION!!!!

We don't know the function you're trying to fit nor the variable names in your table, but assuming

Y ~ 1 + AX1 + BX2 + ...

for variables X and Y in the table and a linear model plus intercept, then the syntax for regress would be

b=regress(t.Y,[ones(height(t),1) t.X]);   % response first, then the design matrix

where the table variable is t. Use your table variable name and variable names within the table, of course.
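Here is a small self-contained sketch of that kind of call, assuming a made-up table t with one predictor X and a response Y. Note that regress takes the response as its first argument and the design matrix (including the column of ones for the intercept) as its second. The numbers are invented so that the expected coefficients are obvious.

```matlab
% Hypothetical table: Y is the response, X the single predictor
X = (1:10)';
Y = 3 + 2*X;          % exact straight line: intercept 3, slope 2
Y(4) = NaN;           % one missing response; regress skips this row
t = table(X, Y);

% Pass the variables, not the table itself
b = regress(t.Y, [ones(height(t),1) t.X]);
% b(1) is the intercept estimate, b(2) the slope estimate
```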

If you have the Curve Fitting Toolbox in addition to the Statistics Toolbox, I would suggest that its fit function is a little more user-friendly than the core regress function. If you don't have it, see the Alternative Functionality section of the regress documentation, which suggests using LinearModel instead for similar reasons.
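For the LinearModel route, a minimal sketch might look like the following. It uses fitlm, the Statistics Toolbox function that creates a LinearModel, and it accepts the table directly. The table t and its variables X and Y are made up for illustration.

```matlab
% Hypothetical table with a small deterministic wiggle added to a line
X = (1:10)';
Y = 3 + 2*X + 0.01*(-1).^X;
Y(4) = NaN;                        % row with a missing response
t = table(X, Y);

% Formula syntax: response ~ predictors (the intercept is included by default)
mdl = fitlm(t, 'Y ~ X');

coefs = mdl.Coefficients.Estimate; % [intercept; slope]
nObs  = mdl.NumObservations;       % rows actually used in the fit
```

Like regress, fitlm leaves out observations that contain NaN, and NumObservations reports how many rows remained.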

Read the section of the table documentation on accessing data in a table for the details: which forms of addressing return the variables as their native type, which return tables, and so on. In particular, note that addressing a table with parentheses returns another table containing the addressed rows and columns, which is probably the root cause of your troubles.

x=t(:,1);       % returns x as a table: all rows of table t, column 1

while

x=t.X;          % presuming X is the first column in table t, returns X as an array
% or
x=t{:,1};       % returns x as an array -- NB: the "curlies" {} instead of ()
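You can confirm the difference between the three forms with a tiny made-up table. The table contents and the variable names asTable, byName and byBrace are only for illustration.

```matlab
% Small hypothetical table with two numeric variables
X = [1; 2; 3];
Y = [2; 4; 6];
t = table(X, Y);

asTable = t(:,1);     % parentheses: still a table (3 rows, 1 variable)
byName  = t.X;        % dot notation: the underlying double column
byBrace = t{:,1};     % curly braces: also the underlying double column

classes = {class(asTable), class(byName), class(byBrace)};
```

Only the last two can be passed to regress as numeric inputs.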
