MATLAB: How to use the JOIN functionality in MATLAB 7.8 (R2009a)

MATLAB, Statistics and Machine Learning Toolbox

I am running into many errors trying to use the JOIN functionality and would like to know what these errors mean and how I can resolve them.

Best Answer

The following are examples of errors commonly seen when using the JOIN function with datasets. The same examples are available in err_examples.m under 'Resolution Documents'.
The examples use the following datasets:
%% Make datasets
patients = dataset('file','hospital.dat', ...
    'delimiter',',', ...
    'ReadObsNames',true);
patients1 = patients(1:10,:)    % rows 1-10
patients2 = patients(21:30,:)   % rows 21-30
1) Trying to join patients1 and patients2 by 'name':
join(patients1,patients2,'name')
gives the following error:
??? Error using ==> dataset.join>simplejoin at 346
The key variable for B must have contain all values in the key variable for A.
Error in ==> dataset.join at 251
ir = simplejoin(leftkey,rightkey);
We receive this error because every name in patients1 must also appear in patients2, and at least one name in patients1 (rows 1-10) does not appear in patients2 (rows 21-30).
The order of inputs into JOIN is important. Without specifying a join type (left, right, inner, outer), we are incorporating the second dataset into the first. JOIN therefore needs to find, for every value of the key ('name' in this case) in patients1, a matching 'name' in patients2.
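Before calling JOIN, the containment requirement can be checked directly. The sketch below is a minimal illustration using ismember and setdiff; keyA and keyB are hypothetical stand-ins for patients1.name and patients2.name, and the names in them are made up. It assumes the key is stored as a cell array of strings, which is typical for text columns read into a dataset.

```matlab
% Sketch of a pre-flight check for error 1: every key value in A must
% appear in B's key. keyA and keyB are hypothetical stand-ins for
% patients1.name and patients2.name (assumed cell arrays of strings).
keyA = {'SMITH'; 'JOHNSON'; 'WILLIAMS'};
keyB = {'JONES'; 'SMITH'; 'BROWN'};

% True only if every value of keyA has a match somewhere in keyB
allCovered = all(ismember(keyA, keyB));

% setdiff returns the (sorted) key values of A that have no match in B
missing = setdiff(keyA, keyB);

if ~allCovered
    fprintf('Key values in A with no match in B: %s\n', ...
        sprintf('%s ', missing{:}));
end
```

With the example datasets, setdiff(patients1.name, patients2.name) would list the names that make the first JOIN fail, again assuming 'name' is a cell array of strings.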
2) Trying to join patients1 and patients2 by the VarName 'sexes':
join(patients1,patients2,'sexes')
gives the following error:
??? Error using ==> getvarindices at 25
Unrecognized variable name 'sexes'.
Error in ==> dataset.join at 143
leftkey = getvarindices(a,leftkey);
We receive the 'Unrecognized variable' error because the key passed into JOIN is not a VarName in both datasets. To list a dataset's VarNames, use the following commands:
patients1.Properties.VarNames
patients2.Properties.VarNames
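The same check can be done programmatically before the join. This is a minimal sketch, not part of the original answer: varsA and varsB are hypothetical stand-ins for the two VarNames lists (only 'name' and 'sex' are taken from the examples above), and the three-character prefix match used to suggest alternatives is an illustrative heuristic.

```matlab
% Sketch: confirm that a proposed key is a VarName in both datasets
% before calling JOIN. varsA and varsB are hypothetical stand-ins for
% patients1.Properties.VarNames and patients2.Properties.VarNames.
varsA = {'name', 'sex'};
varsB = {'name', 'sex'};
key = 'sexes';

% The key must be a VarName in both datasets
isValidKey = ismember(key, varsA) && ismember(key, varsB);

if ~isValidKey
    % Case-insensitive match on the first three characters, which is
    % often enough to spot a typo such as 'sexes' for 'sex'
    suggestions = varsA(strncmpi(key, varsA, 3));
    fprintf('''%s'' is not a VarName in both datasets.\n', key);
    if ~isempty(suggestions)
        fprintf('Did you mean: %s\n', sprintf('%s ', suggestions{:}));
    end
end
```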
3) Since the correct VarName is 'sex', not 'sexes', we run the following code to join patients1 and patients2 by the VarName 'sex':
join(patients1,patients2,'sex')
and receive the following error:
??? Error using ==> dataset.join>simplejoin at 327
The key variable for B must have unique values.
Error in ==> dataset.join at 251
ir = simplejoin(leftkey,rightkey);
We receive the above error because, when the second dataset is incorporated into the first, the key variable in the second dataset must have unique values. Both datasets have a 'sex' variable, and each contains several rows with the sex 'm' (male) and several with the sex 'f' (female).
The error indicates that to join the two datasets on the VarName 'sex', the second dataset may contain only one row for each value of 'sex'. Because patients1 contains both 'm' and 'f', patients2 must have exactly two rows: one for 'm' and one for 'f'. All other variables in patients2 are appended to the first dataset, and variables in the second dataset that share a VarName with a variable in the first dataset get a tag ('_left') added to the name. To see the result, try the following code, which keeps just two rows of patients2 (rows 1 and 6), one for 'm' and one for 'f', each supplying values for the other variables (name, weight, and so on):
join(patients1,patients2([1,6],:),'sex')
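Instead of picking rows 1 and 6 by hand, the rows can be chosen with unique. The sketch below assumes the key is a cell array of strings; keyB is a hypothetical stand-in for patients2.sex with made-up values.

```matlab
% Sketch: for error 3, check that B's key is unique and, if not, find
% one row of B per key value. keyB is a hypothetical stand-in for
% patients2.sex, assumed to be a cell array of strings.
keyB = {'m'; 'f'; 'm'; 'm'; 'f'};

% JOIN requires B's key values to be unique
isUniqueKey = numel(unique(keyB)) == numel(keyB);

% With 'first', idx holds the position of the first occurrence of each
% distinct value (vals is sorted, so idx follows the sorted order)
[vals, idx] = unique(keyB, 'first');

% Sort the indices so the kept rows stay in their original order
rowsToKeep = sort(idx);
```

Computed from patients2.sex, rowsToKeep could replace the hand-picked rows: join(patients1, patients2(rowsToKeep,:), 'sex'). This simply keeps the first row for each sex, so it only makes sense when any representative row is acceptable, and patients2 must still contain every sex value that appears in patients1.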