MATLAB: Merging CSV files with similar data structure into table

.csv filedata structurereadtabletable

Merging CSV files into table
I am trying to find a way to read each csv file with read table and merge into a big table without “heavy” coding. I am having problems merging the tables due to a somewhat non-uniform data structure and wonder if the process of making the table uniform can be done in Matlab (instead of copy pasting in excel).
Questions:
How to I check if variable exist in csv file?
How do I concatenate specific variables to existing table?
Problem: The original data are in csv files and organized as follows:
There is one csv file for each year
The data structure is not the same in each csv file. Eg file2001.csv may have variables “height” only; file 2005.csv may have variable “height” and “colour” and 20015.csv may have “height”, “size” and “colour”.
Generic Code:
Table2001=readtable(file2001.csv); % contains subset of variables

Table2005=readtable(file205.csv); % contains subset of variables
Table2015=readtable(file2015.csv); % contains all variables)
Create big table
T = Table2015; % starting table as it contains all variables
Check if “height” is a variable in Table2001;
if yes, concatenate column below T.height
if no, add column of NAN below T.height
continue process for all variables and files

Best Answer

Bjorn, you can create a table with three variables, full of NaNs, and then just assign the variables you have read in from each csv. Something like this:
Create an example of a table that might have been read in using readtable:
>> t2 = array2table(randn(5,2),'VariableNames',{'height' 'color'})
t2 =
height color
________ _______
-0.20497 0.6715
-0.12414 -1.2075
1.4897 0.71724
1.409 1.6302
1.4172 0.48889
Preallocate a table of missing values, with the right size:
>> t1 = array2table(nan(height(t2),3),'VariableNames',{'height' 'size' 'color'})
t1 =
height size color
______ ____ _____
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
Move the data that was read in:
>> t1(:,t2.Properties.VariableNames) = t2
t1 =
height size color
________ ____ _______
-0.20497 NaN 0.6715
-0.12414 NaN -1.2075
1.4897 NaN 0.71724
1.409 NaN 1.6302
1.4172 NaN 0.48889
Do that three times, and the three results can be vertically concatenated, Another possibility:
>> t2 = array2table(randn(5,2),'VariableNames',{'height' 'color'})
t2 =
height color
________ _______
1.0347 0.8884
0.72689 -1.1471
-0.30344 -1.0689
0.29387 -0.8095
-0.78728 -2.9443
>> t2{:,setdiff({'height' 'size' 'color'},t2.Properties.VariableNames)} = NaN
t2 =
height color size
________ ________ ____
1.4384 -0.10224 NaN
0.32519 -0.24145 NaN
-0.75493 0.31921 NaN
1.3703 0.31286 NaN
-1.7115 -0.86488 NaN