MATLAB: Selecting tables among 540 tables, based on a start and end date

cellMATLABtable

Hello all,
I have an x.mat cell which includes over 540 tables for 540 climate stations. In every 540 tables, there is a column named data (data is a wrong variable name for date). I want to have a code that selects every table that has a specific period from 1989 to 2018 and saves them in a new cell named x_selected.
I want to save the table if the start (1989) and end time (2018) are satisfied in the tables even if other years in between doesn't exist in it.
the following examples describe what I want:
for example, if one table (station) has only data from 2000 to 2018 so it is not going to save in x_selected because it hasn't data from 1989 to 2018. But if a table (station) has data for 1989 and 2018 (and not in between these two years) I want to save it in x_selected.
if a table has data for 2000 to 2003 I don't want it because the start time isn't 1989 and the end time isn't 2018.
if a table has data only for 2015 I don't want it because the time range 1989-2018 is not considered.
if a table has data from 1989 to 2018 but missed some years (like 2015, 2017) I want to save it because start time and end time are satisfied.
the above year is for example and may be different in the tables I just write them for example, what's important is start time (1989) and end time(2018).
x.mat is attached.
Thank you so much.

Best Answer

Here's my thought process as I explore the data and solve the problem in parallel.
Please take time to go through each line and think about what each line is doing. The comments should help.
You'll see two interpretation of the "select" variable (one of them is commented-out). Choose whichever one fits what you need.
% Clear out workspace, load variable(s)
clear; close all
load('x.mat')
% what do these tables look like?
% show first few rows of 1st table
head(x{1})
% are the dates in datetime format?
% Test the first date in the first table
isdatetime(x{1}.data(1))
% Check for missing values in data column
any(cellfun(@(T)any(ismissing(T.data)),x)) % = 0 means none
% Identify start and stop years
yrBounds = [1989, 2018];
% Get the minimum and maximum years of each table.
% Note that your dates are *not* sorted.
% This cellfun extracts the min value of each data column
% from each table. That produces a vector of datetime values.
% Then the year() function extracts the years.
startYear = year(cellfun(@(T)min(T.data),x));
% now get the max year value
endYear = year(cellfun(@(T)max(T.data),x));
% Identify which startYear and endYear satisfy our bounds.
% Use this line if you want tables whose dates span from 1989:2018

select = startYear==yrBounds(1) & endYear==yrBounds(2);
% Use this line if you want tables whose dates span from 1989:2018
% but may have earlier or later years, too.
% select = startYear<=yrBounds(1) & endYear>=yrBounds(2);
% How many tables were selected?
sum(select)
% Extract the selected tables
x_selected = x(select);
% Check for accuracty. Here we'll select a table to look at
% (table #n) and we'll list the unique years in the data column.
n = 1; % which table to look at
unique(year(x_selected{n}.data))