Solved – How to compare models from different but related datasets

bootstrap, machine learning, mathematical-statistics, mixed model, regression

I'm building regression models on four different but related datasets, and at the end I want to test the significance of the models. Since the models are built on different datasets, they are not directly comparable. But there is some hierarchy in my data:

1) The output in my main dataset is the sum of the outputs of datasets A, B and C.

2) The value of a given feature is the same in all datasets, but not all features are present in every dataset; some datasets contain only a subset of the features of the main dataset.

3) The union of the features of datasets A, B and C is the feature set of the main dataset.

So I want to build a regression model on each of these datasets separately and compare the performance and significance of the models. Because the datasets are not exactly the same for all four models, I cannot use standard statistics such as AIC or the $C_p$ statistic to compare them.
Is there any way to compare the performance of the models?

Here is my dataset (I have included only the first two data points) for clarification:

[image of the dataset excerpt not available]

Best Answer

I'm not sure how to interpret the diagram, but for asymmetric or messy problems the bootstrap is often your friend. Suppose you want a confidence interval for the difference in $R^2$ between two models fitted on two different or overlapping datasets, where the numbers of independent experimental units are $n_{1}$ and $n_{2}$, and there are $n_{u}$ unique experimental units in the union of the two samples. Sample with replacement from the $n_{u}$ units 1000 times, each time recreating dataset 1 from which of its $n_{1}$ units were selected and how often, and likewise for dataset 2. For each resample, estimate the two sets of model parameters, the two $R^2$ values, and their difference. Then form a bootstrap confidence interval for the difference from the 1000 estimated differences.
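The resampling scheme described above could be sketched roughly as follows. This is only an illustration under several assumptions that are not in the question: each dataset has one row per experimental unit, the models are ordinary least-squares fits with an intercept, and the interval is a simple percentile interval. The function names `r_squared` and `bootstrap_r2_difference` and the synthetic data are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit with intercept (assumed model)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def bootstrap_r2_difference(units1, X1, y1, units2, X2, y2,
                            n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for R^2(model 1) - R^2(model 2).

    Units are resampled from the union of both datasets; each dataset is
    rebuilt from whichever of its own units were drawn, with multiplicity.
    Assumes one row per unit in each dataset.
    """
    rng = np.random.default_rng(seed)
    union = np.union1d(units1, units2)           # the n_u unique units
    row1 = {u: i for i, u in enumerate(units1)}  # unit id -> row in dataset 1
    row2 = {u: i for i, u in enumerate(units2)}  # unit id -> row in dataset 2
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        draw = rng.choice(union, size=len(union), replace=True)
        idx1 = [row1[u] for u in draw if u in row1]
        idx2 = [row2[u] for u in draw if u in row2]
        diffs[b] = r_squared(X1[idx1], y1[idx1]) - r_squared(X2[idx2], y2[idx2])
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lo, hi, diffs

# Synthetic illustration: units 20..39 appear in both datasets,
# and the two datasets have different feature sets.
rng = np.random.default_rng(42)
units_a, units_b = np.arange(0, 40), np.arange(20, 60)
X_a = rng.normal(size=(40, 2))
y_a = X_a @ np.array([1.0, -0.5]) + rng.normal(size=40)
X_b = rng.normal(size=(40, 1))
y_b = 0.8 * X_b[:, 0] + rng.normal(size=40)

lo, hi, diffs = bootstrap_r2_difference(units_a, X_a, y_a, units_b, X_b, y_b)
print(f"95% percentile interval for R2_a - R2_b: [{lo:.3f}, {hi:.3f}]")
```

If the interval excludes zero, that is some evidence that the two models differ in $R^2$. With small datasets, some resamples may leave too few rows to fit a model, so a real analysis would need to guard against that case.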
