Hi all,
Please help me. Please. This research student will thank you immensely.
Here's the scenario: I have two populations (main and sub). The sub population is 10% of the main population. The mean (mu) and sigma(std) of these two populations can change, but have certain predefined boundaries. Right now, I am only changing the sigma of the main population and keeping the sigma of the sub population, and the mean of both populations constant.
What I want: I want to use fitgmdist/GMModels/Gaussian distribution to be able to identify that there are indeed two components in that set of data. In other words, I am trying to see for what sigma_main values does the model ouput two components (numComponents = 2). Right now, I have a model that iterates through my sigmamain values (0.1:0.05:1) and goes through varying GMModels to find the one with the better AIC (least error) and outputs the numComponents for every sigma_main value, and then sigma_main vs numComponents is plotted. At least, that's what I think.
Here's my code:
%% define known constants
% rng(0,'twister')
n_main = 100000;n_sub = 10000;%sigma_main = 0.1;
sigma_sub = 0.1;mu_main = 1;mu_sub =3;%% change sigma_main values and run model
sigmamain_val = 0.1:0.05:1; outputData = zeros(length(sigmamain_val), 2); % create an output array for (length sigma_sub),numComponents
for i = 1:length(sigmamain_val) sigmamain_pos = randi(length(sigmamain_val)); sigma_main = sigmamain_val(sigmamain_pos); y = sigma_main.*randn(n_main,1) + mu_main; %10^6 SKBr3 cells
y2 = sigma_sub.*randn(n_sub,1) + mu_sub; %100,000 MDA MB 231
C = cat(1, y, y2); AIC = zeros(1,4); % create an ouput array for AIC
GMModels = cell(1,4); options = statset('MaxIter',00); % add optitons here to better/refine the model
for k = 1:4 GMModels{k} = fitgmdist(C,k); AIC(k)= GMModels{k}.AIC; end [minAIC,numComponents] = min(AIC); outputData(i,1) = sigma_main; outputData(i,2) = numComponents; %BestModel = GMModels{numComponents}
BestModel = GMModels{outputData(i,2)}; end%% plot sigma main vs numComponents
figure(3)plot(outputData(:,1),outputData(:,2)) xlabel('sigma main')ylabel('numComponents')
Btw: the idea for this model with AIC is based off: https://www.mathworks.com/help/stats/fitgmdist.html
Here's the problem:
For starters: The code for y & y2, which are for my main and sub population respectively, only take into account the very last sigma value.
A bigger problem is that sometimes, my code picks a sigma_main value twice (i.e 0.8) and the number of components is sometimes different even though they are the same sigma_main value.
Also, I would like to ouput a "BestModel" or 1×1 gmdistrinution for every sigma_main value instead of just the last one and I don't know how to incorporate that in my for loop.
What I think: I need to go through sigma_main values in order (i.e only write "sigma_main = i" inside the for loop) so that values don't get repeated, but then my y and y2 cannot b concatenated (C = cat(1, y,y2) because my sigma_main value will be 19×1 instead of just one number.
To outut a BestModel for every sigma_main, I need to create an empty/zeroes array before but I don't know ehat type to do that for.
Please, please help me. I have stayed up two nights doing this and I really don't know how to move forward. It might be really simple, but I might be overthinking it.
Best Answer