Estimate range of values of continuous variable corresponding to every level of discrete variable

continuous data, discrete data

This might seem a silly question, but I have Googled in vain for hours to find an answer, so here goes:

I have two variables measuring the same physical parameter. Let's call these variables A and B. A is a discrete variable that can go from 0 to 5, and B is a continuous variable that normally ranges from 0 to 1000. However, A is a very noisy variable.

Based on the data that I have, I want to estimate the range of values in B that corresponds to every value in A.

Example output:

╔═══╦══════════╗  
║ A ║    B     ║  
╠═══╬══════════╣  
║ 0 ║ 0-50     ║  
║ 1 ║ 50-200   ║  
║ 2 ║ 200-500  ║  
║ 3 ║ 500-750  ║  
║ 4 ║ 750-800  ║  
║ 5 ║ 800-1000 ║  
╚═══╩══════════╝  

How do I estimate these ranges? Any help would be greatly appreciated.

Best Answer

Your problem can be interpreted as a classification problem: you want a classifier in the form of intervals of $B$, one interval per level of $A$. To construct the intervals, you need limits $\beta_k$, $k=1,\dots,v-1$, where $v=6$ is the number of levels of $A$ (the levels are numbered $1,\dots,v$ here, so the question's levels $0,\dots,5$ are shifted by one).

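Each limit $\beta_k$ is the point at which the conditional densities of two neighbouring levels are equal: $f(b=\beta_k|a=k)=f(b=\beta_k|a={k+1})$, where $f(\cdot|\cdot)$ stands for the conditional probability density function of $B$ given $A$. These density functions can be approximated, e.g., by normal distributions fitted to the data of each level.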
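If the normal approximation is used, the condition above can be written out explicitly. The following is a sketch under that assumption only; the symbols $\mu_k$ and $\sigma_k$ (mean and standard deviation of $B$ within level $k$) are introduced here for illustration and are not part of the original answer. Taking logarithms of both normal densities and collecting terms gives a quadratic in $\beta_k$:

$$\left(\frac{1}{\sigma_{k+1}^2}-\frac{1}{\sigma_k^2}\right)\beta_k^2 - 2\left(\frac{\mu_{k+1}}{\sigma_{k+1}^2}-\frac{\mu_k}{\sigma_k^2}\right)\beta_k + \frac{\mu_{k+1}^2}{\sigma_{k+1}^2}-\frac{\mu_k^2}{\sigma_k^2} + 2\ln\frac{\sigma_{k+1}}{\sigma_k} = 0$$

When $\sigma_k=\sigma_{k+1}$ the quadratic term vanishes and the limit reduces to the midpoint $\beta_k=(\mu_k+\mu_{k+1})/2$. Otherwise there can be two roots, and the one of interest is normally the root lying between $\mu_k$ and $\mu_{k+1}$. Note that equating the densities directly treats all levels as equally frequent; if they are not, one option is to weight each density by the proportion of observations in its level before equating.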

Simple code in Matlab:

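v = 6;                        % number of levels of A (labelled 1..v here)
Ndata = 1000;
a = randi(v,Ndata,1);         % synthetic discrete variable
b = a+0.5*randn(Ndata,1);     % synthetic continuous variable: level plus Gaussian noise
allB = min(b):0.01:max(b);    % grid on which the densities are compared
m = zeros(1,v);
s = zeros(1,v);
betas = zeros(1,v-1);         % named betas to avoid shadowing the built-in beta function
lgd = cell(1,v);
figure
hold on;
for i = 1:v
    m(i) = mean(b(a==i));
    s(i) = std(b(a==i));
    % normpdf requires the Statistics and Machine Learning Toolbox
    pdfVals = normpdf(allB,m(i),s(i));
    plot(allB,pdfVals,'Color',rand(1,3));
    lgd{i} = ['A = ' num2str(i)];
    if i>1
        % First grid point between the two means where the density of level i
        % overtakes that of level i-1. Restricting the search to this range
        % avoids a spurious crossing in the far tails when the stds differ.
        % Assumes the means increase with the level.
        idx = find(allB>=m(i-1) & allB<=m(i) & pdfVals>pdfValsOld,1);
        betas(i-1) = allB(idx);
    end
    pdfValsOld = pdfVals;
end

yl = ylim;                    % fix the y-range before drawing the limits
for i = 1:(v-1)
    plot([betas(i) betas(i)],yl,':');
end
lgd{end+1} = 'betas';
legend(lgd)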
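As an alternative to the grid search, the crossing point of two fitted normal densities can be computed directly, since equating their logarithms gives a quadratic in the limit. The sketch below does this with base Matlab only (no toolbox) and prints one interval of B per level of A, in the shape of the table asked for in the question. The data are a synthetic stand-in, and the scale factors `180` and `60`, the variable names `levels`, `edges`, `qa`, `qb`, `qc`, and the midpoint fallback are assumptions made for illustration, not part of the original answer.

```matlab
% Synthetic stand-in data (assumption): A in 0..5, B roughly proportional to A plus noise
rng(0);                                 % fixed seed so the run is repeatable
levels = 0:5;
v = numel(levels);
a = randi(v, 1000, 1) - 1;
b = 180*a + 60*randn(1000, 1);

% Per-level mean and standard deviation of B
m = arrayfun(@(k) mean(b(a == k)), levels);
s = arrayfun(@(k) std(b(a == k)),  levels);

% Limit between consecutive levels: root of the quadratic obtained by
% equating the two normal log-densities, taken between the two means.
% Assumes the means increase with the level.
betas = zeros(1, v-1);
for k = 1:(v-1)
    qa = 1/s(k+1)^2 - 1/s(k)^2;
    qb = -2*(m(k+1)/s(k+1)^2 - m(k)/s(k)^2);
    qc = m(k+1)^2/s(k+1)^2 - m(k)^2/s(k)^2 + 2*log(s(k+1)/s(k));
    r = roots([qa qb qc]);
    r = r(imag(r) == 0 & r > m(k) & r < m(k+1));
    if isempty(r)
        betas(k) = (m(k) + m(k+1))/2;   % fallback (assumption): midpoint of the means
    else
        betas(k) = r(1);
    end
end

% Print one interval of B per level of A
edges = [min(b), betas, max(b)];
for k = 1:v
    fprintf('A = %d : B in [%.1f, %.1f]\n', levels(k), edges(k), edges(k+1));
end
```

With real data, replace the synthetic `a` and `b` with the measured vectors. The outer edges are taken here as the observed minimum and maximum of B; whether to extend them to the nominal range 0 to 1000 is a choice left to the reader.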

