Solved – Kernel density estimation with an Epanechnikov kernel in MATLAB

kernel-smoothingMATLAB

I am currently trying to learn how to estimate the kernel density using the Epanechnikov kernel in MATLAB, and I am currently having problems with my code. What I'm doing is that the data that I am simulating comes from a mixture of normals. When I tried to estimate it using a Gaussian kernel, the code worked. However, when I used an Epanechnikov kernel, the graph I am obtaining is wrong.

The code is attached here:

% Kernel density estimation using mixtures of normally distributed random
% variables.

% Set parameters for data generating process.
n=1000; % Number of observations.
data1 = rand(n,1); % Returns data with 1000 observations.

% Initializing the zero matrix of data from the mixture.
data2=zeros(n,1);

% % Generate the data. (mixture of normal random variables)
data2 = (data1 < ones(n,1)/3).*normrnd(2,1,n,1) + ...
    (data1 > ones(n,1)/3).*normrnd(-2,1,n,1);

% Generating the density (using an Epanechnikov kernel.)
% Setting the parameters.
s=std(data2); % standard deviation of the data
h=2.34; % bandwidth parameter

% Evaluate the kernel at all x's in the domain.
x=linspace(-1,1,1000);
p=ones(n,1);
xi=x*p; % matrix for the x's where we evaluate the kernel.
data2i=data2*ones(1,n); % matrix for each of the data points in the kernel.
u=(xi-data2i)/h; % matrix of u's.
absu=abs(u); % absolute value of u.
I=(absu<=1);
f=(.75/h)*(1-u.^2).*I;

plot(x,f);

May I ask what I might be doing wrong?

Best Answer

Here is the rectified code - you basically need to check the matrix multiplications to make sure that the outputs have correct dimensions. for example,

 x=linspace(-1,1,1000); p=ones(n,1); xi=x*p;

will yield xi = 0 and so on. There were also some minor bugs like, setting the proper bandwith, taking the sum at each data points etc, which I think I have fixed. Please see below for the correct code and the resulting figure:

% Kernel density estimation using mixtures of normally distributed random % variables.
% Set parameters for data generating process. 
n=1000; % Number of observations. 
data1 = rand(n,1); % Returns data with 1000 observations.
% Initializing the zero matrix of data from the mixture. 
data2=zeros(n,1);
% % Generate the data. (mixture of normal random variables) 
data2 = (data1 < ones(n,1)/3).*normrnd(2,1,n,1) + ... 
 (data1 > ones(n,1)/3).*normrnd(-2,1,n,1);
% Generating the density (using an Epanechnikov kernel.) 
% Setting the parameters. 
s=std(data2); % standard deviation of the data 
h=1; % bandwidth parameter
% Evaluate the kernel at all x's in the domain. 
x=linspace(-3,3,1000); p=ones(n,1); xi=p*x; 
% matrix for the x's where we evaluate the kernel.
data2i=data2*ones(1,n); % matrix for each of the data points in the kernel. 
u=(xi-data2i)/h; % matrix of u's. 
absu=abs(u); % absolute value of u. 
I=(absu<=1); 
f=(.75/h)*(1-u.^2).*I;
ff = sum(f,1); 
plot(x,ff);

enter image description here

Needless to say, neither code is optimized - which of course was not the intention either.