I am currently trying to learn how to estimate the kernel density using the Epanechnikov kernel in MATLAB, and I am currently having problems with my code. What I'm doing is that the data that I am simulating comes from a mixture of normals. When I tried to estimate it using a Gaussian kernel, the code worked. However, when I used an Epanechnikov kernel, the graph I am obtaining is wrong.
The code is attached here:
% Kernel density estimation using mixtures of normally distributed random
% variables.
% Set parameters for data generating process.
n=1000; % Number of observations.
data1 = rand(n,1); % Returns data with 1000 observations.
% Initializing the zero matrix of data from the mixture.
data2=zeros(n,1);
% % Generate the data. (mixture of normal random variables)
data2 = (data1 < ones(n,1)/3).*normrnd(2,1,n,1) + ...
(data1 > ones(n,1)/3).*normrnd(-2,1,n,1);
% Generating the density (using an Epanechnikov kernel.)
% Setting the parameters.
s=std(data2); % standard deviation of the data
h=2.34; % bandwidth parameter
% Evaluate the kernel at all x's in the domain.
x=linspace(-1,1,1000);
p=ones(n,1);
xi=x*p; % matrix for the x's where we evaluate the kernel.
data2i=data2*ones(1,n); % matrix for each of the data points in the kernel.
u=(xi-data2i)/h; % matrix of u's.
absu=abs(u); % absolute value of u.
I=(absu<=1);
f=(.75/h)*(1-u.^2).*I;
plot(x,f);
May I ask what I might be doing wrong?
Best Answer
Here is the rectified code - you basically need to check the matrix multiplications to make sure that the outputs have correct dimensions. for example,
will yield xi = 0 and so on. There were also some minor bugs like, setting the proper bandwith, taking the sum at each data points etc, which I think I have fixed. Please see below for the correct code and the resulting figure:
Needless to say, neither code is optimized - which of course was not the intention either.