I'm working on k-means clustering in MATLAB. My file has three columns, and I have written the clustering code. Now I need a function to measure the clustering quality, and I picked the silhouette plot. I got the silhouette code from here (and I want my plot to look like the one there): http://stackoverflow.com/questions/6644445/equivalent-of-matlabs-cluster-quality-function
I adapted it to my variables. Here is the k-means clustering code:
load cobat.txt;                   % read the data file into the variable cobat
k = input('Enter a number: ');    % number of clusters
isRand = 0;                       % 0 -> sequential initialization
                                  % 1 -> random initialization
[maxRow, maxCol] = size(cobat);
if maxRow <= k
    y = [cobat, (1:maxRow)'];     % no more points than clusters: each point is its own cluster
elseif k > 7
    h = msgbox('k cannot be more than 7');
else
    % initial value of the centroids
    c = zeros(k, maxCol);
    if isRand
        p = randperm(size(cobat,1));   % random initialization
        for i = 1:k
            c(i,:) = cobat(p(i),:);
        end
    else
        for i = 1:k
            c(i,:) = cobat(i,:);       % sequential initialization
        end
    end
    temp = zeros(maxRow,1);            % initialize as zero vector
    u = 0;
    while 1
        d = DistMatrix3(cobat, c);     % distance from every point to every centroid
        [z, g] = min(d, [], 2);        % g(i) = index of the nearest centroid
        if isequal(g, temp)            % the assignment no longer changes
            break;                     % stop the iteration
        else
            temp = g;                  % keep the current assignment
        end
        for i = 1:k                    % recompute each non-empty centroid
            f = find(g == i);
            if ~isempty(f)
                c(i,:) = mean(cobat(f,:), 1);
            end
        end
    end
    y = [cobat, g]                     % data with the cluster index appended
end
% plot silhouette
s = mySilhouette(cobat, g)
[~,ord] = sortrows([g s],[1 -2]);
indices = accumarray(g(ord), 1:k, [K 1], @(x){sort(x)});
ytick = cellfun(@(ind) (min(ind)+max(ind))/2, indices);
ytickLabels = num2str((1:K)','%d');

h = barh(1:N, s(ord),'hist');
set(h, 'EdgeColor','none', 'CData',IDX(ord))
set(gca, 'CLim',[1 K], 'CLimMode','manual')
set(gca, 'YDir','reverse', 'YTick',ytick, 'YTickLabel',ytickLabels)
xlabel('Silhouette Value'), ylabel('Cluster')

%# compare against SILHOUETTE
figure, silhouette(cobat,g)
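To sanity-check the hand-written loop, one option (assuming the Statistics Toolbox is installed, which `silhouette` and `pdist` already require) is to run MATLAB's `kmeans` from the same sequential starting centroids, i.e. the first k rows of the data. The data below are the ten rows of cobat.txt shown in the question, and `k = 3` is only an example value. With the same start the assignments will often agree, but `kmeans` can still differ (for example in how it handles a cluster that becomes empty), so treat this as a rough check rather than a guarantee.

```matlab
% Cross-check: built-in kmeans started from the first k rows ("sequential" start).
cobat = [65 80 55; 45 75 78; 36 67 66; 65 78 88; 79 80 72;
         77 85 65; 76 77 79; 65 67 88; 85 76 88; 56 76 65];
k = 3;                                       % example number of clusters (assumption)

% 'Start' accepts a k-by-p matrix of initial centroid locations
g2 = kmeans(cobat, k, 'Start', cobat(1:k,:));

figure, silhouette(cobat, g2)                % built-in silhouette plot for comparison
```

If `g` from the script and `g2` differ only by a relabelling of clusters, the two silhouette plots should look the same.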
Here is the DistMatrix3 function, which computes the Euclidean distance between every row of A and every row of B:
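function d = DistMatrix3(A, B)
% Works only for data with exactly 3 columns.
[hA, wA] = size(A);
[hB, wB] = size(B);
if hA == 1 && hB == 1
    d = sqrt(dot((A-B), (A-B)));
else
    % selector matrices that pick column 1, 2 or 3 of the data
    C = [ones(1,hB); zeros(1,hB); zeros(1,hB)];
    D = [zeros(1,hB); ones(1,hB); zeros(1,hB)];
    E = flipud(C);
    F = [ones(1,hA); zeros(1,hA); zeros(1,hA)];
    G = [zeros(1,hA); ones(1,hA); zeros(1,hA)];
    H = flipud(F);
    I = A*C; J = A*D; K = A*E;   % columns 1..3 of A, repeated across hB columns
    L = B*F; M = B*G; N = B*H;   % columns 1..3 of B, repeated across hA columns
    d = sqrt((I-L').^2 + (J-M').^2 + (K-N').^2);
end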
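DistMatrix3 hard-codes three columns through its selector matrices. A sketch of an equivalent that works for any number of columns, under the hypothetical name `pairwiseDist`, loops over the rows of B (the centroids, usually few) and subtracts each one from all rows of A:

```matlab
function d = pairwiseDist(A, B)
%PAIRWISEDIST Hypothetical replacement for DistMatrix3.
%   A is hA-by-p, B is hB-by-p (any p); d is hA-by-hB with
%   d(i,j) = Euclidean distance between A(i,:) and B(j,:).
if size(A,2) ~= size(B,2)
    error('pairwiseDist:dimMismatch', 'A and B must have the same number of columns.');
end
d = zeros(size(A,1), size(B,1));
for j = 1:size(B,1)
    diffs  = bsxfun(@minus, A, B(j,:));   % subtract row j of B from every row of A
    d(:,j) = sqrt(sum(diffs.^2, 2));      % Euclidean norm of each difference row
end
end
```

In the clustering loop it would be called as `d = pairwiseDist(cobat, c);`. With the Statistics Toolbox, `pdist2(cobat, c)` should return the same matrix.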
And here is the mySilhouette function code:
function s = mySilhouette(cobat, g)
%# cobat : matrix of size N-by-p, data where rows are instances
%# g     : vector of size N, cluster index of each instance (starting from 1)
%# s     : vector of size N, silhouette value of each instance

N = size(cobat,1);               %# number of instances
K = numel(unique(g));            %# number of clusters

%# compute pairwise (squared Euclidean) distance matrix
D = squareform( pdist(cobat,'euclidean').^2 );

%# indices belonging to each cluster
kIndices = accumarray(g, 1:N, [K 1], @(x){sort(x)});

%# compute a,b,s for each instance
%# a(i): average distance from i to all other data within the same cluster.
%# b(i): lowest average dist from i to the data of another single cluster
a = zeros(N,1);
b = zeros(N,1);
for i=1:N
    ind = kIndices{g(i)};
    ind = ind(ind~=i);
    a(i) = mean( D(i,ind) );
    b(i) = min( cellfun(@(ind) mean(D(i,ind)), kIndices([1:K]~=g(i))) );
end
s = (b-a) ./ max(a,b);
end
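For reference, the per-point quantities mySilhouette computes can be written out as follows, where x_i is row i of cobat, C_c is the set of rows with g = c, and D is the squared Euclidean distance produced by `pdist(...).^2` (the same as the default 'sqEuclidean' distance of MATLAB's `silhouette`, so the two plots should agree):

```latex
\begin{aligned}
D_{ij} &= \lVert x_i - x_j \rVert^2 \\
a(i) &= \frac{1}{|C_{g(i)}| - 1} \sum_{j \in C_{g(i)},\; j \neq i} D_{ij} \\
b(i) &= \min_{c \neq g(i)} \frac{1}{|C_c|} \sum_{j \in C_c} D_{ij} \\
s(i) &= \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
\end{aligned}
```

Each s(i) lies between -1 and 1; values near 1 mean the point is much closer to its own cluster than to the nearest other cluster. For a cluster containing a single point the sum in a(i) is empty, and MATLAB's `mean` of an empty vector gives NaN for that point.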
Here is the cobat.txt file:
65 80 55
45 75 78
36 67 66
65 78 88
79 80 72
77 85 65
76 77 79
65 67 88
85 76 88
56 76 65
When I run the code, I get this error:
??? Undefined function or variable 'K'.
Error in ==> clustere at 54
indices = accumarray(g(ord), 1:k, [K 1], @(x){sort(x)});
I know this is caused by the K variable, but I have no idea what K is for, and I just can't figure it out. Can anyone help me fix the error and make it work? Your help will be much appreciated.
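In the plotting code copied from the linked answer, K, N and IDX are never defined in this script. Judging from how they are used, K is the number of clusters, N is the number of data points (rows of cobat), and IDX is the vector of cluster indices, which in this script is g. The call `accumarray(g(ord), 1:k, ...)` also needs `1:N` rather than `1:k`, because accumarray expects one value per element of `g(ord)`. A minimal sketch under those assumptions wraps the plot in a hypothetical helper, `plotSilhouetteBars`, that derives N and K from its inputs; it assumes every cluster 1..K has at least one point.

```matlab
function plotSilhouetteBars(s, IDX)
%PLOTSILHOUETTEBARS Hypothetical helper: silhouette bar plot from precomputed values.
%   s   : N-by-1 silhouette values (for example from mySilhouette)
%   IDX : N-by-1 cluster index of each point, 1..K, every cluster non-empty (assumed)
IDX = IDX(:);
s   = s(:);
N = numel(IDX);          % number of data points (the undefined N)
K = max(IDX);            % number of clusters (the undefined K)

[~, ord] = sortrows([IDX s], [1 -2]);                          % by cluster, then decreasing s
indices  = accumarray(IDX(ord), (1:N)', [K 1], @(x){sort(x)}); % bar rows of each cluster
ytick    = cellfun(@(ind) (min(ind)+max(ind))/2, indices);     % centre of each cluster's bars
ytickLabels = num2str((1:K)', '%d');

figure
h = barh(1:N, s(ord), 'hist');
set(h, 'EdgeColor','none', 'CData',IDX(ord))                   % colour bars by cluster
set(gca, 'CLim',[1 K], 'CLimMode','manual')
set(gca, 'YDir','reverse', 'YTick',ytick, 'YTickLabel',ytickLabels)
xlabel('Silhouette Value'), ylabel('Cluster')
end
```

In the script, the two plotting lines would then become `plotSilhouetteBars(mySilhouette(cobat, g), g)` followed by `figure, silhouette(cobat, g)` for comparison. Passing the values in, rather than reading script variables, keeps N and K consistent with whatever labels are plotted.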
Thank you.
Best Answer