MATLAB: Question about Kmeans function

kmeans, Statistics and Machine Learning Toolbox

Can anyone explain this, please?

[IDX,C] = kmeans(X,k,param1,val1)

Here, 'start' is param1 and a matrix is val1. 'start' is the method used to choose the initial cluster centroid positions.

The MATLAB help explains it as: "k-by-p matrix of centroid starting locations. In this case, you can pass in [] for k, and kmeans infers k from the first dimension of the matrix."

Here is the call I am trying to use:

[IDX,C]=kmeans(data,[],'Distance','sqEuclidean','emptyaction','singleton','Start',data);

Question 1: Is "data" the matrix that the help talks about?

Question 2: If it is, I then run into the following error:

??? Error using ==> NaN
Out of memory. Type HELP MEMORY for your options.

Error in ==> kmeans at 298
if online, Del = NaN(n,k); end % reassignment criterion

In my case, the size of data is 334795×2.

Best Answer

The 'start' parameter defines the initial centroid locations. As the help explains, it should be k-by-p, where k is the number of groups you're splitting the data into. If you use the data matrix itself, then k is the same as the number of data points. That is, you're asking kmeans to group n points into n groups, so each data point ends up in its own group. With more than 300,000 points you also run out of memory, because kmeans then tries to allocate an n-by-n array (the Del = NaN(n,k) line in your error message).
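A rough back-of-the-envelope check of the memory problem, assuming the n-by-k array from the error message holds doubles at 8 bytes each. The value kSmall = 4 is only an illustrative choice, not something taken from the question:

```matlab
n = 334795;                    % number of data points (rows of data)
k = n;                         % 'Start' was the data itself, so k equals n
bytesNeeded = n * k * 8;       % one double takes 8 bytes
fprintf('About %.0f GB for one n-by-k array\n', bytesNeeded / 1e9);

kSmall = 4;                    % hypothetical, more realistic number of clusters
bytesSmall = n * kSmall * 8;
fprintf('About %.1f MB with k = %d\n', bytesSmall / 1e6, kSmall);
```

The first figure comes out at roughly 900 GB, the second at around 11 MB, which is the difference between an out-of-memory error and a routine call.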

This is what you're trying to do:

X = rand(20,2);
g = kmeans(X,[],'start',X)
gscatter(X(:,1),X(:,2),g)

This is what you should be doing:

g = kmeans(X,[],'start',[0.25,0.75;0.25,0.25;0.75,0.25;0.75,0.75])
gscatter(X(:,1),X(:,2),g)

Note that the data (X) is 20-by-2. The starting matrix is 4-by-2, so kmeans makes 4 groups out of the 20 points.
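For the 334795-by-2 data in the question, one way to build a valid k-by-p 'start' matrix is to pick k distinct rows of the data as initial centroids. This is only a sketch: the value k = 4 and the random stand-in for data are assumptions, not something stated in the question.

```matlab
data = rand(334795, 2);          % stand-in for the real data (assumption)
k = 4;                           % assumed number of clusters; choose your own

% Pick k distinct rows of the data as starting centroids (a k-by-p matrix).
rows = randperm(size(data, 1), k);
C0 = data(rows, :);

% k is inferred from the number of rows in C0, so [] is passed for k.
[IDX, C] = kmeans(data, [], 'Distance', 'sqeuclidean', ...
                  'EmptyAction', 'singleton', 'Start', C0);
```

If no particular starting points are needed, passing an explicit k together with 'Start','sample' asks kmeans to draw k observations from the data at random, which has much the same effect.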

Related Solutions

MATLAB: How can I solve this problem? Please!

Your ‘T’ matrix must become a cell array. See my sample of ‘T’ for an example.

Once you do that, I would do something like this:

Y = [12.004
    13.1573 
    11.1665  
    11.8082    
    12.2129   
    12.8075   
    12.4167  
    11.5708   
    12.5798    
    13.7489];
T={'cluster2'; 'cluster1'; 'cluster3'; 'cluster9'; 'cluster0'; 'cluster7'; 'cluster4'}; % Sample Of Cell Array
[UnqClust,~,idx] = unique(T);                                                           % Unique Values & Locations
Out = Y(idx);                                                                           % Mapped Output
for k1 = 1:length(idx)
    fprintf(1, '\t%s\t%.4f\n', T{k1}, Out(k1))              % Demonstration Output (Delete Loop)
end

This prints:

  cluster2  11.1665
  cluster1  13.1573
  cluster3  11.8082
  cluster9  12.4167
  cluster0  12.0040
  cluster7  12.8075
  cluster4  12.2129
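The conversion to a cell array mentioned at the start of this answer can be done with cellstr, which turns each row of a character matrix into one cell. A small sketch, assuming T starts out as a character matrix with equal-length rows (the three rows and the name Tchar are made up for illustration):

```matlab
Tchar = ['cluster2'; 'cluster1'; 'cluster3'];   % illustrative character matrix
T = cellstr(Tchar);                              % 3-by-1 cell array, one name per cell
[UnqClust, ~, idx] = unique(T);                  % unique names and their locations
```

After this, T can be used exactly like the sample cell array above.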

MATLAB: I want a solution for this problem

This code might do what you are looking for:

ix = sscanf( T(:,end), '%1d' );
Y  = cat( 1, Y, rand );     % add one element to get 10
R  = Y(ix+1);

Inspect the result:

>> R(1:6)'
ans =
   11.1665   11.1665   11.1665   11.5740   11.5740   11.5740

where

T=['cluster2'
'cluster2'
'cluster2'
'cluster7'
'cluster7'
'etc ....'];
