Solved – Problem in a cluster analysis of User behavior

clustering, model-based-clustering

The data set is extracted from a log file. The simplest way to represent it is a user-page view matrix, which records how many times each user visited each page, as in the attached sample image.

I have a doubt: if many web pages are highly correlated and cover similar subjects, how do we separate similar user groups based on their behavior? What kind of cluster analysis can be used to describe the corpus of users by their browsing, so that similar user groups become clear, instead of using Apriori or similar itemset mining?

I'm thinking of comparing the keywords attached to each page: if, for example, two or more pages have similar keywords, those pages can be treated as having the same subject and the redundant pages ignored; otherwise each page is considered on its own?
Can someone give notes or suggestions on the best way to do this, please?

User-Page view matrix

Best Answer

You can try applying TF-IDF weighting to this aggregated user-page count matrix and then running hierarchical clustering to find clusters.

To cluster users, have each row correspond to a user. To cluster pages, transpose your data.
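
A minimal sketch of that idea, in plain Python so it runs without extra packages. The view counts are made-up toy data, and the helper names (`tfidf`, `average_linkage`, `transpose`) are hypothetical. It also assumes a smoothed IDF variant, cosine distance, and average linkage, none of which are specified above. In practice you would probably use a library such as scikit-learn or SciPy rather than hand-rolled loops.

```python
import math


def tfidf(matrix):
    """TF-IDF weight a count matrix; rows act as documents, columns as terms."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    # Document frequency: in how many rows each column is non-zero.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_cols)]
    # Smoothed IDF (an assumption), so columns present in every row keep a weight.
    idf = [math.log((1 + n_rows) / (1 + d)) + 1 for d in df]
    weighted = []
    for row in matrix:
        total = sum(row) or 1
        vec = [(count / total) * idf[j] for j, count in enumerate(row)]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        weighted.append([v / norm for v in vec])  # unit-length rows
    return weighted


def cosine_distance(a, b):
    # Rows are unit length, so the dot product is the cosine similarity.
    return 1.0 - sum(x * y for x, y in zip(a, b))


def average_linkage(vectors, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                d = sum(
                    cosine_distance(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ) / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


# Toy user-page view counts (made up): rows are users, columns are pages.
views = [
    [5, 3, 0, 0],  # user 0
    [4, 4, 0, 1],  # user 1
    [0, 0, 6, 2],  # user 2
    [0, 1, 5, 3],  # user 3
]

# Cluster users: each row is a user.
user_clusters = average_linkage(tfidf(views), n_clusters=2)

# Cluster pages: transpose so that each row is a page.
page_clusters = average_linkage(tfidf(transpose(views)), n_clusters=2)

print(user_clusters)
print(page_clusters)
```

On this toy data, users 0 and 1 end up together, as do users 2 and 3, and the pages split the same way. The number of clusters is fixed at 2 here only for illustration; with real data you would normally inspect the merge distances (or a dendrogram) to choose where to cut.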