I do not use Weka, but I will try to explain how things work, and I hope you will find the way to do it in Weka.
Transform the unsupervised into a supervised problem
RF knows only supervised learning, so in order to do unsupervised learning you have to set up your problem as a supervised one. To transform your problem into a supervised one, you create a new data set made from the original data set and a synthetic one. You also need a new target feature, a binary nominal variable: the original observations get one label, and the synthetic observations get the other.
Now, the training set you will use is the union of the original and synthetic data sets. There are multiple algorithms for creating the synthetic data set. The most popular one creates a synthetic set with the same number of instances as the original and with the same features. The values for each feature are drawn randomly, feature by feature, in an independent way.
Suppose you have two features: $x_1$, a continuous variable which comes from a $Normal(0, 1)$, and $x_2$, a nominal binary feature with sample probabilities $p[male]=0.4$ and $p[female]=0.6$. For each new synthetic instance you draw for $x_1$ a value at random from all the original values of $x_1$, and for $x_2$ a value from a Bernoulli with the corresponding probabilities.
Another approach is to estimate the real distributions of the variables and draw sample values from these distributions.
Pay attention that the random draws for different features are independent.
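As a sketch (assuming Python with NumPy and pandas; the answer itself is tool-agnostic), the synthetic data set can be built by resampling each column independently from its own values, which preserves each marginal distribution but destroys the dependence between features:

```python
import numpy as np
import pandas as pd

def make_synthetic(df, rng=None):
    """Build a synthetic data set of the same shape as `df` by sampling
    each column independently (with replacement) from its own values."""
    rng = np.random.default_rng(rng)
    return pd.DataFrame({
        col: rng.choice(df[col].to_numpy(), size=len(df), replace=True)
        for col in df.columns
    })

# Toy data matching the example: x1 ~ Normal(0, 1), x2 in {male, female}
rng = np.random.default_rng(0)
original = pd.DataFrame({
    "x1": rng.normal(0, 1, 100),
    "x2": rng.choice(["male", "female"], size=100, p=[0.4, 0.6]),
})
synthetic = make_synthetic(original, rng=1)

# Label original rows 0 and synthetic rows 1, then stack them:
train = pd.concat([original.assign(y=0), synthetic.assign(y=1)],
                  ignore_index=True)
```

The resulting `train` frame, with `y` as the target, is exactly the supervised problem described above.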
There are 2 reasons why this is done in RF:
1. Using randomization one can study how important a variable is for prediction.
2. The variables are de-correlated.
Now your training data set is created by simply joining the instances of those 2 data sets.
Use additional features of RF to create distance matrix
Among other things, RF has a feature called proximities. A proximity matrix is a matrix with $N$ rows and $N$ columns, where $N$ is the number of instances. This matrix collects information about how often instances end up in the same terminal node. Basically, while learning, RF builds trees; for each tree and for each terminal node, add 1 to the proximity matrix entry for every pair of instances $i$ and $j$ that fall in that terminal node.
Now you are not interested in distances which involve the synthetic data set, so you have to reduce the original proximity matrix by eliminating all the rows and all the columns for synthetic instances.
In the end, divide all elements of the proximity matrix by the number of trees and set all diagonal elements to $1$, i.e. $proximity[i][i]=1$, and you have a proximity matrix.
Now for clustering you need a distance. The usual way to go from proximity to distance is to transform each element of the proximity matrix with $distance[i][j] = \sqrt{1-proximity[i][j]}$. Now you have a distance matrix for the instances of the original data set.
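The whole proximity-to-distance pipeline can be sketched in Python with scikit-learn (an assumption on my part; the answer is library-agnostic). `RandomForestClassifier.apply` returns the terminal-node id of every instance in every tree, which is enough to accumulate proximities for the original rows only:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Stand-in original data, plus a synthetic copy drawn independently
# per feature (the marginal-sampling scheme from the first step).
rng = np.random.default_rng(0)
X_orig = load_iris().data
X_synth = np.column_stack([rng.choice(col, size=len(col))
                           for col in X_orig.T])

X = np.vstack([X_orig, X_synth])
y = np.r_[np.zeros(len(X_orig)), np.ones(len(X_synth))]

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Terminal-node ids for the ORIGINAL rows only: shape (N, n_trees).
leaves = rf.apply(X_orig)
n = len(X_orig)
proximity = np.zeros((n, n))
for t in range(leaves.shape[1]):
    # +1 for each pair of instances sharing a terminal node in tree t
    same_leaf = leaves[:, t][:, None] == leaves[:, t][None, :]
    proximity += same_leaf

proximity /= leaves.shape[1]          # divide by the number of trees
np.fill_diagonal(proximity, 1.0)      # proximity[i][i] = 1
distance = np.sqrt(1.0 - proximity)   # proximity -> distance
```

Note that restricting `apply` to `X_orig` is what "eliminates the synthetic rows and columns" in this sketch.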
Clustering original data using the RF distances
This step is straightforward. There are many clustering algorithms, and these algorithms usually need a distance function. The distance function can also be given in the form of a distance matrix, since you are interested only in clustering the instances of a limited data set.
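As one possible sketch (assuming Python with SciPy, nothing Weka-specific), hierarchical clustering accepts exactly such a precomputed distance matrix; a tiny toy matrix with two obvious groups stands in for the RF-derived one:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Toy N x N distance matrix: rows 0,1 are close, rows 2,3 are close.
distance = np.array([
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
])

# squareform converts the square matrix to the condensed vector
# form that linkage expects; then cut the tree into two clusters.
Z = linkage(squareform(distance), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
```

Any algorithm that takes a precomputed distance matrix (e.g. PAM/k-medoids, DBSCAN) would work here just as well.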
Additional note
I do not know what happens for large data sets, since the $N \times N$ distance matrix can become prohibitively large. There are some implementations which reduce the memory consumption, but I have no idea how they work.
Best Answer
First, I recommend reading about Naive Bayes.
Naive Bayes decides according to which class's probability is larger, not according to whether it is close to one. For example, if the class A probability is 0.1 and the class B probability is 0.12, then class B is the prediction. Here the score means the posterior probability of the class, computed from the class prior and the feature likelihoods. In the Stack Overflow example above, the posterior probabilities (scores) were 1/20 and 1/60, and the higher one was chosen.
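A minimal sketch of this decision rule (plain Python; the priors and likelihoods are illustrative numbers I chose so the scores come out to 1/20 and 1/60):

```python
def nb_score(prior, likelihoods):
    """Unnormalized posterior score: the class prior times the product
    of the per-feature likelihoods (the naive independence assumption)."""
    score = prior
    for p in likelihoods:
        score *= p
    return score

# Illustrative numbers: 0.5 * 0.2 * 0.5 = 1/20, 0.5 * (1/3) * 0.1 = 1/60.
scores = {"A": nb_score(0.5, [0.2, 0.5]),
          "B": nb_score(0.5, [1 / 3, 0.1])}

# Only the ordering of the scores matters, not their closeness to 1.
prediction = max(scores, key=scores.get)
```

Here `prediction` is `"A"`, since 1/20 is the higher score.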
You may look at the classifier's source file, but not everything in Weka is well documented. This output is fairly simple, so I do not think you will find documentation for it.
Use the following command to get CSV prediction output.
We use the Evaluation class instead of using NaiveBayes directly. The first argument to the Evaluation class is the classifier to use, here NaiveBayes. The -classification switch makes it output CSV.
Here is another example, with the well-known iris data set.
This command gives the following output.