As far as I know you can't really do this in R at the moment, if you actually need a mixed model (e.g., if you care about the variance components). The weights argument to lme4::lmer() won't do what you want, because lmer() interprets the weights as precision weights, not as sampling weights. In contrast to ordinary linear and generalised linear models, you don't even get correct point estimates from code that treats sampling weights as precision weights in a mixed model.
If you don't need to estimate variance components and you just want the multilevel features of the model in order to get correct standard errors, you can use survey::svyglm().
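A minimal sketch of the svyglm() approach might look like this; the data frame and the variable names (psu, samp_wt, x, y) are invented for illustration, not taken from any real design:

```r
# Minimal sketch, not a definitive recipe: design-based regression with
# survey::svyglm(); 'psu', 'samp_wt', 'x', 'y' are invented names.
library(survey)
set.seed(1)
mydata <- data.frame(
  psu     = rep(1:10, each = 5),     # primary sampling units (clusters)
  samp_wt = runif(50, 1, 3),         # sampling weights
  x       = rnorm(50)
)
mydata$y <- 1 + 2 * mydata$x + rnorm(50)

# Declare the design: clustering by PSU, weights treated as sampling weights
des <- svydesign(ids = ~psu, weights = ~samp_wt, data = mydata)
fit <- svyglm(y ~ x, design = des)   # standard errors respect the design
coef(fit)
```

The key point is that svydesign() tells the fitting function the weights are sampling weights, so the standard errors are design-based rather than model-based.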
1) The tech support reply that you link to, which says that hierarchical clustering is less appropriate for binary data than two-step clustering, is in my view incorrect.
It is true that when a substantial share of the distances between objects are not unique in value ("tied" or "duplicate" distances), which is quite likely with any few-valued discrete data, not only binary data, the results of clustering will depend strongly on the order in which the objects are processed. But this problem accompanies any clustering method, that is, any method basing itself directly or indirectly on some distance/similarity measure. If there are ties in the quantity that determines the clusters, they can show up as unstable solutions. The instability caused by ties is thus natural and cannot be an argument against any particular method that happens to suffer from it.
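To see how readily ties arise with binary data, here is a small made-up illustration (the data matrix is invented, not from the linked note):

```r
# Made-up binary data: even 5 objects already produce heavily tied distances
b <- rbind(c(1, 0, 0, 1, 0),
           c(1, 1, 0, 0, 0),
           c(0, 1, 1, 0, 0),
           c(0, 0, 1, 1, 0),
           c(1, 0, 1, 0, 1))
d <- dist(b, method = "binary")  # Jaccard distance
table(d)  # only 3 distinct values among the 10 pairwise distances
```

With so few attainable distance values, merge decisions inevitably hit ties, and the tie-breaking depends on object order regardless of the clustering method used.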
In the particular case of the linked note, you can verify that the two-step cluster method will also, like the hierarchical method, sometimes give different results under different sort orders of the observations in the provided dataset. So I don't see any advantage of one method over the other in that respect.
2) Hierarchical clustering is well suited for binary data because it lets you select from a great many distance functions invented specifically for binary data, which are theoretically sounder for it than plain Euclidean distance. However, some agglomeration methods call for (squared) Euclidean distance only. Here are a few points to remember about hierarchical clustering.
One important issue when selecting a similarity function for binary/dichotomous data is whether your data are ordinal binary (asymmetric categories: present vs. absent) or nominal binary (symmetric categories: this vs. that) for you. In other words, should a 0-0 match be grounds for similarity or not? (You may want to read answers like this and this.)
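The distinction is easy to see numerically. A quick illustration (the three binary profiles are invented): base R's dist(method = "binary") ignores 0-0 matches (the Jaccard distance, the asymmetric view), while the simple matching distance counts them as agreement (the symmetric view):

```r
# Illustration: asymmetric vs. symmetric treatment of 0-0 matches
# for three made-up binary profiles
x <- rbind(a = c(1, 0, 0, 1),
           b = c(1, 0, 1, 1),
           c = c(0, 0, 0, 1))
# Asymmetric ("present vs. absent"): 0-0 matches ignored -> Jaccard distance
dist(x, method = "binary")
# Symmetric ("this vs. that"): 0-0 matches count as agreement
# -> simple matching distance = proportion of disagreeing positions
dist(x, method = "manhattan") / ncol(x)
```

Objects that share many absences look much more similar under simple matching than under Jaccard, which is exactly why the choice should reflect whether 0 means "absent" or just "the other category".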
3) The two-step cluster method of SPSS can be used with binary/dichotomous data as an alternative to hierarchical (and some other) methods (some related answers: this, this). However, two-step's processing of categorical variables employs the log-likelihood distance, which is right for nominal, not "ordinal binary", categories. So if you treat your data as the latter, you have a problem. Treating the variables as quantitative (interval) won't solve it either. In some specific cases it is possible to convert a number of binary features into one or more multinomial nominal features quite effectively; in general, doing so without losing information is a tricky task. An experienced analyst may experiment with optimal scaling techniques and multiple correspondence analysis to see whether multiple binary features can be well replaced by a smaller number of equivalent quantitative ones.
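As a toy sketch of the binary-to-nominal conversion mentioned above (the variable names are invented; this only works cleanly when the binary features naturally describe one underlying categorical attribute):

```r
# Toy sketch: collapsing two binary indicators into one multinomial
# nominal variable; 'owns_car' / 'owns_bike' are invented names
owns_car  <- c(1, 0, 1, 0)
owns_bike <- c(1, 1, 0, 0)
transport <- interaction(owns_car, owns_bike, drop = TRUE)
nlevels(transport)  # 4 nominal categories, one per observed combination
```

With k binary features this yields up to 2^k combined categories, which is why the conversion loses its appeal quickly as k grows.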
Best Answer
Some clustering algorithms can use case weights. At least the "average" (also called UPGMA) and "Ward" clustering methods can. If such weights are available, you should use them to get unbiased results. In R, you can specify weights using the members argument of the hclust() function (in base R). The WeightedCluster package also provides some functions (such as partitioning around medoids, PAM, and clustering quality measures) for clustering weighted data.
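A minimal sketch of weighted hierarchical clustering via hclust()'s members argument; the data and the weights below are invented for illustration:

```r
# Sketch: weighted Ward clustering with hclust(members = ...);
# 'members' treats each row as an aggregate of w[i] original cases
set.seed(2)
x <- matrix(rnorm(10 * 2), ncol = 2)     # 10 aggregated observations
w <- sample(1:5, 10, replace = TRUE)     # case weights (e.g. group sizes)
d <- dist(x)^2                           # squared Euclidean, as Ward expects
hc <- hclust(d, method = "ward.D", members = w)
cutree(hc, k = 3)                        # cluster membership for 3 clusters
```

Note that members expects the dissimilarities to be between cluster aggregates, so the weights change how merge heights are updated, not the initial distances.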
You can mix different types of variables (i.e. nominal, metric, ...) using the Gower distance. In R, this distance is available in the cluster package via the daisy() function.
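For example, a sketch of the Gower distance on mixed-type data with cluster::daisy(); the tiny data frame is invented for illustration:

```r
# Sketch: Gower dissimilarity on a made-up mixed-type data frame,
# then average-linkage (UPGMA) clustering on those dissimilarities
library(cluster)
df <- data.frame(
  size   = c(1.2, 3.4, 2.2, 0.8),                    # metric
  colour = factor(c("red", "blue", "red", "blue")),  # nominal
  flag   = factor(c(1, 0, 0, 1))                     # binary, as nominal
)
d  <- daisy(df, metric = "gower")
hc <- hclust(as.dist(d), method = "average")
cutree(hc, k = 2)
```

daisy() picks a suitable per-variable contribution automatically (range-scaled difference for metric columns, matching for factors) and averages them, which is what makes mixing types work.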
You can get more information about these commands by running:
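The original snippet is cut off here; presumably the intended commands were the help pages, something along these lines:

```r
# Reconstructed help calls (the original post's code was truncated)
?hclust                              # base R hierarchical clustering
help("daisy", package = "cluster")   # Gower distance in the cluster package
# and, if installed: library(help = "WeightedCluster")
```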