We know that Jaccard (computed between any two columns of binary data $\bf{X}$) is $\frac{a}{a+b+c}$, while Rogers-Tanimoto is $\frac{a+d}{a+d+2(b+c)}$, where
- a - number of rows where both columns are 1
- b - number of rows where this column is 1 and the other is 0
- c - number of rows where this column is 0 and the other is 1
- d - number of rows where both columns are 0

so that $a+b+c+d=n$, the number of rows in $\bf X$.
Then we have:
$\bf X'X=A$ is the square symmetric matrix of $a$ between all columns.
$\bf (not\ X)'(not\ X)=D$ is the square symmetric matrix of $d$ between all columns ("not X" means $\bf X$ with every 1 turned into 0 and every 0 into 1).
So, $\frac{\bf A}{n-\bf D}$ is the square symmetric matrix of Jaccard between all columns (the division, like all the arithmetic on these matrices, is elementwise).
$\frac{\bf A+D}{\bf A+D+2(n-(A+D))}=\frac{\bf A+D}{2n-\bf A-D}$ is the square symmetric matrix of Rogers-Tanimoto between all columns.
I checked numerically whether these formulas give correct results. They do.
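A minimal NumPy sketch of these matrix formulas (the variable names and the random test data are mine, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 5))   # n x p binary data matrix
n = X.shape[0]

A = X.T @ X                    # element ij = a for columns i, j
D = (1 - X).T @ (1 - X)        # element ij = d for columns i, j

jaccard = A / (n - D)                        # a / (a + b + c)
rogers_tanimoto = (A + D) / (2 * n - A - D)  # (a + d) / (a + d + 2(b + c))

# spot-check against a direct pairwise computation for columns 0 and 1
a = np.sum((X[:, 0] == 1) & (X[:, 1] == 1))
d = np.sum((X[:, 0] == 0) & (X[:, 1] == 0))
assert np.isclose(jaccard[0, 1], a / (n - d))
```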
Upd. You can also obtain matrices $\bf B$ and $\bf C$:
$\bf B= [1]'X-A$, where "[1]" denotes a matrix of ones of the same size as $\bf X$. $\bf B$ is the square asymmetric matrix of $b$ between all columns; its element $ij$ is the number of rows in $\bf X$ with 0 in column i and 1 in column j.
Consequently, $\bf C=B'$.
Matrix $\bf D$ can, of course, also be computed this way: $n - \bf A - B - C$.
Knowing matrices $\bf A, B, C, D$, you can calculate the matrix of any pairwise (dis)similarity coefficient invented for binary data.
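Continuing the sketch above (same hypothetical `X`, `A`, `D`, `n`), the update's formulas translate directly; simple matching, $(a+d)/n$, serves as one example of a coefficient assembled from the four matrices:

```python
ones = np.ones_like(X)
B = ones.T @ X - A      # element ij = rows with 0 in column i, 1 in column j
C = B.T
assert np.array_equal(n - A - B - C, D)   # D recovered the other way

# any binary (dis)similarity matrix follows from A, B, C, D;
# e.g. the simple matching coefficient, (a + d) / n:
simple_matching = (A + D) / n
```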
Using the a, b, c, d convention of the 4-fold table, as here:
```
         Y
       1   0
     ---------
  1 |  a |  b |
X    ---------
  0 |  c |  d |
     ---------
```
a = number of cases where both X and Y are 1
b = number of cases where X is 1 and Y is 0
c = number of cases where X is 0 and Y is 1
d = number of cases where both X and Y are 0
a+b+c+d = n, the number of cases,
substitute and get
$1-\frac{2(b+c)}{n} = \frac{n-2b-2c}{n} = \frac{(a+d)-(b+c)}{a+b+c+d}$, the Hamann similarity coefficient. Meet it, e.g., here. To cite:
> Hamann similarity measure. This measure gives the probability that a characteristic has the same state in both items (present in both or absent from both) minus the probability that a characteristic has different states in the two items (present in one and absent from the other). HAMANN has a range of −1 to +1 and is monotonically related to Simple Matching similarity (SM), Sokal & Sneath similarity 1 (SS1), and Rogers & Tanimoto similarity (RT).
You might want to compare the Hamann formula with that of the phi correlation (which you mention), written in a, b, c, d terms. Both are "correlation" measures, ranging from -1 to 1. But look: Phi will approach 1 only when both a and d are large (and likewise -1 only when both b and c are large): its numerator $ad-bc$ is a product, you know... In other words, Pearson correlation, and especially its dichotomous-data hypostasis Phi, is sensitive to the symmetry of the marginal distributions in the data. Hamann's numerator, $(a+d)-(b+c)$, having sums in place of products, is not sensitive to it: either of the two summands in a pair being large is enough for the coefficient to get close to 1 (or -1). Thus, if you want a "correlation" (or quasi-correlation) measure that defies the shape of the marginal distributions, choose Hamann over Phi.
Illustration:
Crosstabulations:
```
      Y
X   7   1
    1   7
```
Phi = .75; Hamann = .75
```
      Y
X   4   1
    1  10
```
Phi = .71; Hamann = .75
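A quick check of the two tables in Python (the helper functions are mine, not part of the answer):

```python
import numpy as np

def phi(a, b, c, d):
    """Phi correlation from the 4-fold table."""
    return (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def hamann(a, b, c, d):
    """Hamann similarity: ((a + d) - (b + c)) / n."""
    return ((a + d) - (b + c)) / (a + b + c + d)

print(round(phi(7, 1, 1, 7), 2), round(hamann(7, 1, 1, 7), 2))    # 0.75 0.75
print(round(phi(4, 1, 1, 10), 2), round(hamann(4, 1, 1, 10), 2))  # 0.71 0.75
```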
There exist many such coefficients (most are expressed here). Just try to meditate on the consequences of the differences between the formulas, especially when you compute a matrix of coefficients.
Imagine, for example, that objects 1 and 2 are similar, as are objects 3 and 4. But 1 and 2 have many of the attributes on the list while 3 and 4 have only a few. In this case, Russell-Rao (the proportion of co-attributes to the total number of attributes under consideration) will be high for pair 1-2 and low for pair 3-4. But Jaccard (the proportion of co-attributes to the combined number of attributes both objects have = the probability that if either object has an attribute, then both have it) will be high for both pairs 1-2 and 3-4.
This adjustment for the base level of "saturation by attributes" is what makes Jaccard so popular and more useful than Russell-Rao, e.g. in cluster analysis or multidimensional scaling. You might, in a sense, refine the adjustment further by selecting the Kulczynski-2 measure, which is the arithmetic mean of the probabilities that if one object has an attribute, the other object has it too: $$ \left(\frac{a}{a+b} + \frac{a}{a+c}\right) /2 $$ Here the base (or field) of attributes is not pooled for the two objects, as in Jaccard, but is each object's own. Consequently, if the objects differ greatly in the number of attributes they have, and the "poorer" object shares all its attributes with the "richer" one, Kulczynski-2 will be high whereas Jaccard will be moderate.
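A tiny numeric sketch of that contrast, with made-up counts in which the "poorer" object shares all of its attributes with the "richer" one:

```python
a, b, c = 3, 12, 0   # poor object: 3 attributes, all shared; rich object: 15

jaccard = a / (a + b + c)                        # 0.2  -- moderate
kulczynski2 = (a / (a + b) + a / (a + c)) / 2    # 0.6  -- high
```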
Or you could prefer to compute the geometric mean of the probabilities that if one object has an attribute, the other object has it too, which yields the Ochiai measure: $$ \sqrt {\frac{a}{a+b} \frac{a}{a+c}} $$ Because a product grows more slowly than a sum when only one of its factors grows, Ochiai will be really high only if both of the two proportions (probabilities) are high, which implies that to be considered similar by Ochiai the objects must share large shares of their attributes. In short, Ochiai curbs similarity when $b$ and $c$ are unequal. Ochiai is in fact the cosine similarity measure (and Russell-Rao is the dot-product similarity).
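A short sketch (with illustrative vectors of my own) confirming that Ochiai coincides with the cosine between the binary vectors:

```python
import numpy as np

x = np.array([1, 1, 1, 0, 0, 1, 0])
y = np.array([1, 0, 1, 1, 0, 1, 0])

a = np.sum(x * y)            # co-presences
b = np.sum(x * (1 - y))      # present in x only
c = np.sum((1 - x) * y)      # present in y only

ochiai = np.sqrt((a / (a + b)) * (a / (a + c)))
cosine = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
assert np.isclose(ochiai, cosine)   # both 0.75 here
```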
P.S.
Speaking of similarity measures, one shouldn't mix up nominal dichotomous attributes (e.g. female vs male) with binary attributes (present vs absent). A binary attribute isn't symmetric (in general): if you and I share a characteristic, that is a basis for calling us similar; if you and I both lack the characteristic, it may or may not be considered evidence of similarity, depending on the context of the study. Hence the divergent treatment of $d$ is possible.
Note also that if you wish to compute similarity between objects based on one or more nominal attributes (dichotomous or polytomous), recode each such variable into a set of dummy binary variables. Then the recommended similarity measure to compute is Dice (which, when computed over one or more sets of dummy variables, is equivalent to Ochiai and Kulczynski-2).
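A hedged sketch of that recoding advice, with two invented nominal attributes dummy-coded for each object:

```python
import numpy as np

# two objects, two nominal attributes:
# colour {red, green, blue} and shape {round, square}, dummy-coded
obj1 = np.array([1, 0, 0,  1, 0])   # red, round
obj2 = np.array([0, 1, 0,  1, 0])   # green, round

a = np.sum(obj1 * obj2)             # 1: they match on shape only
b = np.sum(obj1 * (1 - obj2))       # 1
c = np.sum((1 - obj1) * obj2)       # 1

dice = 2 * a / (2 * a + b + c)      # 0.5 = share of matching nominal attributes
```

With dummy sets each object has exactly one 1 per attribute, so $b=c$ and Dice reduces to $a/(a+b)$, which is also what Ochiai and Kulczynski-2 give in this case.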