Here is a start: visualize these on a grid of first and second letters:
combi <- c("Ad", "am", "ar", "as", "bc", "bd", "bp", "br", "BR", "bs",
"by", "c", "C", "cc", "cd", "ch", "ci", "CJ", "ck", "Cl", "cm", "cn",
"cq", "cs", "Cs", "cv", "d", "D", "dc", "dd", "de", "df", "dg", "dn",
"do", "ds", "dt", "e", "E", "el", "ES", "F", "FF", "fn", "gc", "gl",
"go", "H", "Hi", "hm", "I", "ic", "id", "ID", "if", "IJ", "Im", "In",
"ip", "is", "J", "lh", "ll", "lm", "lo", "Lo", "ls", "lu", "m", "MH",
"mn", "ms", "N", "nc", "nd", "nn", "ns", "on", "Op", "P", "pa", "pf",
"pi", "Pi", "pm", "pp", "ps", "pt", "q", "qf", "qq", "qr", "qt", "r",
"Re", "rf", "rk", "rl", "rm", "rt", "s", "sc", "sd", "SJ", "sn", "sp",
"ss", "t", "T", "te", "tr", "ts", "tt", "tz", "ug", "UG", "UN", "V",
"VA", "Vd", "vi", "Vo", "w", "W", "y")
## one-letter names have no second letter, so `second` is NA for them
df <- data.frame (first = factor (gsub ("^(.).", "\\1", combi),
                                  levels = c (LETTERS, letters)),
                  second = factor (gsub ("^.", "", combi),
                                   levels = c (LETTERS, letters)),
                  combi = combi)
library(ggplot2)
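ggplot (data = df, aes (x = first, y = second)) +
  geom_text (aes (label = combi), size = 3) +
  ## geom_point () +
  ## keep unused letters on the axes so that 26.5 separates upper from lower case
  scale_x_discrete (drop = FALSE) +
  scale_y_discrete (drop = FALSE) +
  geom_vline (xintercept = 26.5, col = "grey") +
  geom_hline (yintercept = 26.5, col = "grey")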
## first and second are factors, so count them with bars rather than a histogram
ggplot (data = df, aes (x = second)) + geom_bar ()
ggplot (data = df, aes (x = first)) + geom_bar ()
I gather:
of the one letter names,
I can have whatever lowerUPPER name I want.
In general, starting with an upper case letter is a safer bet than lower case.
Don't start with c or d.
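These conclusions can also be checked without the plot. The following is only a rough sketch: `taken` is a shortened sample of the list above, used here for illustration (substitute the full `combi` vector for real counts), and the names `all2`, `free2` and `case_pattern` are made up for this example.

```r
## Shortened sample of the taken names from the list above (illustration only)
taken <- c("Ad", "am", "bc", "BR", "c", "C", "cc", "cd", "Cl", "d", "D",
           "dc", "dd", "de", "Hi", "ID", "lo", "Lo", "pp", "UG", "VA")

## All 52 * 52 possible two-letter names, and those not in `taken`
alphabet <- c(LETTERS, letters)
all2  <- as.vector(outer(alphabet, alphabet, paste0))
free2 <- setdiff(all2, taken)

## Hypothetical helper: label a two-letter name by the case of each letter
case_pattern <- function(x) {
  first  <- ifelse(substr(x, 1, 1) %in% LETTERS, "UPPER", "lower")
  second <- ifelse(substr(x, 2, 2) %in% LETTERS, "UPPER", "lower")
  paste0(first, second)
}

## How many taken two-letter names fall into each case pattern
patterns <- c("lowerlower", "lowerUPPER", "UPPERlower", "UPPERUPPER")
table(factor(case_pattern(taken[nchar(taken) == 2]), levels = patterns))

## Taken names per first letter (ignoring case), most crowded first
head(sort(table(tolower(substr(taken, 1, 1))), decreasing = TRUE))
```

The first table shows which case patterns are still empty, and the second shows which first letters are most crowded.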
Best Answer
Even if the data are binary, you can do a scaled Principal Component Analysis (PCA). Projecting the results onto the 2D plane spanned by the first two principal components gives you an idea of how your data cluster.
In R: