Visualizing 2-letter combinations

data-visualization, r

The answers to this question on SO returned a set of approximately 125 one- to two-letter names:
https://stackoverflow.com/questions/6979630/what-1-2-letter-object-names-conflict-with-existing-r-objects

  [1] "Ad" "am" "ar" "as" "bc" "bd" "bp" "br" "BR" "bs" "by" "c"  "C" 
 [14] "cc" "cd" "ch" "ci" "CJ" "ck" "Cl" "cm" "cn" "cq" "cs" "Cs" "cv"
 [27] "d"  "D"  "dc" "dd" "de" "df" "dg" "dn" "do" "ds" "dt" "e"  "E" 
 [40] "el" "ES" "F"  "FF" "fn" "gc" "gl" "go" "H"  "Hi" "hm" "I"  "ic"
 [53] "id" "ID" "if" "IJ" "Im" "In" "ip" "is" "J"  "lh" "ll" "lm" "lo"
 [66] "Lo" "ls" "lu" "m"  "MH" "mn" "ms" "N"  "nc" "nd" "nn" "ns" "on"
 [79] "Op" "P"  "pa" "pf" "pi" "Pi" "pm" "pp" "ps" "pt" "q"  "qf" "qq"
 [92] "qr" "qt" "r"  "Re" "rf" "rk" "rl" "rm" "rt" "s"  "sc" "sd" "SJ"
[105] "sn" "sp" "ss" "t"  "T"  "te" "tr" "ts" "tt" "tz" "ug" "UG" "UN"
[118] "V"  "VA" "Vd" "vi" "Vo" "w"  "W"  "y"

And the R code to create it:

nms <- c("Ad","am","ar","as","bc","bd","bp","br","BR","bs","by","c","C","cc","cd","ch","ci","CJ","ck","Cl","cm","cn","cq","cs","Cs","cv","d","D","dc","dd","de","df","dg","dn","do","ds","dt","e","E","el","ES","F","FF","fn","gc","gl","go","H","Hi","hm","I","ic","id","ID","if","IJ","Im","In","ip","is","J","lh","ll","lm","lo","Lo","ls","lu","m","MH","mn","ms","N","nc","nd","nn","ns","on","Op","P","pa","pf","pi","Pi","pm","pp","ps","pt","q","qf","qq","qr","qt","r","Re","rf","rk","rl","rm","rt","s","sc","sd","SJ","sn","sp","ss","t","T","te","tr","ts","tt","tz","ug","UG","UN","V","VA","Vd","vi","Vo","w","W","y")

Since the point of the question was to come up with a memorable list of object names to avoid, and most humans are not so good at making sense out of a solid block of text, I would like to visualize this.

Unfortunately, I'm not certain of the best way to do this. I had thought of something like a stem-and-leaf plot, except that, since there are no repeated values, each "leaf" would be placed in its appropriate column rather than being left-justified. Or a wordcloud-style adaptation where letters are sized according to their prevalence.
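A minimal base-R sketch of the stem-and-leaf idea: one row per first letter (the stem), with each second letter (the leaf) printed in its own fixed column, upper case A to Z followed by lower case a to z, so that gaps line up across rows. The helper stem_leaf() is hypothetical, written only for illustration, and single-letter names are left out.

```r
nms <- c("Ad","am","ar","as","bc","bd","bp","br","BR","bs","by","c","C",
         "cc","cd","ch","ci","CJ","ck","Cl","cm","cn","cq","cs","Cs","cv",
         "d","D","dc","dd","de","df","dg","dn","do","ds","dt","e","E",
         "el","ES","F","FF","fn","gc","gl","go","H","Hi","hm","I","ic",
         "id","ID","if","IJ","Im","In","ip","is","J","lh","ll","lm","lo",
         "Lo","ls","lu","m","MH","mn","ms","N","nc","nd","nn","ns","on",
         "Op","P","pa","pf","pi","Pi","pm","pp","ps","pt","q","qf","qq",
         "qr","qt","r","Re","rf","rk","rl","rm","rt","s","sc","sd","SJ",
         "sn","sp","ss","t","T","te","tr","ts","tt","tz","ug","UG","UN",
         "V","VA","Vd","vi","Vo","w","W","y")

# Only the two-letter names; single letters can be listed separately
two  <- nms[nchar(nms) == 2]
# Column order for the leaves: A..Z, then a..z
cols <- c(LETTERS, letters)

# Hypothetical helper: one text row per stem; each leaf sits in a fixed
# column (blank if absent) instead of being left-justified
stem_leaf <- function(x, cols) {
  first  <- substr(x, 1, 1)
  second <- substr(x, 2, 2)
  stems  <- cols[cols %in% first]   # stems in the same A..Z, a..z order
  rows <- vapply(stems, function(s) {
    leaves <- second[first == s]
    line   <- ifelse(cols %in% leaves, cols, " ")
    paste0(s, " | ", paste(line, collapse = ""))
  }, character(1))
  unname(rows)
}

out <- stem_leaf(two, cols)
writeLines(out)
```

Because every row has the same width, names that share a second letter line up vertically, which is the property the plain stem-and-leaf layout loses.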

How might this be visualized most clearly and efficiently?

Visualizations which do either of the following fit in the spirit of this question:

  • Primary goal: Enhance the memorability of the set of names by revealing patterns in the data

  • Alternate goal: Highlight interesting features of the set of names (e.g. the distribution of letters, the most common letters, etc.)
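For the alternate goal, a minimal base-R sketch that counts how often each letter occurs as the first and as the second letter of the two-letter names (case-sensitive, single-letter names excluded). These counts are one possible input for the wordcloud-style idea, where a letter's size would follow its count; no particular wordcloud package is assumed here.

```r
nms <- c("Ad","am","ar","as","bc","bd","bp","br","BR","bs","by","c","C",
         "cc","cd","ch","ci","CJ","ck","Cl","cm","cn","cq","cs","Cs","cv",
         "d","D","dc","dd","de","df","dg","dn","do","ds","dt","e","E",
         "el","ES","F","FF","fn","gc","gl","go","H","Hi","hm","I","ic",
         "id","ID","if","IJ","Im","In","ip","is","J","lh","ll","lm","lo",
         "Lo","ls","lu","m","MH","mn","ms","N","nc","nd","nn","ns","on",
         "Op","P","pa","pf","pi","Pi","pm","pp","ps","pt","q","qf","qq",
         "qr","qt","r","Re","rf","rk","rl","rm","rt","s","sc","sd","SJ",
         "sn","sp","ss","t","T","te","tr","ts","tt","tz","ug","UG","UN",
         "V","VA","Vd","vi","Vo","w","W","y")

# Two-letter names only, split into first and second letter
two    <- nms[nchar(nms) == 2]
first  <- substr(two, 1, 1)
second <- substr(two, 2, 2)

# Case-sensitive letter counts per position, most frequent first;
# these could serve as weights for a wordcloud-style display
first_counts  <- sort(table(first),  decreasing = TRUE)
second_counts <- sort(table(second), decreasing = TRUE)

print(head(first_counts, 5))
print(head(second_counts, 5))
```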

Answers in R are preferred, but all interesting ideas are welcome.

Ignoring the single-letter names is allowed, since those are easier to just give as a separate list.
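Splitting off the single-letter names takes one line of base R. A minimal sketch that also sorts them by case, so they can be given as two short lists:

```r
nms <- c("Ad","am","ar","as","bc","bd","bp","br","BR","bs","by","c","C",
         "cc","cd","ch","ci","CJ","ck","Cl","cm","cn","cq","cs","Cs","cv",
         "d","D","dc","dd","de","df","dg","dn","do","ds","dt","e","E",
         "el","ES","F","FF","fn","gc","gl","go","H","Hi","hm","I","ic",
         "id","ID","if","IJ","Im","In","ip","is","J","lh","ll","lm","lo",
         "Lo","ls","lu","m","MH","mn","ms","N","nc","nd","nn","ns","on",
         "Op","P","pa","pf","pi","Pi","pm","pp","ps","pt","q","qf","qq",
         "qr","qt","r","Re","rf","rk","rl","rm","rt","s","sc","sd","SJ",
         "sn","sp","ss","t","T","te","tr","ts","tt","tz","ug","UG","UN",
         "V","VA","Vd","vi","Vo","w","W","y")

# Single-letter names, split by case
single       <- nms[nchar(nms) == 1]
lower_single <- sort(single[single %in% letters])
upper_single <- sort(single[single %in% LETTERS])

cat("lower case:", lower_single, "\n")
cat("upper case:", upper_single, "\n")
```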

Best Answer

Here is a start: visualize these on a grid of first and second letters:

combi <- c("Ad", "am", "ar", "as", "bc", "bd", "bp", "br", "BR", "bs", 
"by", "c",  "C",  "cc", "cd", "ch", "ci", "CJ", "ck", "Cl", "cm", "cn", 
"cq", "cs", "Cs", "cv", "d",  "D",  "dc", "dd", "de", "df", "dg", "dn", 
"do", "ds", "dt", "e",  "E",  "el", "ES", "F",  "FF", "fn", "gc", "gl", 
"go", "H",  "Hi", "hm", "I",  "ic", "id", "ID", "if", "IJ", "Im", "In", 
"ip", "is", "J",  "lh", "ll", "lm", "lo", "Lo", "ls", "lu", "m",  "MH", 
"mn", "ms", "N",  "nc", "nd", "nn", "ns", "on", "Op", "P",  "pa", "pf", 
"pi", "Pi", "pm", "pp", "ps", "pt", "q",  "qf", "qq", "qr", "qt", "r",  
"Re", "rf", "rk", "rl", "rm", "rt", "s",  "sc", "sd", "SJ", "sn", "sp", 
"ss", "t",  "T",  "te", "tr", "ts", "tt", "tz", "ug", "UG", "UN", "V",  
"VA", "Vd", "vi", "Vo", "w",  "W",  "y")

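## first letter: for single-letter names the regex does not match,
## so the name itself is returned
## second letter: "" for single-letter names, which becomes NA
## because "" is not one of the factor levels
df <- data.frame (first = factor (gsub ("^(.).", "\\1", combi), 
                                  levels = c (LETTERS, letters)),
                  second = factor (gsub ("^.", "", combi), 
                                  levels = c (LETTERS, letters)),
                  combi = combi)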

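library(ggplot2)
ggplot (data = df, aes (x = first, y = second)) + 
   geom_text (aes (label = combi), size = 3) + 
   ## geom_point () +
   ## keep all 52 letters on each axis, so that 26.5 separates Z from a
   scale_x_discrete (drop = FALSE) +
   scale_y_discrete (drop = FALSE) +
   geom_vline (xintercept = 26.5, colour = "grey") + 
   geom_hline (yintercept = 26.5, colour = "grey")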

[Figure: grid with letters]
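For readers without ggplot2, roughly the same grid can be sketched with base graphics. Letters are mapped to positions 1 to 52 (A to Z, then a to z) with match(), so the grey lines at 26.5 separate upper from lower case on both axes; single-letter names are left out in this sketch.

```r
nms <- c("Ad","am","ar","as","bc","bd","bp","br","BR","bs","by","c","C",
         "cc","cd","ch","ci","CJ","ck","Cl","cm","cn","cq","cs","Cs","cv",
         "d","D","dc","dd","de","df","dg","dn","do","ds","dt","e","E",
         "el","ES","F","FF","fn","gc","gl","go","H","Hi","hm","I","ic",
         "id","ID","if","IJ","Im","In","ip","is","J","lh","ll","lm","lo",
         "Lo","ls","lu","m","MH","mn","ms","N","nc","nd","nn","ns","on",
         "Op","P","pa","pf","pi","Pi","pm","pp","ps","pt","q","qf","qq",
         "qr","qt","r","Re","rf","rk","rl","rm","rt","s","sc","sd","SJ",
         "sn","sp","ss","t","T","te","tr","ts","tt","tz","ug","UG","UN",
         "V","VA","Vd","vi","Vo","w","W","y")

two <- nms[nchar(nms) == 2]
lv  <- c(LETTERS, letters)

# Grid coordinates: position of first / second letter in A..Z, a..z
x <- match(substr(two, 1, 1), lv)
y <- match(substr(two, 2, 2), lv)

# Empty frame, then letter axes, the names as text, and case separators
plot(x, y, type = "n", xlim = c(1, 52), ylim = c(1, 52), axes = FALSE,
     xlab = "first letter", ylab = "second letter")
axis(1, at = 1:52, labels = lv, las = 2, cex.axis = 0.5)
axis(2, at = 1:52, labels = lv, las = 1, cex.axis = 0.5)
text(x, y, labels = two, cex = 0.5)
abline(v = 26.5, h = 26.5, col = "grey")
box()
```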

## count names per second letter (geom_bar, because letters are discrete)
ggplot (data = df, aes (x = second)) + geom_bar ()

[Figure: counts by second letter]

## count names per first letter (geom_bar, because letters are discrete)
ggplot (data = df, aes (x = first)) + geom_bar ()

[Figure: counts by first letter]
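A base-R counterpart to the first-letter chart, sketched under the assumption that only the two-letter names matter. Building the factor with all 52 letters as levels keeps letters with a count of zero, which shows at a glance which first letters have no conflicts at all in this list.

```r
nms <- c("Ad","am","ar","as","bc","bd","bp","br","BR","bs","by","c","C",
         "cc","cd","ch","ci","CJ","ck","Cl","cm","cn","cq","cs","Cs","cv",
         "d","D","dc","dd","de","df","dg","dn","do","ds","dt","e","E",
         "el","ES","F","FF","fn","gc","gl","go","H","Hi","hm","I","ic",
         "id","ID","if","IJ","Im","In","ip","is","J","lh","ll","lm","lo",
         "Lo","ls","lu","m","MH","mn","ms","N","nc","nd","nn","ns","on",
         "Op","P","pa","pf","pi","Pi","pm","pp","ps","pt","q","qf","qq",
         "qr","qt","r","Re","rf","rk","rl","rm","rt","s","sc","sd","SJ",
         "sn","sp","ss","t","T","te","tr","ts","tt","tz","ug","UG","UN",
         "V","VA","Vd","vi","Vo","w","W","y")

two <- nms[nchar(nms) == 2]
lv  <- c(LETTERS, letters)

# Explicit levels keep letters that never occur (count 0),
# similar to drop = FALSE in ggplot2
first_counts <- table(factor(substr(two, 1, 1), levels = lv))

barplot(first_counts, las = 2, cex.names = 0.6,
        xlab = "first letter", ylab = "number of names")

# Lower-case first letters that start none of the listed two-letter names
zero       <- as.vector(first_counts == 0)
free_lower <- names(first_counts)[zero & names(first_counts) %in% letters]
print(free_lower)
```

With this list, free_lower comes out as j, k, w, x, y and z.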

I gather:

  • Of the one-letter names:

    • fortunately i, j, k, and l are available (so I can index arrays of up to 4 dimensions)
    • unfortunately t (time) and c (concentration) are gone. So are m (mass), V (volume), and F (force). No radius r or diameter d either.
    • I can have pressure (p), amount of substance (n), and length (l), though.
    • Maybe I'll have to switch to Greek names: ε is OK, but then shouldn't π <- pi be defined as well?

  • I can have any two-letter lowerUPPER name I want (lower-case first letter, upper-case second letter).

  • In general, starting with an upper-case letter is a safer bet than starting with a lower-case one.

  • don't start with c or d
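The observations above can be checked against the list directly. This is a minimal sketch that assumes "available" simply means "not in this list". The list reflects whatever packages the SO answers had loaded, so a given session may differ, and base R's exists() checks the current session instead. case_of() is a hypothetical helper.

```r
nms <- c("Ad","am","ar","as","bc","bd","bp","br","BR","bs","by","c","C",
         "cc","cd","ch","ci","CJ","ck","Cl","cm","cn","cq","cs","Cs","cv",
         "d","D","dc","dd","de","df","dg","dn","do","ds","dt","e","E",
         "el","ES","F","FF","fn","gc","gl","go","H","Hi","hm","I","ic",
         "id","ID","if","IJ","Im","In","ip","is","J","lh","ll","lm","lo",
         "Lo","ls","lu","m","MH","mn","ms","N","nc","nd","nn","ns","on",
         "Op","P","pa","pf","pi","Pi","pm","pp","ps","pt","q","qf","qq",
         "qr","qt","r","Re","rf","rk","rl","rm","rt","s","sc","sd","SJ",
         "sn","sp","ss","t","T","te","tr","ts","tt","tz","ug","UG","UN",
         "V","VA","Vd","vi","Vo","w","W","y")

two <- nms[nchar(nms) == 2]

# Hypothetical helper: label each character as "upper" or "lower" case
case_of <- function(ch) ifelse(ch %in% LETTERS, "upper", "lower")

# Cross-tabulate the case of the first and second letter
case_tab <- table(first  = case_of(substr(two, 1, 1)),
                  second = case_of(substr(two, 2, 2)))
print(case_tab)

# One-letter names mentioned above: TRUE means "not in the list"
cand  <- c("i", "j", "k", "l", "p", "n", "t", "c", "m", "V", "F", "r", "d")
avail <- setNames(!(cand %in% nms), cand)
print(avail)
```

In this list the lower-first, upper-second cell of case_tab is empty, which is what backs up the lowerUPPER observation.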
