Solved – Plot a subset of categories on the x-axis in ggplot

data visualization, ggplot2, r

Hi everyone! I'm trying to plot the p-values of a test by category (in this case, genetic loci), so the x-axis contains gene names. The plot works, but the x-axis is very busy, so I am thinking of splitting it into several plots, each containing only a few loci (i.e., a few x values). When I tried to plot a subset of my data with the script below, I got a plot that still contained far too many values on the x-axis: the bars that were plotted were all crowded onto the left-hand side, and many x values had no bar at all. Any idea what I'm doing wrong? The loci names on the x-axis are hard to read, so I can't tell whether the plot is repeating the same locus names (maybe a problem with melt()) or whether it's plotting all of the Loci values from data instead of only those from split1_data. Alternatively, are there other recommendations for making a plot with hundreds of bars easy to read? Thank you so much!

library(reshape)
library(ggplot2)
require(ggplot2)

setwd("/Users/markfisher/Desktop")

sink(file="/Users/markfisher/Desktop/Pvalue_HWE_output.txt")
data=read.csv("Pvalues_of_all.csv", header=TRUE)
attach(data)
print(data$Loci[1:10]) 

split1_data<-subset(data,data$Loci %in% data$Loci[1:10])

split1_datam<-melt(split1_data,id="Loci")
print("split1_data$Loci")
print(split1_data$Loci)
sink()

pdf('/Users/markfisher/Desktop/pvalue_sensitivity.pdf', bg = "white")
p <- ggplot(split1_datam, aes(x =Loci, y = value, color = variable, width=.15))
p + geom_bar(position="dodge") + ylab("P-value")+ geom_hline(yintercept=0.05)
dev.off()

Update: I made the suggested change of using droplevels() and got an error about an unexpected numeric constant. This is the output from my R console:

> library(reshape)
> library(ggplot2)
> require(ggplot2)
> setwd("/Users/markfisher/Desktop")
> sink(file="/Users/markfisher/Desktop/Pvalue_HWE_output.txt")
> data=read.csv("Pvalues_of_all.csv", header=TRUE)
> attach(data)
> print(data$Loci[1:10]) 
> split1_data<-droplevels(subset(data,data$Loci %in% data$Loci[1:10]))
> split2_data<-subset(data,data$Loci %in% data$Loci[11:20])
> split3_data<-subset(data,data$Loci %in% data$Loci[21:30])
> split4_data<-subset(data,data$Loci %in% data$Loci[31:40])
> split5_data<-subset(data,data$Loci %in% data$Loci[41:51])
> split6_data<-subset(data,data$Loci %in% data$Loci[52:62])
> 
> split1_datam<-melt(split1_data,id="Loci")
> print("split1_data$Loci")
> print(split1_data$Loci)   
> 
> 
> sink()
> pdf('/Users/markfisher/Desktop/pvalue_sensitivity.pdf', bg = "white")
> 
> p <- ggplot(split1_datam, aes(x =Loci, y = value, color = variable, width=.15))1
Error: unexpected numeric constant in "p <- ggplot(split1_datam, aes(x =Loci, y = value, color = variable, width=.15))1"
> p + geom_bar(position="dodge") + ylab("P-value")+ geom_hline(yintercept=0.05)
Error: object 'p' not found
> dev.off()
null device 
1 
>

If anyone needs to see my output file, it looks like this:

The following object(s) are masked from 'data (position 3)':
    All, Loci, X1_only, X1_removed, X2_only, X2_removed, X3_only, X3_removed,
    X4_only, X4_removed, X5_only, X5_removed, X7_only, X7_removed, X78_only,
    X8_only, X8_removed
The following object(s) are masked from 'data (position 4)':
    All, Loci, X1_only, X1_removed, X2_only, X2_removed, X3_only, X3_removed,
    X4_only, X4_removed, X5_only, X5_removed, X7_only, X7_removed, X78_only,
    X8_only, X8_removed
The following object(s) are masked from 'data (position 5)':
    All, Loci, X1_only, X1_removed, X2_only, X2_removed, X3_only, X3_removed,
    X4_only, X4_removed, X5_only, X5_removed, X7_only, X7_removed, X78_only,
    X8_only, X8_removed
The following object(s) are masked from 'data (position 6)':
    All, Loci, X1_only, X1_removed, X2_only, X2_removed, X3_only, X3_removed,
    X4_only, X4_removed, X5_only, X5_removed, X7_only, X7_removed, X78_only,
    X8_only, X8_removed
The following object(s) are masked from 'data (position 7)':
    All, Loci, X1_only, X1_removed, X2_only, X2_removed, X3_only, X3_removed,
    X4_only, X4_removed, X5_only, X5_removed, X7_only, X7_removed, X78_only,
    X8_only, X8_removed
The following object(s) are masked from 'data (position 8)':
    All, Loci, X1_only, X1_removed, X2_only, X2_removed, X3_only, X3_removed,
    X4_only, X4_removed, X5_only, X5_removed, X7_only, X7_removed, X78_only,
    X8_only, X8_removed
The following object(s) are masked from 'data (position 9)':
    All, Loci, X1_only, X1_removed, X2_only, X2_removed, X3_only, X3_removed,
    X4_only, X4_removed, X5_only, X5_removed, X7_only, X7_removed, X78_only,
    X8_only, X8_removed
The following object(s) are masked from 'data (position 10)':
    All, Loci, X1_only, X1_removed, X2_only, X2_removed, X3_only, X3_removed,
    X4_only, X4_removed, X5_only, X5_removed, X7_only, X7_removed, X78_only,
    X8_only, X8_removed
The following object(s) are masked from 'data (position 11)':
    All, Loci, X1_only, X1_removed, X2_only, X2_removed, X3_only, X3_removed,
    X4_only, X4_removed, X5_only, X5_removed, X7_only, X7_removed, X78_only,
    X8_only, X8_removed
The following object(s) are masked from 'data (position 12)':
    All, Loci, X1_only, X1_removed, X2_only, X2_removed, X3_only, X3_removed,
    X4_only, X4_removed, X5_only, X5_removed, X7_only, X7_removed, X78_only,
    X8_only, X8_removed
The following object(s) are masked from 'data (position 13)':
    All, Loci, X1_only, X1_removed, X2_only, X2_removed, X3_only, X3_removed,
    X4_only, X4_removed, X5_only, X5_removed, X7_only, X7_removed, X78_only,
    X8_only, X8_removed
The following object(s) are masked from 'data (position 14)':
    All, Loci, X1_only, X1_removed, X2_only, X2_removed, X3_only, X3_removed,
    X4_only, X4_removed, X5_only, X5_removed, X7_only, X7_removed, X78_only,
    X8_only, X8_removed
 [1] Baez       Blue       C147       C204       C21        C278_PT    C294       C316      
 [9] C485       C487_PigTa
62 Levels: Baez Blue C147 C204 C21 C278_PT C294 C316 C485 C487_PigTa C536 Carey ... Yellow
[1] "split1_data$Loci"
 [1] Baez       Blue       C147       C204       C21        C278_PT    C294       C316      
 [9] C485       C487_PigTa
Levels: Baez Blue C147 C204 C21 C278_PT C294 C316 C485 C487_PigTa

I would also show what my plot looks like, but alas I do not yet have enough cool points on CrossValidated to do so. Thanks again for all of your help with this. I hope my updates clarify things a little bit…

Best Answer

I'm going to put on my mind reading hat and suggest that you simply add droplevels when you subset:

split1_data <- droplevels(subset(data,data$Loci %in% data$Loci[1:10]))

The likely cause of the "problem" is that Loci is a factor. Subsetting a factor may reduce the values that are present, but it doesn't change the set of levels stored as an attribute of the factor, so the unused levels can still show up on the axis. If this behavior of factors disturbs you, you can avoid it by reading strings in as character vectors by default, by setting options(stringsAsFactors = FALSE). (In R 4.0.0 and later, that is already the default for read.csv and data.frame.)
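To make the point about levels concrete, here is a minimal sketch using only base R. The data frame is a made-up stand-in for the asker's file (the p-values are invented, and only five loci are used), so treat it as an illustration of the behavior rather than a reproduction of the original data.

```r
# Toy data: locus names borrowed from the question, p-values invented.
data <- data.frame(
  Loci = factor(c("Baez", "Blue", "C147", "C204", "C21")),
  All  = c(0.01, 0.20, 0.04, 0.60, 0.33)
)

# Keep only the rows for the first two loci.
split1_data <- subset(data, Loci %in% Loci[1:2])

# The other rows are gone, but the factor still carries all five levels.
nlevels(split1_data$Loci)
table(split1_data$Loci)     # unused levels are listed with a count of 0

# droplevels() discards the levels that no longer occur in the data.
split1_clean <- droplevels(split1_data)
nlevels(split1_clean$Loci)
levels(split1_clean$Loci)
```

Comparing the two nlevels() results shows the difference: the plain subset still reports every level of the original factor, while the droplevels() version reports only the loci that remain.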

(But in the future, please note that it is in general impossible to diagnose problems like this without more detailed information about your data, say the output from str or dput. Please include such things in future questions.)
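As a rough sketch of what that could look like, the snippet below runs str and dput on a small, invented data frame. The column names come from the question's output, but the values are hypothetical and the real file is assumed to have many more rows and columns.

```r
# Hypothetical stand-in for the asker's data; all values are invented.
data <- data.frame(
  Loci    = factor(c("Baez", "Blue", "C147")),
  All     = c(0.01, 0.20, 0.04),
  X1_only = c(0.03, 0.50, 0.07)
)

# str() summarises the type of each column. A "Factor w/ 3 levels" entry
# for Loci is the clue that it is a factor, not a character vector.
str(data)

# dput() prints R code that recreates the object, so other people can
# paste it into their own session. head() keeps the output short.
dput(head(data, 2))
```

Pasting the output of both calls into a question lets readers see the column types and rebuild a small piece of the data without needing the original CSV file.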