A while back Will showed you how to create QQ plots of p-values in Stata and in R using the now-deprecated sma package. A bit later on I showed you how to do the same thing in R using ggplot2. As much as we (and our readers) love ggplot2 around here, it can be quite a bit slower than using the built in base graphics. This was only recently a problem for me when I tried creating a quantile-quantile plot of over 12-million p-values. I wrote the code to do this in base graphics, which is substantially faster than using the ggplot2 code I posted a while back. The code an an example are below.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# http://GettingGeneticsDone.blogspot.com/ | |
# See http://gettinggeneticsdone.blogspot.com/p/copyright.html | |
# Define the function | |
ggd.qqplot = function(pvector, main=NULL, ...) { | |
o = -log10(sort(pvector,decreasing=F)) | |
e = -log10( 1:length(o)/length(o) ) | |
plot(e,o,pch=19,cex=1, main=main, ..., | |
xlab=expression(Expected~~-log[10](italic(p))), | |
ylab=expression(Observed~~-log[10](italic(p))), | |
xlim=c(0,max(e)), ylim=c(0,max(o))) | |
lines(e,e,col="red") | |
} | |
#Generate some fake data that deviates from the null | |
set.seed(42) | |
pvalues=runif(10000) | |
pvalues[sample(10000,10)]=pvalues[sample(10000,10)]/5000 | |
# pvalues is a numeric vector | |
pvalues[1:10] | |
# Using the ggd.qqplot() function | |
ggd.qqplot(pvalues) | |
# Add a title | |
ggd.qqplot(pvalues, "QQ-plot of p-values using ggd.qqplot") |
Here's what the resulting QQ-plot will look like: