Pages

QQ plot of p-values in R using base graphics

Update Tuesday, September 14, 2010: Fixed the ylim issue, now it sets the y axis limit based on the smallest observed p-value.

A while back Will showed you how to create QQ plots of p-values in Stata and in R using the now-deprecated sma package. A bit later on I showed you how to do the same thing in R using ggplot2. As much as we (and our readers) love ggplot2 around here, it can be quite a bit slower than using the built in base graphics. This was only recently a problem for me when I tried creating a quantile-quantile plot of over 12-million p-values. I wrote the code to do this in base graphics, which is substantially faster than using the ggplot2 code I posted a while back. The code an an example are below.

# http://GettingGeneticsDone.blogspot.com/
# See http://gettinggeneticsdone.blogspot.com/p/copyright.html
# Define the function
ggd.qqplot = function(pvector, main=NULL, ...) {
o = -log10(sort(pvector,decreasing=F))
e = -log10( 1:length(o)/length(o) )
plot(e,o,pch=19,cex=1, main=main, ...,
xlab=expression(Expected~~-log[10](italic(p))),
ylab=expression(Observed~~-log[10](italic(p))),
xlim=c(0,max(e)), ylim=c(0,max(o)))
lines(e,e,col="red")
}
#Generate some fake data that deviates from the null
set.seed(42)
pvalues=runif(10000)
pvalues[sample(10000,10)]=pvalues[sample(10000,10)]/5000
# pvalues is a numeric vector
pvalues[1:10]
# Using the ggd.qqplot() function
ggd.qqplot(pvalues)
# Add a title
ggd.qqplot(pvalues, "QQ-plot of p-values using ggd.qqplot")
view raw gistfile1.r hosted with ❤ by GitHub


Here's what the resulting QQ-plot will look like: