Pages

QQ plots of p-values in R using ggplot2

Way back will wrote on this topic.  See his previous post for Stata code for doing this.  Unfortunately the R package that was used to create QQ-plots here has been removed from CRAN, so I wrote my own using ggplot2 and some code I received from Daniel Shriner at NHGRI.

Of course you can use R's built-in qqplot() function, but I could never figure out a way to add the diagonal using base graphics.  To get the function I wrote to make qqplots, copy and paste this into your R session:

qq = function(pvector, title="Quantile-quantile plot of p-values", spartan=F) {
    # Thanks to Daniel Shriner at NHGRI for providing this code for creating expected and observed values
    o = -log10(sort(pvector,decreasing=F))
    e = -log10( 1:length(o)/length(o) )
    # you could use base graphics
    #plot(e,o,pch=19,cex=0.25, xlab=expression(Expected~~-log[10](italic(p))), ylab=expression(Observed~~-log[10](italic(p))),xlim=c(0,max(e)),ylim=c(0,max(e)))
    #lines(e,e,col="red")
    #You'll need ggplot2 installed to do the rest
    plot=qplot(e,o, xlim=c(0,max(e)), ylim=c(0,max(o))) + stat_abline(intercept=0,slope=1, col="red")
    plot=plot+opts(title=title)
    plot=plot+scale_x_continuous(name=expression(Expected~~-log[10](italic(p))))
    plot=plot+scale_y_continuous(name=expression(Observed~~-log[10](italic(p))))
    if (spartan) plot=plot+opts(panel.background=theme_rect(col="grey50"), panel.grid.minor=theme_blank())
    plot
}


Also, make sure you have ggplot2 installed:

install.packages("ggplot2")

If you already have it installed, load it:

library(ggplot2)

The function takes a vector of p-values, optionally a title, and a third option to make the plot black and white.  You can check the default arguments to the function by typing args(qq).

qq(data$Pvals, title="My Quantile-Quantile Plot")



You can also make the plot more "Spartan" by removing the title (setting it to NULL) and by making the color scheme black on white:

qq(data$Pvals, title=NULL, spartan=TRUE)


Stay tuned - I'm also putting together a way to import your PLINK output files and make better Manhattan plots using R and ggplot2 than you ever could with Haploview.

Update Tuesday, November 10, 2009: A big tip of the hat to the anonymous commenter who pointed out that this can easily be done in the base graphics package, as well as adding confidence intervals.  As always, we welcome your comments if you know of a better or easier way to do anything mentioned here!