
Create annotated GWAS manhattan plots using ggplot2 in R

*** Update April 25, 2011: This code has gone through a major revision. Please see the updated code and tutorial here. ***

A few months ago I showed you in this post how to use some code I wrote to produce manhattan plots in R using ggplot2. The qqman() function I described in that post actually calls another function, manhattan(), which has a few options you can set. I recently had to update this function so that it can color-code SNPs of interest, similar to the plots in figure 1 of Cristen Willer's 2008 Nature Genetics paper on lipids. Here I'll explain how to use that feature.

The only extra thing you'll need is a list of SNPs that you want to highlight, with one catch: the rs numbers in that list can't have the "rs" prefix. They must be integers. For example, if you want to highlight rs1234 and rs5678, you would create a vector containing the integers 1234 and 5678. If your list still has the prefix, use substr() to extract only the digits from each rs number.
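Here is a minimal sketch of that conversion in base R. It assumes your rs numbers are stored as character strings such as "rs1234" in a vector called `snps` (a hypothetical name used only for illustration):

```r
# Hypothetical input: rs numbers as character strings with the "rs" prefix
snps <- c("rs1234", "rs5678", "rs3821815")

# substr() keeps characters 3 through the end of each string, dropping "rs";
# as.integer() then converts the remaining digits to integers
ImportantSNPs <- as.integer(substr(snps, 3, nchar(snps)))

ImportantSNPs
```

IDs that don't start with "rs" (for example chromosome:position identifiers) would become NA, with a coercion warning, so check for NAs before passing the vector on.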

Once you have loaded your PLINK results and the vector of rs numbers you want to highlight, call the manhattan() function with the options annotate=TRUE and SNPlist=x, where x is the name of the vector containing the rs numbers.
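The internals of manhattan() aren't reproduced here, so the following is only an illustration of the kind of matching that highlighting requires: compare the integer list against the numeric part of the SNP column in the PLINK results. It uses a small made-up data frame with the CHR, SNP, BP and P columns that a PLINK .qassoc file contains; the `highlight` column name is hypothetical.

```r
# Toy stand-in for a PLINK .qassoc results file (made-up values)
mydata <- data.frame(
  CHR = c(1, 1, 2, 2, 3),
  SNP = c("rs1234", "rs1111", "rs5678", "rs2222", "rs3333"),
  BP  = c(1000, 5000, 2000, 8000, 3000),
  P   = c(0.5, 1e-8, 1e-6, 0.03, 0.9),
  stringsAsFactors = FALSE
)

# Integers to highlight (no "rs" prefix)
ImportantSNPs <- c(1234L, 5678L)

# Strip "rs" from the SNP IDs; IDs without the prefix become NA quietly
snp_num <- suppressWarnings(as.integer(sub("^rs", "", mydata$SNP)))

# %in% returns TRUE for listed SNPs and FALSE otherwise (including NAs)
mydata$highlight <- snp_num %in% ImportantSNPs

mydata[mydata$highlight, c("CHR", "SNP", "P")]
```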

Here's some example code:

```r
# This requires the ggplot2 package
library(ggplot2)

# First, load the qqman() and manhattan() functions from source:
source("http://dl.dropbox.com/u/66281/0_Permanent/qqman.r")

# Next, load your PLINK results file into a data frame:
mydata <- read.table("plink.qassoc", header = TRUE)

# This assumes you already have a vector of rs numbers (integers) to highlight
head(ImportantSNPs)
# [1] 3821815 1851665 1621816 1403694 1656922  166479

# Call the manhattan() function with annotate = TRUE.
# The SNPlist argument takes the vector of SNPs to highlight.
# Save the plot to an object.
myplot <- manhattan(mydata, annotate = TRUE, SNPlist = ImportantSNPs)

# Finally, save the plot in the current directory using ggsave()
ggsave("manhattan.png", myplot, width = 12, height = 9, dpi = 100)
```
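If you want to see what such a plot involves without loading the sourced script, here is a minimal self-contained sketch written directly with ggplot2. It is not the author's manhattan() function: the simulated data, the `highlight` column, and the styling are assumptions made for illustration, with highlighted SNPs drawn in green to echo the original function's default. Chromosomes are laid end to end by adding to each SNP's base-pair position the summed maximum positions of all earlier chromosomes.

```r
# Minimal annotated manhattan plot sketch (not the manhattan() from qqman.r)
set.seed(1)

# Simulated results: 3 chromosomes x 200 SNPs (made-up data)
d <- data.frame(
  CHR = rep(1:3, each = 200),
  BP  = rep(seq(1e4, 2e6, length.out = 200), times = 3),
  P   = runif(600)
)
d$highlight <- FALSE
d$highlight[250:260] <- TRUE  # pretend these SNPs are of interest

# Offset each chromosome by the summed max BP of the chromosomes before it
chr_len <- tapply(d$BP, d$CHR, max)
offset  <- c(0, cumsum(as.numeric(chr_len))[-length(chr_len)])
names(offset) <- names(chr_len)
d$pos  <- d$BP + unname(offset[as.character(d$CHR)])
d$logp <- -log10(d$P)

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  ticks <- tapply(d$pos, d$CHR, function(x) mean(range(x)))
  p <- ggplot(d, aes(x = pos, y = logp)) +
    # Background SNPs alternate between two greys by chromosome
    geom_point(aes(colour = factor(CHR %% 2)), size = 1) +
    scale_colour_manual(values = c("grey30", "grey60"), guide = "none") +
    # SNPs of interest drawn on top in green
    geom_point(data = d[d$highlight, ], colour = "green3", size = 1.5) +
    scale_x_continuous(breaks = as.numeric(ticks), labels = names(ticks)) +
    labs(x = "Chromosome", y = expression(-log[10](italic(p))))
}
```

Calling print(p) displays the plot, and ggsave() can write it to disk in the same way as in the example above. Drawing the highlighted points in a second layer keeps them visible on top of the background SNPs.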

If all goes well, you should have a manhattan plot with SNPs of interest highlighted. It might look something like this:

A few tips: if you want to highlight particular genes, you can look up their coordinates in the UCSC Genome Browser and then select the rs numbers that fall within that range. The default highlight color is green, but you can change it on line 118 of the code at the URL above.
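For the gene-based tip, once you have a gene's chromosome and start/end coordinates, you can pull the matching rs numbers out of your PLINK results. This is a minimal sketch with made-up data; the region variables (`gene_chr`, `gene_start`, `gene_end`) are hypothetical, and the coordinates must come from the same genome build as the BP column in your results.

```r
# Toy PLINK-style results (made-up values)
mydata <- data.frame(
  CHR = c(1, 1, 1, 2),
  SNP = c("rs100", "rs200", "rs300", "rs400"),
  BP  = c(50000, 120000, 180000, 130000),
  stringsAsFactors = FALSE
)

# Hypothetical gene region looked up in the UCSC Genome Browser
gene_chr   <- 1
gene_start <- 100000
gene_end   <- 200000

# Keep SNPs on the right chromosome whose position falls inside the region
in_gene <- mydata$CHR == gene_chr &
  mydata$BP >= gene_start & mydata$BP <= gene_end

# Strip "rs" and convert to integers, the format SNPlist expects
gene_snps <- as.integer(sub("^rs", "", mydata$SNP[in_gene]))
gene_snps
```

To highlight several genes, combine the per-gene vectors with c() and pass the result to SNPlist.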
