
LocusZoom: Plot regional association results from GWAS

Update Friday, May 14, 2010: See this newer post on LocusZoom.



If you caught Cristen Willer's seminar here a few weeks ago, you saw several beautiful figures in the style of a Manhattan plot, but zoomed in around a region of interest and overlaid with several other useful kinds of information.

Click to take a look at the plot below, which shows the APOE region for an Alzheimer's disease GWAS:

It's a simple plot of the -log10(p-values) for SNPs in a given region, but it also shows:

1. LD information (based on HapMap) shown by color-coded points (not much LD here).
2. Recombination rates (the blue line running through the plot). Peaks are hotspots.
3. Spatial orientation of the SNPs you plotted (running across the top).
4. Genes! The overlay along the bottom shows UCSC genes in the region.

You can very easily take a PLINK output file (or any other format) and make an image like this from your own data for any SNP, gene, or region of interest using LocusZoom, a tool Cristen and others at Michigan developed. LocusZoom is written in R with a Python wrapper, and it works from an easy-to-use web interface.

All the program needs is a list of SNP names and their associated p-values. If you're using PLINK, your *.assoc or *.qassoc files have this information, but first you'll have to run a quick command to format them. Run this command, which I discussed in a previous post, to convert your PLINK output into a comma-delimited CSV file (PLINK's default output is delimited by irregular whitespace):

cat plink.assoc | sed -r 's/^\s+//g' | sed -r 's/\s+/,/g' > plink.assoc.csv
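If your PLINK output pads each line with trailing spaces, the command above will leave a stray comma at the end of every row. Here is a minimal sketch of a variant that also trims trailing whitespace. It assumes a sed that supports -E (GNU and BSD sed both do), and the function name plink_to_csv is a hypothetical name used only for illustration:

```shell
# Hypothetical helper: convert whitespace-delimited PLINK output (stdin) to CSV (stdout).
plink_to_csv() {
  # 1) drop leading whitespace, 2) drop trailing whitespace,
  # 3) turn every remaining run of whitespace into a single comma.
  sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//; s/[[:space:]]+/,/g'
}

# Usage (same file names as the command above):
#   plink_to_csv < plink.assoc > plink.assoc.csv
```

Either version should produce a header row that starts with the PLINK column names separated by commas.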

Next, you'll want to compress this file so that it doesn't take forever to upload.

gzip plink.assoc.csv
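Before uploading, it can be worth confirming that the compressed file's header row really contains the column names you plan to give LocusZoom. A rough sketch, assuming the file was produced as above; has_column is a hypothetical helper name, not part of PLINK or LocusZoom:

```shell
# Hypothetical helper: succeed only if the gzipped CSV's header row has the named column.
# $1 = gzipped CSV file, $2 = exact column name (e.g. SNP or P)
has_column() {
  # Read just the first (header) line of the decompressed file.
  header=$(gzip -dc "$1" | head -n 1)
  # Wrap in commas so the name must match a whole field, not a substring.
  case ",$header," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   has_column plink.assoc.csv.gz SNP && has_column plink.assoc.csv.gz P && echo "header looks fine"
```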

Now, upload your new file (plink.assoc.csv.gz) to the LocusZoom website. Tell it that your p-value column is named "P" and your marker column is named "SNP" (or whatever they're called if you're not using PLINK). Change the delimiter type to "comma", then enter a region of interest. I chose APOE, but you could also use a SNP name (include the "rs" before the number). Now hit "Plot your Data," and the plot should be ready in about a minute.

There are some other options below, but I've had bad luck using any of them. For instance, I could never get it to output a PNG properly; only PDF worked the last time I tried it. I also could not successfully make a plot with the recombination rate overlay turned off. I know this is a very early version, but hopefully they'll clean up some of the code and document some of its features very soon. I could see this being a very useful tool, especially once it's available for download for local use. (Update: some of these bugs have been fixed. See this newer post on LocusZoom.)
