
Hadley Wickham's ggplot2 / Data Visualization Course Materials

Hadley Wickham, creator of ggplot2, an immensely popular framework for Tufte-friendly data visualization using R, is teaching two short courses at Vanderbilt this week. Once we opened registration to Vanderbilt students and staff we instantly filled all the available seats, so unfortunately I wasn't able to announce the course here. But the good news is that Hadley's made all the data, code, and slides from the course available online here. We weren't able to record the course, but David Smith over at Revolutions posted links to videos of Hadley teaching a similar course a few months ago.

Hadley Wickham - Visualizing Data workshop at Vanderbilt

Webcast this Morning: House Committee on Energy and Commerce hearing on DTC Genetic Testing

A live webcast of the House Committee on Energy and Commerce hearing on "Direct-to-Consumer Genetic Testing and the Consequences to the Public Health" is available at this link. I had trouble viewing the webcast in Firefox and had to save the link and open it with VLC media player to get it working. You can also follow the #HouseDTC hashtag on Twitter.

In case you missed the FDA public meeting on oversight of Laboratory Developed Tests (LDTs), Dan Vorhaus over at Genomics Law Report posted recaps of day 1 and day 2 of the meeting.

Update July 22 1:48pm CDT: this webcast is over. You can read the written testimony from all the witnesses along with statements from committee members here. You can also follow ongoing discussion on Twitter, and of course check Genomics Law Report in the next day or two for further analysis by Dan Vorhaus.

Update July 23: The testimony from Gregory Kutz at the Government Accountability Office (referred to as the "GAO Report," available online here), together with the discussion at yesterday's House hearing on DTC testing, caused quite a stir. 23andMe quickly responded with a thorough point-by-point rebuttal of major points made in the GAO report. Dan Vorhaus at Genomics Law Report posted a very thorough and thoughtful summary of yesterday's events and discussion. Daniel MacArthur has also posted a summary and response on Genomes Unzipped, and the comment thread on this post is definitely worth reading through.

How to Read a Genome-Wide Association Study (@GenomesUnzipped)

Jeff Barrett (@jcbarret on Twitter) over at Genomes Unzipped (@GenomesUnzipped) has posted a nice guide for the uninitiated on how to read a GWAS paper. Barrett outlines five critical areas that readers should pay attention to: sample size, quality control, confounding (including population substructure), the replication requirement, and biological significance. It would be nice to see a follow-up post like this on things to look out for in studies that investigate other forms of human genetic variation such as copy number polymorphism, rare variation, or gene-environment interaction.

This is also a convenient point for me to mention Genomes Unzipped - a collaborative blog covering topics relevant to the personal genomics industry, featuring posts by several of my favorite bloggers including Daniel MacArthur (of Genetic Future), Luke Jostins (of Genetic Inference), Dan Vorhaus (of Genomics Law Report), Jan Aerts (of Saaien Tist), Jeff Barrett, Caroline Wright, Katherine Morley, and Vincent Plagnol. GNZ, as it's called, has only been live for about two weeks, but looks like a good one to follow as the personal genomics industry begins to mature over the next few years.


Genomes Unzipped: How to Read a Genome-Wide Association Study

23andMe GCPM recaps and FDA meeting on Laboratory Developed Tests

You can find two nice recaps of last week's personalized medicine policy forum on Genomics Law Report and 23andMe's blog, The Spittoon. Also of interest today and tomorrow - the FDA is holding a public meeting to discuss issues surrounding the potential oversight of laboratory developed tests (a category which DTC genetic testing may fall into). You can find the agenda and links to the live (free) webcast here, or you could follow the #FDALDT hashtag on Twitter. The Washington Post published this nice piece over the weekend summarizing the issues.

Genomics and the Consumer: The Present and Future of Personalized Medicine

Those of you not following GGD on Twitter may not have seen this: California State Senator Alex Padilla and 23andMe are hosting a policy forum entitled "Genomics and the Consumer: The Present and Future of Personalized Medicine" today in San Francisco. The agenda looks very exciting, featuring talks by Anne Wojcicki, co-founder of 23andMe, Leroy Hood from the Institute for Systems Biology, Dan Vorhaus (@genomicslawyer, editor of Genomics Law Report), Senator Padilla, and several others. Hopefully 23andMe will record the discussion and make it available online soon. In the meantime, you can follow the #gcpm hashtag for live updates from those in attendance.

QQ plot of p-values in R using base graphics

Update Tuesday, September 14, 2010: Fixed the ylim issue. The code now sets the y-axis limit based on the smallest observed p-value.

A while back Will showed you how to create QQ plots of p-values in Stata and in R using the now-deprecated sma package. A bit later on I showed you how to do the same thing in R using ggplot2. As much as we (and our readers) love ggplot2 around here, it can be quite a bit slower than the built-in base graphics. This only recently became a problem for me when I tried creating a quantile-quantile plot of over 12 million p-values. I wrote the code to do this in base graphics, which is substantially faster than the ggplot2 code I posted a while back. The code and an example are below.



Here's what the resulting QQ-plot will look like:

Illumina Sequencing Seminar Series

Next week Brent Anderson with Illumina will be hosting a seminar series showcasing presentations from Vanderbilt scientists using Illumina technology to power their next-generation sequencing studies. Here's the schedule:

Tuesday, July 13, 2010
Vanderbilt University
Light Hall Room 512

  • 1:00 Registration
  • 1:30 Introduction (Brent Anderson, Illumina)
  • 1:45 Whole Transcriptome Analysis of Pancreatic Progenitor Cells (Mark Magnuson, Vanderbilt)
  • 2:15 Targeted Next-Gen Sequencing in Drug Induced Torsades de Pointes (Andrea Ramirez, Vanderbilt)
  • 2:45 Studying Gene Structure, Expression, & Regulation Using the HiSeq 2000 (Haley Fiske, Illumina)

All code on GGD is Free (Open Source BSD)

At the request of a commenter I just wanted to clarify that any code released here for R or anything else is free and open source unless specifically stated otherwise. The open source BSD license for any code on GGD can be found on this copyright page.

Convert PLINK output to CSV Revisited

A while back, Stephen wrote a very nice post about converting PLINK output to a CSV file. If you are like me, you have used this a thousand times -- enough to get tired of typing lots of sed commands.

I just crafted a little BASH script that accomplishes the same thing with a single easy-to-type command. Insert one of the following functions into your .bashrc file (they all share the name cleanplink, so if you paste more than one, the last one defined wins). This file is generally hidden in your UNIX home directory (you can see it if you type 'ls -al'). Open a new terminal or run 'source ~/.bashrc' for the change to take effect.

This version converts the infile to a tab-delimited output.

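function cleanplink
{
# Trim leading/trailing whitespace, collapse whitespace runs into tabs,
# and turn whole-word NA into \N (MySQL's NULL marker). Needs GNU sed.
sed -r 's/^\s+//; s/\s+$//; s/\s+/\t/g; s/\bNA\b/\\N/g' "$1" > "$1.txt"
}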

And this version converts to a CSV file.


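function cleanplink
{
# Trim leading/trailing whitespace, collapse whitespace runs into commas,
# and turn whole-word NA into \N (MySQL's NULL marker). Needs GNU sed.
sed -r 's/^\s+//; s/\s+$//; s/\s+/,/g; s/\bNA\b/\\N/g' "$1" > "$1.csv"
}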


Both versions above also convert "NA" to \N, which MySQL reads as NULL when loading a file. If you don't need that, you can remove that bit:

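function cleanplink
{
# Trim leading/trailing whitespace and collapse whitespace runs into commas.
# NA values are left untouched. Needs GNU sed.
sed -r 's/^\s+//; s/\s+$//; s/\s+/,/g' "$1" > "$1.csv"
}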
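Since the three variants above differ only in the delimiter and the file extension, they could be folded into one function. The following is a minimal sketch, not part of the original script: the name cleanplink2 and its optional second argument (csv or tab) are made up for illustration, and it assumes GNU sed for the \s, \t, and \b escapes.

```shell
# Hypothetical helper (name and arguments are illustrative, not from PLINK).
# Usage: cleanplink2 FILE [csv|tab]    (defaults to csv)
cleanplink2() {
    local infile="$1" format="${2:-csv}" delim ext
    case "$format" in
        csv) delim=','  ext='csv' ;;
        tab) delim='\t' ext='txt' ;;
        *)   echo "usage: cleanplink2 FILE [csv|tab]" >&2; return 1 ;;
    esac
    # Trim leading/trailing whitespace, collapse whitespace runs into the
    # chosen delimiter, and turn whole-word NA into \N (MySQL's NULL marker).
    sed -r -e 's/^\s+//; s/\s+$//' \
           -e "s/\s+/${delim}/g" \
           -e 's/\bNA\b/\\N/g' "$infile" > "${infile}.${ext}"
}
```

With this in your .bashrc, 'cleanplink2 plinkresults.assoc tab' would write plinkresults.assoc.txt, and leaving off the second argument would write plinkresults.assoc.csv.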


You use this function as follows:

bush@queso:~$ cleanplink plinkresults.assoc

and it produces a file with the same name, but with a ".csv" or a ".txt" on the end (here, plinkresults.assoc.csv or plinkresults.assoc.txt).
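Before loading a converted file into MySQL, it can be worth confirming that every row ended up with the same number of fields as the header. Here is a rough sketch of such a check. The function name checkfields is hypothetical, and it only counts delimiters, so it says nothing about whether the values themselves are sensible.

```shell
# Hypothetical helper: compare each row's field count to the header's.
# Usage: checkfields FILE [DELIM]    (DELIM defaults to a comma)
checkfields() {
    awk -F "${2:-,}" '
        NR == 1        { expected = NF }   # the header sets the expected count
        NF != expected { printf "line %d: %d fields (expected %d)\n", NR, NF, expected
                         bad = 1 }
        END            { exit (bad ? 1 : 0) }  # non-zero status if any row differs
    ' "$1"
}
```

For the CSV output you would run 'checkfields plinkresults.assoc.csv', and for the tab-delimited output 'checkfields plinkresults.assoc.txt '\t''. It prints nothing when all rows agree with the header.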