
More on the McClellan / King GWAS essay

First, if you haven't read the comments on my previous post on this paper, go take a look. Thanks to everyone for sharing your thoughts and pointing out some of my own oversights regarding this paper.

There was one issue in particular that deserves more attention than just another comment thread. McClellan and King draw special attention to a study by Kai Wang et al. (2009), "Common genetic variants on 5p14.1 associate with autism spectrum disorders," Nature 459:528. A big thanks to Kai Wang for pointing out this particularly egregious misrepresentation by McClellan and King, and the emphasis I gave it in my own all-too-cursory review. McClellan and King discuss rs4307059, reported by Wang et al. to be associated with autism, as a “particularly dramatic example of the perils of cryptic population stratification”, reasoning that the substructure is a result of large frequency differences across Europe and the SNP's fixation in Africa. In fact, the frequency of this SNP is fairly consistent across large cohorts of European ancestry: European Americans (MAF=39%), WTCCC (MAF=38%), POPRES British (MAF=39%), POPRES Spanish (MAF=37%). The extreme estimates (.21-.77) come from extremely small samples (n=7 in Tuscany, MAF=75%, and n=15 in the Orcadian sample, MAF=25%). These sample sizes are far too small to estimate allele frequencies with any stability. You can see the allele frequency distribution across 51 populations here, which shows that it's quite similar across most of Europe.


Further, using the full Fst data set (which can be downloaded directly at this link), if you sort all Illumina SNPs by how much their allele frequencies vary across populations (more precisely, by Fst), rs4307059 lies right in the middle. In other words, the variation it shows across subpopulations in Europe or in HapMap is fairly normal for a SNP with a similar MAF.
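To put rough numbers on the sample-size point above, here's a minimal Python sketch of how wide the uncertainty on an allele frequency estimate is at n=7 and n=15 compared with a large cohort. It rests on several assumptions that are mine, not the post's: each individual contributes two independent chromosomes, the interval is a standard 95% Wilson score interval for a binomial proportion, and the "large cohort" of 1,000 individuals is a hypothetical size chosen for illustration (the actual cohort sizes aren't given here). The function names are made up for this example.

```python
import math


def wilson_interval(p_hat, n_chromosomes, z=1.96):
    """Approximate 95% Wilson score interval for an allele frequency.

    p_hat         -- observed allele frequency (between 0 and 1)
    n_chromosomes -- number of sampled chromosomes (2 per diploid individual)
    z             -- normal quantile; 1.96 corresponds to roughly 95% coverage
    """
    if n_chromosomes <= 0:
        raise ValueError("n_chromosomes must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    denom = 1 + z ** 2 / n_chromosomes
    center = (p_hat + z ** 2 / (2 * n_chromosomes)) / denom
    half_width = (
        z
        * math.sqrt(
            p_hat * (1 - p_hat) / n_chromosomes
            + z ** 2 / (4 * n_chromosomes ** 2)
        )
        / denom
    )
    return center - half_width, center + half_width


def interval_width(p_hat, n_individuals):
    """Width of the Wilson interval, assuming 2 chromosomes per individual."""
    low, high = wilson_interval(p_hat, 2 * n_individuals)
    return high - low


if __name__ == "__main__":
    # Frequencies and small sample sizes are the ones quoted in the post;
    # the n=1000 cohort size is a hypothetical stand-in for a large cohort.
    samples = [
        ("Tuscan sample", 0.75, 7),
        ("Orcadian sample", 0.25, 15),
        ("hypothetical large cohort", 0.39, 1000),
    ]
    for label, freq, n in samples:
        low, high = wilson_interval(freq, 2 * n)
        print(f"{label}: n={n}, estimate={freq:.2f}, "
              f"approx. 95% interval = ({low:.2f}, {high:.2f})")
```

Under these assumptions the intervals for the two tiny samples are several times wider than the one for the large cohort, which is the sense in which their estimates are unstable.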

There are a few other issues pointed out in the comment thread that deserve attention. McClellan and King asserted, and I emphasized, that most GWAS hits do not replicate. It's definitely true that nonreplication was a huge issue in genetic association studies in the past and in the early days of GWAS. But most GWAS hits that are genome-wide significant (e.g. p<1e-8) DO replicate, and studies done with family designs, whose results can't be explained away by population stratification, add further evidence that many of these associations are genuine. And simply because a SNP lies outside a region with known biological function doesn't mean we should wave it off so easily. There's a nice discussion of this over at Gene Expression.