Update Thursday, April 29, 2010: See further commentary at a newer post here.
Just finished reading Jon McClellan and Mary-Claire King's Genetic Heterogeneity in Human Disease essay in Cell. It's definitely one of the most forthright and compelling essays I've read on the inadequacy of GWAS for identifying genes that cause complex human disease. The essay starts with an evolutionary perspective. Most human variation is relatively ancient, originating in human populations long before the migration out of Africa. Yet new alleles arise constantly, and because of relatively recent human population growth, we can be certain that most alleles are actually recent and rare. For a common allele to remain in the population it must withstand evolutionary pressure. If the variant is pathogenic, it must either (1) lead to disease late enough in life that it does not affect reproductive fitness (e.g. Alzheimer's disease, AMD), or (2) be balanced by positive selection (e.g. the hemoglobin variants that cause sickle cell anemia are balanced by positive selection for malaria resistance).
The authors then dive into heterogeneity, citing many examples of human diseases that display both locus heterogeneity (mutations in many different genes lead to the same disease) and allelic heterogeneity (many different mutations in the same gene cause the same disease). The authors discuss early-onset breast and ovarian cancer, inherited hearing loss, genetics of lipid metabolism, and severe mental illnesses such as autism and schizophrenia.
Next comes a very nice discussion of the common-disease-common-variant (CDCV) hypothesis and GWAS. Thousands of "risk variants" have been identified from GWAS, yet most of these have no apparent biological function. Most genotyping platforms select for common variants, and evolution has ensured that most common variants are neutral. It follows, the authors argue, that most GWAS findings are themselves neutral variants whose signal stems from factors other than a true association with disease risk.
For one, the authors cite a problem we're all well aware of: population stratification. We tend to think that if we eliminate ethnic outliers or control for stratification with PCA or the like, then we've eliminated the problem. Yet the authors point to a GWAS of autism recently published in Nature that provides a striking example of the problem hypervariable alleles can cause. That study reported an association with a SNP whose risk allele had a frequency of 0.65 in cases and 0.61 in controls. All cases and controls were of European descent. Yet the frequency of the risk variant varies from 0.21 to 0.77 across European populations! (N.b. - see the discussion of this point in a newer post). This range across European populations (0.56) is 14 times larger than the frequency difference between cases and controls (0.04)! Even very minimal differences in ancestry between cases and controls could have explained this association rather than true association with autism.
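To make that arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The four frequencies are the ones quoted above. Everything else is my own simplifying assumption and not something from the essay or the autism GWAS: I pretend each sample is a mix of just two subpopulations sitting at the extremes (0.21 and 0.77), and the helper name `pooled_frequency` is made up for illustration.

```python
# Frequencies quoted in the post
case_freq, control_freq = 0.65, 0.61   # risk allele in cases vs. controls
freq_low, freq_high = 0.21, 0.77       # range across European populations


def pooled_frequency(share_high, high=freq_high, low=freq_low):
    """Expected allele frequency in a sample that draws a fraction
    `share_high` of its chromosomes from the high-frequency population
    and the rest from the low-frequency one (toy two-population model)."""
    return share_high * high + (1.0 - share_high) * low


case_control_gap = case_freq - control_freq   # about 0.04
population_gap = freq_high - freq_low         # about 0.56

# How much larger the between-population range is than the case/control gap
ratio = population_gap / case_control_gap

# Under the toy model, the difference in ancestry share between cases and
# controls that would by itself produce the observed case/control gap
ancestry_shift_needed = case_control_gap / population_gap

print(f"between-population range / case-control gap: {ratio:.1f}")
print(f"ancestry share difference needed: {ancestry_shift_needed:.3f}")
```

Under these assumptions, cases drawing only about 7 percentage points more of their ancestry from the high-frequency population than controls would reproduce the whole 0.04 difference with no disease effect at all. Real samples are mixtures of many populations, so treat this as an illustration of scale rather than a model of the actual study.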
The authors do give a few examples where common variants truly affect a common disease (hemoglobin genes and sickle-cell anemia, autoimmune disorders and the MHC region, Alzheimer's disease and APOE, lactose intolerance and alleles in the lactase gene enhancer region). Yet these examples illustrate two points: (1) all of these variants have a demonstrable effect on the protein or its expression, unlike most GWAS findings, and (2) back to the evolutionary perspective, all of these variants have reason to remain common because they evade evolutionary pressure: they either do not affect reproductive fitness or are balanced by positive selection.
The authors conclude by offering potential paths forward that make use of high-throughput sequencing technologies. The problem with sequencing data is not just finding potentially deleterious mutations, but determining which of the many candidates actually play a role in human disease. One of the most promising strategies is to use next-gen sequencing to trace coinheritance of potential disease-causing alleles with disease within affected families - essentially linkage analysis. Finally, the authors assert that replication in genetic studies should focus on the identification and confirmation of multiple biologically relevant mutations in the same gene. This would provide both biological and epidemiological support for the causality of the gene or pathway in the pathogenesis of the disease.
This essay is definitely worth a read.
Cell: Genetic Heterogeneity in Human Disease
Update Tuesday, April 27, 2010: Keep an eye out over at Genetic Future for an upcoming post pointing out some of the problems with this paper I didn't consider here.