While flirting around with previously mentioned ggplot2 I came across an incredibly useful set of functions in the plyr package, made by Hadley Wickham, the same guy behind ggplot2. If you've ever used MySQL before, think of "GROUP BY", but here you can arbitrarily apply any R function to splits of the data, or write one yourself.
Imagine you have two SNPs, and 2 different variables you want info about across all three genotypes at each locus. You can generate some fake data using this command:
mydata=data.frame(X1=rnorm(30), X2=rnorm(30,5,2), SNP1=c(rep("AA",10), rep("Aa",10), rep("aa",10)), SNP2=c(rep("BB",10), rep("Bb",10), rep("bb",10)))
The data should look like this:
X1 X2 SNP1 SNP2
1 -0.73671602 2.76733658 AA BB
2 -1.83947190 5.70194282 AA BB
3 -0.46963370 4.89393264 AA BB
4 0.85105460 6.37610477 AA BB
5 -1.69043368 8.11399122 AA BB
6 -1.47995127 4.32866212 AA BB
7 0.21026063 4.32073489 AA BB
8 0.19038088 4.26631298 AA BB
9 -0.29759084 4.07349852 AA BB
10 -0.46843672 7.01048905 AA BB
11 0.93804369 3.78855823 Aa Bb
12 -1.46223800 5.08756968 Aa Bb
13 -0.01338604 -0.01494702 Aa Bb
14 0.13704332 4.53625220 Aa Bb
15 0.59214151 3.75293124 Aa Bb
16 -0.27658821 5.78701933 Aa Bb
17 0.05527139 5.99460464 Aa Bb
18 1.00077756 5.26838121 Aa Bb
19 0.62871877 6.98314827 Aa Bb
20 1.18925876 8.61457227 Aa Bb
21 -0.40389888 3.36446128 aa bb
22 -1.43420595 6.50801614 aa bb
23 1.79086285 5.39616828 aa bb
24 0.15886243 3.98010396 aa bb
25 0.57746024 4.93605364 aa bb
26 -1.11158987 6.40760662 aa bb
27 1.93484117 3.33698750 aa bb
28 0.10803302 4.56442931 aa bb
29 0.56648591 4.48502267 aa bb
30 0.03616814 5.77160936 aa bb
Now let's say you want the mean and standard deviation of X1 and X2 across each of the multilocus genotypes at SNP1 and SNP2. First, install and load the plyr package:
install.packages("plyr")
library(plyr)
Now run the following command:
ddply(mydata, c("SNP1","SNP2"), function(df) data.frame(meanX1=mean(df$X1), sdX1=sd(df$X1), meanX2=mean(df$X2), sdX2=sd(df$X2)))
And you get exactly what you asked for:
SNP1 SNP2 meanX1 sdX1 meanX2 sdX2
1 aa bb 0.2223019 1.0855063 4.875046 1.144623
2 Aa Bb 0.2789043 0.7817727 4.979809 2.286914
3 AA BB -0.5730538 0.8834038 5.185301 1.601626
That command above may appear complicated at first, but it isn't really. Cerebral mastication gives a very easy to follow, concise explanation of what that command is doing. Also, check out the slides he presented at a recent R meetup discussion:
There's more to plyr than ddply, so check it out for yourself!