Randomly Select Subsets of Individuals from a Binary Pedigree .fam File

I'm imputing GWAS data to the 1000 Genomes Project data using MaCH. The model estimation phase only needs about 200 individuals. Here's a one-line bash command that pulls 200 samples at random from a binary pedigree .fam file called myfamfile.fam. It relies on bash's $RANDOM variable, and it assumes the file is space-delimited and that the family and individual IDs contain no commas:

for i in $(cut -d ' ' -f 1-2 myfamfile.fam | sed 's/ /,/g'); do echo "$RANDOM $i"; done | sort | cut -d ' ' -f 2 | sed 's/,/ /g' | head -n 200

Redirect this output to a file, and then run PLINK using the --keep option with this new file.
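
If GNU coreutils is available, the same sampling can probably be done more directly with shuf. The following is only a sketch under that assumption: sample_fam is a hypothetical helper name, and the toy .fam file is made up purely so the example runs on its own. Using awk instead of cut means tab-delimited .fam files work too.

```shell
#!/usr/bin/env bash
# Sketch: sample N individuals (family ID, individual ID) from a .fam file
# without replacement. Assumes GNU coreutils `shuf` is installed.
# sample_fam is a hypothetical helper name, not part of PLINK or MaCH.
sample_fam() {
    # $1 = path to .fam file, $2 = number of individuals to sample
    # awk splits on any run of whitespace, so spaces or tabs both work
    awk '{print $1, $2}' "$1" | shuf -n "$2"
}

# Toy demonstration: build a made-up 500-sample .fam file
tmpdir=$(mktemp -d)
for i in $(seq 1 500); do
    echo "FAM$i IND$i 0 0 1 -9"
done > "$tmpdir/toy.fam"

# Write 200 randomly chosen "FID IID" pairs to a keep file
sample_fam "$tmpdir/toy.fam" 200 > "$tmpdir/keep.txt"
wc -l < "$tmpdir/keep.txt"
```

With a real dataset, the keep file would then go to PLINK along the lines of `plink --bfile mydata --keep keep.txt --make-bed --out mydata_subset`, where mydata and mydata_subset are placeholder file prefixes.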