First, you'll need a Linux system, and two utilities: tabix and vcftools.
I'm virtualizing an Ubuntu Linux system in VirtualBox on my Windows 7 machine. I had a little trouble compiling vcftools on Ubuntu out of the box, so before trying to compile tabix and vcftools I'd recommend installing the GNU C++ compiler and the development version of the zlib compression library, zlib1g-dev. This is easy in Ubuntu. Just enter these commands at the terminal:
sudo apt-get install g++
sudo apt-get install zlib1g-dev
Next, download tabix. I'm giving you the direct link to the most current version as of this writing, but you might go to the respective SourceForge pages to get the most recent version yourself. Use tar to unpack the download, go into the unpacked directory, and type "make" to compile the executable.
tar -jxvf tabix-0.2.3.tar.bz2
cd tabix-0.2.3/
make
Now do the same thing for vcftools:
tar -zxvf vcftools_v0.1.4a.tar.gz
cd vcftools_0.1.4a/
make
The vcftools binary will be in the cpp directory. Copy both the tabix and vcftools executables to wherever you want to run your analysis.
Let's say that you want to pull all the 1000 Genomes data for the CETP gene on chromosome 16, compute allele frequencies, and write out a linkage-format PED file so you can look at linkage disequilibrium using Haploview.
First, use tabix to hit the 1000 Genomes FTP site, pulling data from the 20100804 release for the CETP region (chr16:56,995,835-57,017,756), and save that output to a file called genotypes.vcf. Because tabix pulls only the sections you need rather than downloading the entire 1000 Genomes dataset, this is fast: it should take around a minute, depending on your connection and CPU speed.
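# VCF_URL is a placeholder: set it to the address of the release's bgzipped genotypes VCF on the 1000 Genomes FTP site
./tabix -fh "$VCF_URL" 16:56995835-57017756 > genotypes.vcf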
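If you want a quick sanity check on what came back, here's a minimal sketch that counts the variant records and samples in a VCF file. The helper name vcf_summary is my own invention (it isn't part of tabix or vcftools), and the demo runs on a tiny made-up file so it works without a network connection. It assumes a standard tab-delimited VCF with nine fixed columns before the sample columns.

```shell
# vcf_summary FILE: print the number of variant records and samples in a VCF.
# (vcf_summary is a hypothetical helper name, not part of tabix or vcftools.)
vcf_summary() {
    # Variant records are every line that does not start with '#'.
    n_variants=$(grep -vc '^#' "$1")
    # On the #CHROM header line, sample IDs follow the 9 fixed columns.
    n_samples=$(awk -F'\t' '/^#CHROM/ { print NF - 9; exit }' "$1")
    echo "variants: $n_variants, samples: $n_samples"
}

# Demo on a tiny made-up VCF (illustrative data only).
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.0\n' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> "$demo_vcf"
printf '16\t56995900\trs_a\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\n' >> "$demo_vcf"
printf '16\t56996000\trs_b\tC\tT\t.\tPASS\t.\tGT\t0|0\t0|1\n' >> "$demo_vcf"
vcf_summary_out=$(vcf_summary "$demo_vcf")
echo "$vcf_summary_out"
rm -f "$demo_vcf"
```

On the real download you would run vcf_summary genotypes.vcf. A variant count of zero usually means the region string or the file address was wrong.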
Not too difficult, right? Now use vcftools (which works a lot like PLINK) to compute allele frequencies. This should take less than one second.
./vcftools --vcf genotypes.vcf --freq --out allelefrequencies
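Once you have the frequencies, you'll probably want to pick out the common variants. Here's a rough sketch that does it with awk. I'm assuming the output lands in allelefrequencies.frq as a tab-delimited file with a header row and the columns CHROM, POS, N_ALLELES, N_CHR, followed by one ALLELE:FREQ column per allele; check your own file, since the layout may differ between vcftools versions. The function name common_variants and the demo data are made up for illustration.

```shell
# common_variants FRQ_FILE MIN_MAF: list biallelic sites whose minor allele
# frequency is at least MIN_MAF. (Hypothetical helper, not a vcftools feature.)
common_variants() {
    awk -F'\t' -v min="$2" '
        NR > 1 && $3 == 2 {            # skip header; biallelic sites only
            split($5, a, ":")          # a[1] = allele, a[2] = frequency
            split($6, b, ":")
            x = a[2] + 0; y = b[2] + 0
            maf = (x < y) ? x : y      # minor allele = the rarer of the two
            if (maf >= min + 0) print $1, $2, maf
        }' "$1"
}

# Demo on made-up frequencies laid out in the assumed .frq format.
demo_frq=$(mktemp)
printf 'CHROM\tPOS\tN_ALLELES\tN_CHR\t{ALLELE:FREQ}\n' > "$demo_frq"
printf '16\t56995900\t2\t120\tA:0.9\tG:0.1\n' >> "$demo_frq"
printf '16\t56996000\t2\t120\tC:0.99\tT:0.01\n' >> "$demo_frq"
printf '16\t56996100\t2\t120\tG:0.6\tA:0.4\n' >> "$demo_frq"
common_out=$(common_variants "$demo_frq" 0.05)
echo "$common_out"
rm -f "$demo_frq"
```

With a 5% cutoff the demo keeps the first and third sites and drops the one at 1%. On real output you would call common_variants allelefrequencies.frq 0.05.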
Finally, use vcftools to create a linkage format PED and MAP file that you can use in PLINK or Haploview. This took about 30 seconds for me.
./vcftools --vcf genotypes.vcf --plink --out plinkformat
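Before loading the files into Haploview, it's worth checking that the PED and MAP files agree with each other. This sketch assumes vcftools wrote plinkformat.ped and plinkformat.map in the usual PLINK layout: one MAP line per SNP, and one PED line per individual with six leading columns followed by two allele columns per SNP. The name check_ped_map and the demo genotypes are my own, purely for illustration.

```shell
# check_ped_map PED_FILE MAP_FILE: report individuals, SNPs, and any PED rows
# whose column count does not match the MAP file. (Hypothetical helper.)
check_ped_map() {
    # The MAP file has one non-empty line per SNP.
    n_snps=$(awk 'NF > 0 { n++ } END { print n + 0 }' "$2")
    # Each PED row should have 6 leading columns plus 2 alleles per SNP.
    awk -v n="$n_snps" '
        NF != 6 + 2 * n { bad++ }
        END { printf "%d individuals, %d SNPs, %d malformed rows\n", NR, n, bad + 0 }
    ' "$1"
}

# Demo on a made-up two-person, two-SNP data set.
demo_map=$(mktemp)
demo_ped=$(mktemp)
printf '16\trs_a\t0\t56995900\n' > "$demo_map"
printf '16\trs_b\t0\t56996000\n' >> "$demo_map"
printf 'S1 S1 0 0 0 0 A G C C\n' > "$demo_ped"
printf 'S2 S2 0 0 0 0 G G C T\n' >> "$demo_ped"
ped_check_out=$(check_ped_map "$demo_ped" "$demo_map")
echo "$ped_check_out"
rm -f "$demo_map" "$demo_ped"
```

On the real files you would run check_ped_map plinkformat.ped plinkformat.map. Any malformed rows suggest the two files came from different runs.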
That's it. It looks like you can also dig around in the supporting directory on the FTP site and pull out genotypes for specific ethnic groups as well (EUR, AFR, and ASN HapMap populations).