
Showing posts with label Search. Show all posts

RegulomeDB: Identify DNA Features and Regulatory Elements in Non-Coding Regions

Many papers have noted the challenges associated with assigning function to non-coding genetic variation, and since the majority of GWAS hits for common traits are non-coding, resources for providing some mechanism for these associations are desperately needed.

Boyle and colleagues have constructed a database called RegulomeDB to provide functional assignments to variants using data from manual curation, ChIP-seq data, chromatin state information, eQTLs across multiple cell lines, and some computational predictions generated from DNase footprinting and transcription factor binding motifs.

RegulomeDB implements a tiered category system (1-6): category 1 has an eQTL association in addition to other ENCODE sources of data, categories 2-5 have some ENCODE data only, with no eQTL associations, and category 6 has evidence of a binding motif change only. As you might imagine, the annotation density increases as you increase category numbers.



Their simple but impressive interface will accept rs numbers, or whole BED, GFF, or VCF files for annotation. The resulting output (example above) is downloadable, providing both specifics of the annotation (such as the transcription factor binding to the area) and the functional score for the variant.
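If your variants are sitting in a spreadsheet as chromosome and position, you need to get them into one of those file formats first. Here's a minimal sketch in Python of writing BED lines for SNPs. It assumes your positions are 1-based (as in dbSNP and most GWAS output), while BED uses 0-based, half-open intervals. The function name and the example coordinates are made up for illustration, and whether RegulomeDB wants extra columns is something to check against their help page.

```python
def snp_to_bed(chrom, pos_1based):
    """Return one BED line (chrom, start, end) for a single-base variant.

    BED starts are 0-based and ends are exclusive, so a SNP at 1-based
    position p becomes the interval [p - 1, p).
    """
    if pos_1based < 1:
        raise ValueError("1-based positions start at 1")
    return f"{chrom}\t{pos_1based - 1}\t{pos_1based}"


def snps_to_bed(snps):
    """Join several (chrom, pos) pairs into BED text, one variant per line."""
    return "\n".join(snp_to_bed(chrom, pos) for chrom, pos in snps)


if __name__ == "__main__":
    # Hypothetical coordinates, not real variants.
    example = [("chr1", 1000), ("chr2", 25000)]
    print(snps_to_bed(example))
```

Save the printed text as a .bed file and upload it, or paste it into the query box.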

http://regulome.stanford.edu/

Get all your Questions Answered

When I have a question I usually ask the internet before bugging my neighbor. Yet it seems like Google's search results have become increasingly irrelevant over the last few years, and this is especially true for searching anything related to R (and the previously mentioned Rseek.org doesn't really do the job I would expect it to do either).

The last few years have seen the development of several community-powered Q&A websites, and I'm not talking about Yahoo Answers. Here are a few that come to mind that I've used and found extremely helpful.

Biostar (biostars.org) - a Q&A site for bioinformatics. The site's focus is bioinformatics, computational genomics and biological data analysis. A few of my favorite threads from this site are one on mapping SNPs to pathways, and another on mapping SNPs to genes using tools like the UCSC public MySQL server.

CrossValidated (http://stats.stackexchange.com/) - a Q&A site for statisticians, data miners, and anyone else doing data analysis. This one's relatively new but already has many very talented and extremely helpful users. Last week I asked a question about R², about the difference between variance explained and variation explained, and how that related to Random Forests. The question was answered just a few hours later.

Finally, there's Quora (http://www.quora.com/). Quora's a little different from the others, and you can ask just about anything you want here. Quora's also still young, but seems to have lots of science/tech geeks like us using it. I recently asked a question, requesting a lay explanation of how Random Forest works, and got a great answer. There was also a good thread about whether current customers found 23andMe to be worth buying.

There's an FAQ on all of these sites that explains how to ask a good question. You might even try answering a few questions yourself and find it rewarding. It's a lot like playing a game, with rather odd goals. You get reputation points and "badges" for answering questions, having your answers voted on, commenting on others' answers, etc. You'll also find that as your own reputation increases by providing good answers to others' questions, your own questions will be answered more quickly. If none of these are quite what you're looking for, check out the stackexchange directory. You'll find Q&A sites that all use the same engine dedicated to topics from photography or cooking to programming and web development.

*Edit 2011-02-22* Thanks to two commenters for pointing this out. There's also a good Q&A community for next generation sequencing, including a forum (http://seqanswers.com/) and a StackExchange site (http://i.seqanswers.com/).

Recent improvements to Pubget

If you've never heard of it before, check out my previous coverage of Pubget. It's like PubMed, but you get the PDFs right away. Pubget has recently implemented a number of improvements.

1. Citation matching. Pubget's citation matcher seems to work better than PubMed's most of the time. Try going to Pubget and pasting any of these random citations into the search bar:

J Biol Chem 277: 30738-30745
Nucleic Acids Res 2004;32:4812-20.
Evol. Biol. 7, 214 (2007).


2. The PaperPlane bookmarklet. Go here and drag the link to your bookmark toolbar. Now, if you're searching from PubMed, click the bookmarklet for one-click access to the PDF.

3. If you have a long list of PMIDs, separate them with commas and you can paste them directly into the search bar.
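If your PMIDs live in a file or a spreadsheet column, a few lines of Python will turn them into that comma-separated string. This is just a sketch: the function name is mine, the PMIDs in the example are placeholders, and it only builds the text to paste into the search bar (it doesn't talk to Pubget at all).

```python
def pmid_query(pmids):
    """Build a comma-separated PMID string, dropping blanks and duplicates.

    Order of first appearance is kept so the result matches your list.
    """
    seen = set()
    cleaned = []
    for pmid in pmids:
        text = str(pmid).strip()
        if not text:
            continue  # skip empty cells or blank lines
        if not text.isdigit():
            raise ValueError(f"not a PMID: {text!r}")
        if text not in seen:
            seen.add(text)
            cleaned.append(text)
    return ",".join(cleaned)


if __name__ == "__main__":
    # Placeholder identifiers, not real citations.
    print(pmid_query([123, " 456", "", 123]))
```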

Pubget (Vanderbilt institutional link)

Pubget (If you're anywhere else)

Pubget = Pubmed on Steroids

I've used this a little bit recently. Pubget indexes essentially everything that PubMed does, except you get the PDF you're looking for right away. There are lots of other useful tools as well. I sent one email to the Pubget team and CC'd the biomedical library, and a few days later they had worked it out so that Pubget recognizes Vanderbilt's subscriptions. If you're at Vanderbilt, go to http://vanderbilt.pubget.com/; otherwise just use http://pubget.com/ and select your institution from the dropdown list, or email them if it's not there.

The one thing I've found is that they don't index things as quickly as PubMed, so you might have a hard time finding Advance Online Publications using Pubget.

Wolfram Alpha as a bioinformatics tool

Just released last week by the makers of Mathematica, Wolfram Alpha is kind of like a search engine. It calls itself a "computational knowledge engine" and describes its lofty goal as a "long-term project to make all systematic knowledge immediately computable by anyone."

From their homepage you can link to a page showing examples of how to use it, but I was interested in seeing how much biology Wolfram Alpha knows, and I've got to say I'm impressed with the results.

(Note: their servers are pretty busy I guess, so if the links don't work the first time, or the search times out, try reloading.)

Check out the results I got when I searched for APOE. It correctly interpreted the fact that I wanted information about the human gene, and accordingly gave me information about the gene and its location, along with a chromosome ideogram, a reference sequence, splice structures, and more.

I was also impressed to see what happened when I entered a random string of ACGTs. It correctly interpreted my query as a nucleotide sequence, told me the amino acid sequence it would encode, estimated how often this sequence would be found in the genome if bases occurred randomly, and gave me gene names, positions, and ideograms of the places where this sequence is actually found in the human genome.
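The first two of those results are easy to reproduce by hand, which makes for a nice sanity check. Here's a rough Python sketch that translates a sequence with the standard genetic code and estimates how many times it would turn up by chance. I'm assuming each base occurs independently with probability 1/4, counting one strand only, and using about 3.1 billion bases as the genome length. I don't know exactly what Wolfram Alpha does internally, so its numbers may differ.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a DNA string in frame 1, ignoring a trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def expected_occurrences(seq, genome_length=3_100_000_000):
    """Expected number of matches on one strand if bases are uniform and independent."""
    n = len(seq)
    positions = max(genome_length - n + 1, 0)  # possible start positions
    return positions * 0.25 ** n


if __name__ == "__main__":
    query = "ATGGCCTAA"  # arbitrary example sequence
    print(translate(query))
    print(expected_occurrences(query))
```

Each extra base makes a chance match four times less likely, so sequences longer than about 16 bases are expected less than once in a genome this size.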

Finally, I tried searching for a SNP that I have an interest in.

For being only days old, and for not being specifically developed as a bioinformatics tool, it's pretty impressive what it can do already. It should be interesting to see what else they come up with.

Gene Prospector

This abstract in BMC Bioinformatics was presented in our Computational Genetics Journal Club a few weeks back: "Gene Prospector: An evidence gateway for evaluating potential susceptibility genes and interacting risk factors for human diseases."

As described by the authors at CDC, Gene Prospector is a bioinformatics tool designed to sort, rank, and display information about genes in relation to human diseases, risk factors and other phenotypes.

While there seem to be several tools like this out there, this one was truly pleasant to use. A quick search for "Alzheimer" turned up some very familiar results. Genes are ranked by strength of evidence, which is calculated from the volume of different types of published literature in human genome epidemiology. The results of a query provide tons of helpful links, including links to information about each candidate gene found, exportable lists of journal articles where the results of association studies, GWASs, and meta-analyses were published, information about SNPs in these genes, and more.

Overall this seems like a great place to start when you want to compare your results to others, or when you just want more information about particular genotype-phenotype associations.

Gene Prospector

Pubmed Searches as an RSS feed

As Stephen nicely posted earlier, RSS feeds are a very powerful way to keep up with the literature -- they "push" the information to you. In addition to subscribing to individual journals, you can subscribe to a PubMed search! This will let you keep up with ALL PubMed-indexed journals.

To subscribe to a PubMed search, first go to www.pubmed.org and enter your search terms. Once you retrieve a search listing, you'll see a bar that says

Display Summary Show 20 Sort By Send to

The SEND TO drop down box will allow you to select an RSS Feed. Once you select this, you'll be taken to a page with a button that says "Create Feed". When you click this, you'll get a new page with a little orange XML button. Click it and your browser will give you the option to subscribe to the feed. Once you subscribe, there are lots of ways to read RSS Feeds, which we'll probably get to in another post.
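If you'd rather handle a feed with a script than a feed reader, RSS is just XML, so Python's standard library can pull out the article titles and links. The sketch below parses a tiny made-up feed embedded as a string. The titles and example.org links are placeholders, not real PubMed output, and a real PubMed feed will have more fields than these.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document standing in for a saved search feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Saved search (placeholder)</title>
    <item>
      <title>Placeholder article one</title>
      <link>http://example.org/article1</link>
    </item>
    <item>
      <title>Placeholder article two</title>
      <link>http://example.org/article2</link>
    </item>
  </channel>
</rss>"""


def feed_items(xml_text):
    """Return (title, link) pairs for every <item> in an RSS document."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title, link in feed_items(SAMPLE_FEED):
        print(f"{title} <{link}>")
```

To use it on a real feed, download the feed URL your browser shows after you click the XML button and pass the text to feed_items.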

Enjoy!