Bioinformatics 1 exercise
BLAST
In this exercise you will learn to use the BLAST program on the web. There
are several different versions of BLAST, and you will try a few of them in this exercise.
BLAST is maintained by NCBI, which also
maintains many other web services. You can, for instance, run a variety of BLAST searches against various
databases. There are also many resources related to specific genomes.
The last part of the exercise lets you run BLAST, FASTA and Smith-Waterman locally.
Exercise 1 - protein vs. protein
Here is the human protein-tyrosine kinase we used in the dotter exercise:
HER3/ErbB3.
First, let us see if there is a homologous protein in the fruit fly. Go
to the BLAST page and choose
protein BLAST search against the Drosophila genome database. Paste the FASTA
sequence into the window and run the search.
What do you think some of the short matches to other proteins are? Is
it the same part of the protein that has these short matches?
Select the best match (follow the link) and save the sequence in FASTA
format. Try to find out what is known about the protein. For instance, search
the Swiss-Prot database with this sequence and see if there is a match. The
Swiss-Prot database is the most thoroughly annotated protein database.
X's sometimes appear in the query sequence in the BLAST output. X denotes
an unknown amino acid. These X's are not in the original sequence you submitted.
Why are they there?
Exercise 2 - protein vs DNA
Now we want to find out where the protein is encoded in the genome. Go to the BLAST page, choose protein
vs DNA (translated BLAST searches, tblastn) and select the Drosophila genome
again.
The begin and end coordinates of the matches are shown. Do they make sense?
Try to figure out the relation between the DNA and the protein sequence shown.
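As a hint for reading the coordinates, the small Python sketch below summarises a subject range from a tblastn hit. The function name describe_hit and the example coordinates are made up for illustration and do not come from a real search. The sketch assumes 1-based, inclusive DNA coordinates, an alignment without gaps, and that a start larger than the end indicates a match on the reverse strand.

```python
def describe_hit(sbjct_start, sbjct_end):
    """Summarise a DNA subject range (1-based, inclusive) from a translated search.

    Returns (strand, span_in_nucleotides, whole_codons, leftover_nucleotides).
    Assumption: start > end means the match lies on the minus strand.
    """
    strand = "plus" if sbjct_start <= sbjct_end else "minus"
    span = abs(sbjct_end - sbjct_start) + 1
    codons, leftover = divmod(span, 3)  # each amino acid is encoded by 3 nucleotides
    return strand, span, codons, leftover


# Hypothetical coordinates, for illustration only.
print(describe_hit(1201, 1500))
print(describe_hit(1500, 1201))
```

Try it with the coordinates from your own search and compare the number of codons with the number of aligned amino acids.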
Exercise 3 - DNA vs DNA
(If a search takes too long, you should be able to use the RID numbers below
to see a previously done search.)
Search with the cDNA sequence of the mRNA for the human gamma-aminobutyric acid receptor
(GABAAR),
alpha1 subtype, against the fly genome using standard nucleotide
BLAST (RID: 1066943373-22450-1489687.BLASTQ3).
Repeat the search with tblastx (RID: 1066943711-25276-506358.BLASTQ3). Find
some information on how tblastx works.
Compare the results for the highest-scoring sequences.
What is the important lesson to learn from this comparison?
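While reading about tblastx, it may help to see what translating a DNA sequence in all six reading frames looks like. The following Python sketch is only an illustration of that idea, not the BLAST implementation. It assumes the standard genetic code, and the frame labels ("+1" to "+3", "-1" to "-3") and function names are chosen here for convenience and need not match the frame numbering in the BLAST output.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; codons with unknown bases become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate both strands in all three reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for label, protein in six_frame_translations("ATGGCCTAA").items():
    print(label, protein)
```

A translated search compares sequences at the protein level after a step like this, which is worth keeping in mind when you compare the two result lists.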
Exercise 4 - Comparing blast, fasta and Smith-Waterman
In a previous exercise
you used the "water" program to run a Smith-Waterman search with the HER2 protein
Spongilla_tyrosine_kinase.prt
against a database of E. coli proteins. Find the search result or repeat
the search.
Now do the same search with blast:
blastall -p blastp -d /net/data/Ecoli.prot -i HER2-fasta.prt
This is the command-line syntax for BLAST (running blastall with no arguments
will show all the options you can use).
Is it faster than water?
Are there any very significant matches?
Are the most significant matches the same as the highest-scoring ones from water?
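To answer the speed question with numbers rather than by feel, you can time each program. The Python sketch below is one possible way to do it; the helper name time_command is made up for this exercise, and it simply measures wall-clock time for a command given as a list of arguments. It assumes the program is installed and on your path.

```python
import subprocess
import time


def time_command(argv):
    """Run a command, discard its output, and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero exit status is not treated as an error here
    )
    return time.perf_counter() - start


# Example with the blastall command from above (needs blastall and the database):
# seconds = time_command(
#     ["blastall", "-p", "blastp", "-d", "/net/data/Ecoli.prot", "-i", "HER2-fasta.prt"]
# )
# print(f"blastp took {seconds:.2f} s")
```

Run each program a couple of times, since the first run may be slower while the database is read from disk.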
Now do the same with the fasta program:
fasta HER2-fasta.prt /net/data/Ecoli.prot.fasta
What do you think the fancy graphics show?
The fasta package also comes with an implementation of Smith-Waterman called
"ssearch". Try it (same syntax as fasta).
Compare the four results. Why are there differences? Does the protein have
an E. coli homolog?
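When thinking about the differences, recall that water and ssearch compute an optimal local alignment, while blast and fasta use heuristics to speed up the search. The Python sketch below shows the Smith-Waterman recurrence in its simplest form. It is a teaching illustration under simplifying assumptions: a fixed match/mismatch score and a linear gap penalty (the values are arbitrary), whereas the real programs use an amino acid substitution matrix and separate gap opening and extension penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Simplified scoring (an assumption for illustration): fixed match and
    mismatch scores and a linear gap penalty.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: that is what makes the alignment local.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("XXACGTXX", "ACGT"))
```

Every cell of the matrix is filled in, so the running time grows with the product of the two sequence lengths. This is a useful fact to keep in mind when you compare the timings of the four programs.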