These two programs are included in the EMBOSS package, which can be run either locally or online through the biological software package at "Institut Pasteur", where the programs are found under "Sequences Alignment and Comparisons". They can also be run at the "European Bioinformatics Institute" (EMBL-EBI).
To allow comparison between the alignment results from the dot plot exercise and the global and local pairwise alignments, the epidermal growth factor receptor (EGFR) and the tyrosine kinase-type cell surface receptor HER2 are reused.
Links to documentation of the programs: needle, water
It is a good idea to start by making a new directory to store everything.
What is the main difference between the two types of alignment in these two cases?
Now save the 3 sequences in files and try to align them with the locally installed programs. The whole EMBOSS package is installed in /net/emboss/bin/.
You can either add that directory to your path
[set path = ( $path /net/emboss/bin/ ); rehash]
or use the whole path of the program:
needle HER2-fasta.prt ALL39899_1.prt
if you have set your path, or
/net/emboss/bin/needle HER2-fasta.prt ALL39899_1.prt
otherwise.
The output is written to a file (ls -t lists the newest file first).
Repeat the Smith-Waterman alignment of HER2-fasta.prt ALL39899_1.prt with different parameters.
What happens if gap penalties are changed to 30 and 2 instead of the defaults 10 and 0.5?
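To see why the larger penalties matter, it helps to work out what a single gap costs under each setting. The sketch below assumes the usual affine scheme, in which the first gapped position costs the opening penalty and every further position costs the extension penalty. Check the needle and water documentation for the exact convention. The function name gap_cost is only illustrative.

```python
def gap_cost(length, gap_open, gap_extend):
    """Cost of one gap of `length` residues under an affine penalty.

    Assumption: the first gapped position costs gap_open and every
    additional position costs gap_extend.
    """
    if length <= 0:
        return 0.0
    return gap_open + (length - 1) * gap_extend


# Compare the default penalties (10 and 0.5) with the stricter ones (30 and 2).
for n in (1, 5, 20):
    default = gap_cost(n, 10, 0.5)
    strict = gap_cost(n, 30, 2)
    print(f"gap of {n:2d}: default {default:5.1f}, strict {strict:5.1f}")
```

Under this assumption a long gap is cheap with the defaults and expensive with 30 and 2, so the stricter setting should favour alignments with fewer and shorter gaps.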
BLOSUM62 is default. What happens to the alignment when using other matrices, e.g. PAM10?
water -datafile EPAM10 HER2-fasta.prt ALL39899_1.prt
What is the difference between PAM10 and BLOSUM62? You can see the matrices in /net/emboss/share/EMBOSS/data/.
/net/emboss/bin/water Spongilla_tyrosine_kinase.prt /net/data/Ecoli.prot.fasta
You can get all the scores from the output file with this perl command:
perl -ne 'print "$1\n" if (/^# Score:\s*([0-9]+)/);' <output file>
You can pipe this into sort, for instance (" | sort -nr"), to find the largest scores. Make a histogram of the scores in R, Excel or whatever you prefer.
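If you prefer Python for the histogram, the following is a minimal sketch that extracts the "# Score:" lines and counts the scores in bins. The function names, the bin width of 10 and the sample lines are made up for illustration. In practice you would read the text from your own output file.

```python
import re
from collections import Counter

# Matches lines such as "# Score: 31.0" and captures the number.
SCORE_RE = re.compile(r"^# Score:\s*([0-9]+(?:\.[0-9]+)?)", re.MULTILINE)


def read_scores(text):
    """Return all alignment scores found in EMBOSS-style output text."""
    return [float(m) for m in SCORE_RE.findall(text)]


def histogram(scores, bin_width=10):
    """Count scores per bin; keys are the lower edge of each bin."""
    counts = Counter(int(s // bin_width) * bin_width for s in scores)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Made-up sample lines, only to show the layout the parser expects.
    sample = "# Score: 31.0\n# Score: 38.5\n# Score: 127.0\n"
    scores = read_scores(sample)
    for lower, count in histogram(scores).items():
        print(f"{lower:5d}-{lower + 9:<5d} {'*' * count}")
```

Each printed row shows a score range followed by one star per alignment in that range, which is enough to see where most of the scores fall and which ones stand out.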
Check out the highest scoring matches. You can read about the matching proteins in the Swiss-Prot database. Try to interpret the matches.
Optional: Repeat all this with the protein Yeast_DIE2.prt.
Made by Svend Erik Westh Hansen, autumn 2002
Modified by AK, autumn 2003
Few changes by AK, autumn 2004 and Nov. 2005