Bioinformatik 1 exercise

Pairwise sequence comparison with dot plots

Dotter is a graphical dot plot program, written in C, for detailed comparison of two sequences. The first sequence is placed along the x-axis and the second along the y-axis, and a dot is plotted wherever the two sequences match. More precisely, Dotter scores a sliding window along each diagonal and shades each dot by that score, so darker dots indicate stronger local similarity.
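To make the idea concrete, here is a minimal Python sketch of a windowed dot plot. It is a simplification, not Dotter's code. Each window of `window` residues is scored by counting identical positions (Dotter uses a scoring matrix instead), and a dot is kept when the count reaches `threshold`. The function names and parameter values are illustrative assumptions.

```python
def dot_plot(seq_x, seq_y, window=3, threshold=3):
    """Return the set of (x, y) window starts where seq_x[x:x+window] and
    seq_y[y:y+window] share at least `threshold` identical positions."""
    points = set()
    for x in range(len(seq_x) - window + 1):
        for y in range(len(seq_y) - window + 1):
            # Identity scoring: a simplification of Dotter's matrix-based scores.
            matches = sum(seq_x[x + k] == seq_y[y + k] for k in range(window))
            if matches >= threshold:
                points.add((x, y))
    return points


def render(seq_x, seq_y, points):
    """Draw the plot as text: seq_x along the top (x-axis), seq_y down the side (y-axis)."""
    lines = ["  " + seq_x]
    for y, residue in enumerate(seq_y):
        row = "".join("*" if (x, y) in points else "." for x in range(len(seq_x)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    seq = "GATTACAGATTACA"  # toy sequence containing a direct repeat
    print(render(seq, seq, dot_plot(seq, seq)))
```

In a self-comparison the main diagonal is always filled. The repeated GATTACA also produces short lines parallel to it. The nested loops take time proportional to len(seq_x) × len(seq_y) × window, so this sketch is only practical for short toy sequences.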

Dotter is very useful for:
identifying sequence repeats in DNA and proteins.
identifying functional domains.
identifying exon-intron structure in eukaryotic genomic DNA.

To run Dotter, type the following at the prompt (command line):
dotter ./[file1.dna] ./[file2.dna]
and press Enter. (Normally the "./" prefix is not necessary, but the present version of the program has a bug that requires it.)

When you have a dot plot, right-click to get a menu.

The sequence files must be in FastA (pronounced "Fast A") format. Both files can contain protein sequences, both can contain DNA sequences, or the first can be DNA and the second protein. Below there are links to some FastA files that you can download (Shift + left mouse button).
FastA files can be acquired from, e.g., the National Center for Biotechnology Information (NCBI) by following the directions "here".
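A FastA record is a header line starting with `>`, followed by one or more lines of sequence. The sketch below parses such text into (header, sequence) pairs. It is an illustrative helper, not part of Dotter, and can be handy for checking a downloaded file before plotting it.

```python
def read_fasta(text):
    """Parse FastA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequence lines may be wrapped; join them and normalise case.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = ">toy_dna hypothetical example\nACGTACGT\nacgt\n>toy_protein\nMKVLA\n"
    for name, seq in read_fasta(example):
        print(name, len(seq))  # prints "toy_dna hypothetical example 12", then "toy_protein 5"
```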

Exercise 1: Pairwise comparison of amino acid sequences: elucidating functional domains in protein kinases.

If you want to know more about protein kinases, protein phosphorylation and cellular regulation, read the excellent review given by Edmond H. Fischer in his Nobel lecture, part I and part II.

Compare HER2/ErbB2 against itself and against HER3/ErbB3. Notice the patterns in the dot plot and try to find functional domains, for example cysteine-rich regions. Try varying the display with the Greyramp tool.
(You can find a similar pattern when comparing ErbB2 against HER1/ErbB1 and HER4/ErbB4.)

Now compare HER2/ErbB2 with ErbB2-dog. Do you see the same pattern? There is an insertion near the N-terminus of one of the sequences. Which sequence has the insertion, and what is it? Use the Dotter sequence window to view the sequences.
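An insertion in one sequence breaks the diagonal line: the similarity continues on a parallel diagonal, shifted by the length of the insertion. The hypothetical helper below shows this with exact k-mer matches, counting hits per diagonal offset (y - x). The toy sequences are made up for illustration and are not the ErbB2 sequences.

```python
from collections import Counter


def diagonal_offsets(seq_x, seq_y, k=4):
    """Count exact k-mer matches between seq_x and seq_y per diagonal offset y - x."""
    # Index the start positions of every k-mer in seq_y.
    index = {}
    for y in range(len(seq_y) - k + 1):
        index.setdefault(seq_y[y:y + k], []).append(y)
    offsets = Counter()
    for x in range(len(seq_x) - k + 1):
        for y in index.get(seq_x[x:x + k], []):
            offsets[y - x] += 1
    return offsets


if __name__ == "__main__":
    seq_x = "ACDEFGHIKLMNPQRSTVWY"              # made-up toy protein
    seq_y = seq_x[:10] + "GGGGG" + seq_x[10:]  # same, with a 5-residue insertion
    print(sorted(diagonal_offsets(seq_x, seq_y).items()))  # prints [(0, 7), (5, 7)]
```

In the toy example the hits split evenly between offset 0 and offset 5, the length of the inserted GGGGG. With real protein sequences, exact matching is much noisier than Dotter's scored windows.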

The HER2-family proteins are integral membrane receptors belonging to the tyrosine kinase family. They are important diagnostic markers and targets for antibody-based therapy of breast cancer.


Exercise 2: Characteristics of exon/intron boundaries and consensus splicing sequences.

Run a dot plot of the cDNA sequence of the human gamma-aminobutyric acid type A receptor (GABAAR) alpha1 subunit against part of the genomic DNA sequence from human chromosome 5.
Exons and introns should be easy to see. You can zoom in by pressing the middle mouse button and dragging (press the left and right buttons at the same time if the middle button doesn't work).
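Exons are easy to spot because each exon of the cDNA matches the genome on its own diagonal, and each intron shifts the next exon's diagonal along the genomic axis. Below is a rough Python sketch of that reasoning. It assumes exact k-mer matching (k = 12 is an arbitrary choice) and a toy gene built from random bases, and it is not how Dotter computes its plot.

```python
import random


def diagonal_segments(cdna, genomic, k=12):
    """Group exact k-mer matches between cdna and genomic into diagonal runs.

    Returns sorted (cdna_start, cdna_end, genomic_start, genomic_end) tuples,
    0-based and end-exclusive. Each exon gives one run; an intron appears as
    a jump along the genomic axis between consecutive runs.
    """
    # Index every k-mer of the genomic sequence by its start position(s).
    index = {}
    for g in range(len(genomic) - k + 1):
        index.setdefault(genomic[g:g + k], []).append(g)
    # Collect cDNA start positions of hits, grouped by diagonal d = g - c.
    by_diagonal = {}
    for c in range(len(cdna) - k + 1):
        for g in index.get(cdna[c:c + k], []):
            by_diagonal.setdefault(g - c, []).append(c)
    # Split each diagonal into runs of consecutive cDNA positions.
    segments = []
    for d, starts in by_diagonal.items():
        run_start = prev = starts[0]
        for c in starts[1:]:
            if c != prev + 1:
                segments.append((run_start, prev + k, run_start + d, prev + k + d))
                run_start = c
            prev = c
        segments.append((run_start, prev + k, run_start + d, prev + k + d))
    return sorted(segments)


if __name__ == "__main__":
    rng = random.Random(1)

    def rand(n):
        """Random bases for a synthetic toy gene (not real data)."""
        return "".join(rng.choice("ACGT") for _ in range(n))

    exon1, intron, exon2 = rand(60), "GT" + rand(96) + "AG", rand(50)
    genomic = rand(40) + exon1 + intron + exon2 + rand(40)
    cdna = exon1 + exon2
    for seg in diagonal_segments(cdna, genomic):
        print(seg)
```

The two long runs correspond to the two exons. Their genomic starts differ from their cDNA starts by the flank length, plus the intron length for the second exon. A base or two at each run end can be ambiguous when the exon and intron ends share bases, just as in a real dot plot.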

Do the exon/intron junctions contain consensus splice sites?
Exon-intron borders are called donor sites, and intron-exon borders are called acceptor sites. In vertebrates, the consensus sequence at donor sites is (the slash marks the exon-intron junction): C(or A)AG/GTA(or G)AGT. Introns almost always begin with GT and end with AG (the GT-AG rule), so acceptor sites have AG immediately before the intron-exon junction.
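The vertebrate donor consensus (C or A)AG/GT(A or G)AGT can be written as a regular expression. This sketch uses hypothetical helper names. It reports every exact match in a genomic sequence as the 0-based position of the exon/intron junction, and it also checks the GT-AG rule for a candidate intron. Real donor sites often deviate from the consensus at one or more positions, so an exact search will miss many of them.

```python
import re

# Donor consensus (C or A)AG/GT(A or G)AGT. The lookahead makes each match
# zero-width, so overlapping candidate sites are all reported.
DONOR = re.compile(r"(?=[CA]AGGT[AG]AGT)")


def find_donor_sites(genomic):
    """0-based positions of the exon/intron junction (the slash) for each exact match."""
    return [m.start() + 3 for m in DONOR.finditer(genomic.upper())]


def is_gt_ag_intron(intron):
    """True if a candidate intron starts with GT and ends with AG."""
    intron = intron.upper()
    return intron.startswith("GT") and intron.endswith("AG")


if __name__ == "__main__":
    print(find_donor_sites("ttcaggtaagtcc"))       # prints [5]
    print(is_gt_ag_intron("GTAAGTCCCTTTTTTTCAG"))  # prints True
```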
Note also the poly(A) tail at the 3' end of the cDNA sequence.
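The poly(A) tail is added after transcription. It is therefore generally not encoded next to the last exon in the genomic DNA, and the last exon's diagonal does not continue into it. Measuring the tail is simple, as the following sketch shows. The function name is hypothetical, and `min_length` is an arbitrary assumed cutoff that keeps a few chance A's at the end from counting as a tail.

```python
def poly_a_tail_length(cdna, min_length=10):
    """Length of the run of A's at the 3' end, or 0 if it is shorter than min_length."""
    seq = cdna.strip().upper()
    tail = len(seq) - len(seq.rstrip("A"))
    return tail if tail >= min_length else 0


if __name__ == "__main__":
    print(poly_a_tail_length("CCTGGCTGC" + "A" * 25))  # prints 25
```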

GABAARs are inhibitory receptor ion channels expressed in interneurons of the brain, and they belong to the family of neurotransmitter-gated (ligand-activated) receptors.


Exercise 3: Inverted repeats and low-complexity regions.

A. Run a dot plot of genomic DNA from human chromosome 5 against itself.
Find inverted repeats. They appear as lines perpendicular to the main diagonal.
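An inverted repeat is a stretch of DNA followed, possibly after a spacer, by its reverse complement. Suppose the k-mer at position i equals the reverse complement of the k-mer at position j. Moving one base inward on both arms raises i by one and lowers j by one, so i + j stays constant along the repeat. That is why the hits form a line perpendicular to the main diagonal. The minimal sketch below finds such hits with exact k-mers. The helper names and k = 6 are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (non-ACGT characters are left as-is)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverted_repeat_hits(seq, k=6):
    """(i, j) pairs, i < j, where the k-mer at i is the reverse complement of the k-mer at j."""
    seq = seq.upper()
    index = {}
    for j in range(len(seq) - k + 1):
        index.setdefault(seq[j:j + k], []).append(j)
    hits = []
    for i in range(len(seq) - k + 1):
        # Look up where the reverse complement of this k-mer occurs.
        for j in index.get(reverse_complement(seq[i:i + k]), []):
            if i < j:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    arm = "ACCGTTAGCATG"  # arbitrary toy stem
    hairpin = arm + "TTTT" + reverse_complement(arm)
    for i, j in inverted_repeat_hits(hairpin):
        print(i, j, i + j)  # i + j stays constant along the stem
```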
Also try to find low-complexity regions (they show up as larger "black boxes" in the dot plot), for example long stretches of CTTTCTTT...
What function might they have?
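One simple way to quantify "low complexity" is the Shannon entropy of the base composition in a sliding window. A window with equal base frequencies scores 2 bits per base, while CTTTCTTT... scores about 0.81. This measure is chosen here only for illustration. It is not what Dotter or dedicated filters such as DUST compute, and the window size and cutoff are arbitrary assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits per base) of the base composition of a window."""
    counts = Counter(window)
    total = len(window)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def low_complexity_windows(seq, window=20, max_entropy=1.0):
    """Start positions of windows whose entropy is at most max_entropy."""
    seq = seq.upper()
    return [
        i for i in range(len(seq) - window + 1)
        if shannon_entropy(seq[i:i + window]) <= max_entropy
    ]


if __name__ == "__main__":
    toy = "ACGT" * 10 + "CTTT" * 10  # made-up: mixed bases, then a CTTT... stretch
    print(low_complexity_windows(toy))
```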
B. Find the mouse homolog of the human GABAAR alpha1 gene in mouse genomic DNA by searching GenBank at NCBI, for example with the keyword GABRA1. The mouse gene is on chromosome 11. Extract a sequence containing the mouse gene in FastA format.
Run a dot plot of human genomic DNA against mouse genomic DNA.
Do you get useful information from this plot?
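GenBank reports positions as 1-based, inclusive coordinates, whereas Python slices are 0-based and end-exclusive. Below is a small sketch for cutting out a region and writing it in FastA format. The helper names, the header text and the 60-column line width are assumptions, not requirements.

```python
def extract_region(seq, start, end):
    """Return the region start..end given in 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates out of range")
    return seq[start - 1:end]


def to_fasta(header, seq, width=60):
    """Format one FastA record, wrapping the sequence at `width` characters."""
    lines = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    genome = "ACGT" * 50  # placeholder sequence, not real mouse DNA
    region = extract_region(genome, 11, 140)
    print(to_fasta("mouse_region_11_140 hypothetical example", region), end="")
```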
For a review of tools that can be used to access genome data, see "A user's guide to the human genome".

Exercise 4 (optional): Can you characterise the region around the DNA repeat found near position (21600, 21600) in the dot plot of genomic DNA from human chromosome 5 against itself? For example, is it Alu-like, a pseudogene or transposon-like? (We do not know the answer!)
Examples of DNA repeats can be found "here".

NOTES

By default, Dotter displays the dot plot but does not save it to a file. To store the dot plot in a new file, run:
> dotter -b[result.file] ./[file1.dna] ./[file2.dna]

To view the saved result, type the following at the command line:
> dotter -l[result.file] ./[file1.dna] ./[file2.dna]