Bioinformatik 1 exercise
Pairwise sequence comparison with dot plots
Dotter is a graphical dotplot program written in C for
detailed comparison of two sequences. It generates a dotplot where the
first sequence to be compared is set along the x-axis, and the second
sequence along the y-axis. A dot is plotted where there are matches between
the two sequences.
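The idea of a dot plot can be sketched in a few lines of Python. This is only an illustration of the principle under simple assumptions (exact identity within a short window); it is not Dotter's own scoring scheme, and the names `dot_matrix` and `render` are made up for this sketch.

```python
def dot_matrix(seq_x, seq_y, window=1, min_matches=1):
    """Return the set of (x, y) positions where a dot would be plotted.

    A dot is placed at (x, y) when at least `min_matches` of the `window`
    residues starting at seq_x[x] and seq_y[y] are identical.
    """
    seq_x, seq_y = seq_x.upper(), seq_y.upper()
    dots = set()
    for x in range(len(seq_x) - window + 1):
        for y in range(len(seq_y) - window + 1):
            matches = sum(seq_x[x + k] == seq_y[y + k] for k in range(window))
            if matches >= min_matches:
                dots.add((x, y))
    return dots


def render(seq_x, seq_y, dots):
    """Draw the plot as text: seq_x along the top (x-axis), seq_y down the side."""
    lines = ["  " + seq_x]
    for y, residue in enumerate(seq_y):
        row = "".join("*" if (x, y) in dots else "." for x in range(len(seq_x)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    a, b = "GATTACA", "GATCACA"
    print(render(a, b, dot_matrix(a, b, window=3, min_matches=3)))
```

Raising `window` and `min_matches` removes isolated chance matches, so that only longer similar stretches remain as diagonal lines.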
Dotter is very useful for:
identifying sequence repeats inside DNA and proteins.
identifying functional domains.
identifying exon-intron structure in eukaryotic genomic DNA.
To run Dotter, type at the prompt or the command line:
> dotter ./[file1.dna] ./[file2.dna]
and press enter. (Normally "./" is not necessary, but there is
a bug in the present version of the program.)
When you have a dot plot, right-click to get a menu.
The sequence files must be in FastA (pronounced Fast A) format.
Both files may contain protein sequences, both may contain DNA sequences,
or the first may contain DNA and the second protein. Below there are links
to some FastA files that you can download (shift + left mouse button).
FastA files can be acquired from, e.g., the National Center for Biotechnology
Information (NCBI) by following the directions "here".
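For orientation, a FastA file consists of a header line starting with ">" followed by one or more lines of sequence. A minimal reader could look like the sketch below; the function names `parse_fasta` and `read_fasta` are chosen for this illustration only, and the sketch does no validation of the sequence characters.

```python
def parse_fasta(lines):
    """Return a list of (header, sequence) tuples from an iterable of lines."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def read_fasta(path):
    """Read a FastA file from disk and return its records."""
    with open(path) as handle:
        return parse_fasta(handle)
```

Calling `read_fasta("file1.dna")` on a single-sequence file would give a list with one `(header, sequence)` pair.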
Exercise 1: Pairwise alignment of amino acid sequences: elucidation
of protein functional domains inside protein kinases. If you
want to know more about protein kinases, protein phosphorylation and
cellular regulation, read the excellent review given by Edmond
H. Fischer in his Nobel lecture, part I and part II.
Compare HER2/ErbB2 against itself and against HER3/ErbB3. Notice the patterns
in the dot plot and try to find functional domains, for example cysteine
rich regions. Try to vary the display with the graymap tool.
(You can find a similar pattern when comparing ErbB2 against
HER1/ErbB1 and HER4/ErbB4.)
Now compare HER2/ErbB2 with ErbB2-dog. Do you see the same
pattern? There is an insertion in the N-terminal region of one of the sequences. Which
sequence has the insertion and what is it? Use the dotter sequence window
to view the sequences.
The integral membrane proteins in the HER2 family are members
of the tyrosine kinase family and are important diagnostic tools and
targets for antibody-based therapy of breast cancer.
Exercise 2: Characteristics of exon/intron boundaries and
consensus splicing sequences.
Run a dot plot of the human gamma-aminobutyric acid receptor
GABAAR cDNA alpha1 subtype sequence against part of the genomic DNA
sequence from human chromosome 5.
Intron-exon regions should be easy to see. You can zoom in by pressing
the middle mouse button and dragging (press the left and right buttons at the
same time if the middle button doesn't work).
Do the exon/intron junctions contain consensus splice sites?
More specifically, exon-intron borders are called donor sites,
and intron-exon borders are called acceptor sites. In vertebrates, the
donor site consensus is (the slash marks the exon-intron junction):
(C or A)AG/GT(A or G)AGT.
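To check candidate junctions in a sequence copied from the Dotter sequence window, the consensus quoted above can be written as a regular expression. The sketch below is an illustration only: it reports full-consensus matches, whereas real donor sites often deviate from the consensus at several positions, so an empty result does not mean there is no splice site. The name `find_donor_sites` is hypothetical.

```python
import re

# Consensus from the text: (C or A)AG / GT(A or G)AGT.
# The lookahead lets overlapping candidates be reported as well.
DONOR = re.compile(r"(?=[CA]AGGT[AG]AGT)")


def find_donor_sites(dna):
    """Return 0-based positions of the first intron base (the G of GT)
    for every exact match to the consensus."""
    dna = dna.upper()
    # The match starts at the 3 exonic bases, so the junction is 3 further on.
    return [m.start() + 3 for m in DONOR.finditer(dna)]
```

For example, `find_donor_sites("TTTCAGGTAAGTCCC")` points at the `GT` that follows `CAG`.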
Notice also the poly(A) tail in the cDNA sequence.
GABAARs are inhibitory receptor-ion channels expressed in interneurons
of the brain and belong to a family of neurotransmitter- or
ligand-activated receptors.
Exercise 3: Inverted repeats and low complexity regions.
A. Run a dot plot of genomic DNA from
human chromosome 5 against
itself.
Find inverted repeats. They are characterized by lines perpendicular
to the diagonal.
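Why inverted repeats run perpendicular to the diagonal can be seen by comparing each word of the sequence with the reverse complement of every other word. The sketch below assumes exact word matches and a plain A/C/G/T alphabet; `inverted_repeat_dots` is a name invented for this example and is not part of Dotter.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def inverted_repeat_dots(dna, word=4):
    """Return sorted (x, y) pairs where the word starting at x equals the
    reverse complement of the word starting at y."""
    dna = dna.upper()
    # Index every word by its reverse complement for fast lookup.
    index = {}
    for y in range(len(dna) - word + 1):
        index.setdefault(reverse_complement(dna[y:y + word]), []).append(y)
    dots = []
    for x in range(len(dna) - word + 1):
        for y in index.get(dna[x:x + word], []):
            dots.append((x, y))
    return sorted(dots)
```

Along a longer inverted repeat, x increases while y decreases from one dot to the next, which is what produces a line at right angles to the main diagonal.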
Try also to find stretches of low complexity (they show up as
larger "black boxes" in the dot plot), for example long stretches of CTTTCTTT......
Which function may they have?
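One way to put a number on "low complexity" is the Shannon entropy of the base composition in a sliding window: a window containing only one or two kinds of base scores low. This is a rough sketch under that assumption, not the way Dotter draws its plot; the window length and the threshold of 1.0 bit are arbitrary choices for illustration.

```python
import math
from collections import Counter


def window_entropy(dna, window=12):
    """Return the Shannon entropy (in bits) of each sliding window."""
    dna = dna.upper()
    scores = []
    for start in range(len(dna) - window + 1):
        counts = Counter(dna[start:start + window])
        entropy = -sum(
            (n / window) * math.log2(n / window) for n in counts.values()
        )
        scores.append(entropy)
    return scores


def low_complexity_windows(dna, window=12, max_entropy=1.0):
    """Return start positions of windows whose entropy is at most max_entropy."""
    return [
        i for i, h in enumerate(window_entropy(dna, window)) if h <= max_entropy
    ]
```

A window with equal amounts of all four bases reaches the maximum of 2 bits, while a repeat such as CTTTCTTTCTTT stays below 1 bit.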
B. Find the human GABAAR alpha1 homolog in the mouse genomic DNA
by searching GenBank at NCBI. For example,
use the keyword GABRA1. The gene is on chromosome 11. Extract a sequence
containing the mouse gene in FastA format.
Run a dot plot of human genomic DNA against mouse genomic DNA.
Do you get useful information from this plot?
For a review of tools that can be used to access data from genomes, see
"A user's guide to the human genome".