How to use Wig and Bed files
The CAGE tracks can be directly used in the UCSC browser at http://genome.ucsc.edu/, either by
A tutorial on how to use the genome browser can be found at the UCSC browser help page.
Clicking on the links below will upload the track on the mm8 assembly in the UCSC browser.
WIG tracks (single nucleotide resolution barplots).
CAGE tags from:
BED tracks (blocks, corresponding to clusters of tags)
Summary tracks:
Preferentially expressed promoters (PEPs): Subsets of the tag clusters above that have >30 TPMs and where >50% of tags come from a particular tissue (normalized for sample size).
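The PEP criterion above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function name is_pep, the per-tissue dictionaries, and the choice to sum per-tissue TPM values into the cluster total are all hypothetical, since the exact normalization used for the tracks is not spelled out here.

```python
def is_pep(tag_counts, library_sizes, tissue, min_tpm=30.0, min_fraction=0.5):
    """Return True if a tag cluster would count as a PEP for `tissue`.

    tag_counts:    dict mapping tissue -> raw tag count inside the cluster
    library_sizes: dict mapping tissue -> total number of tags in that library
    (Both inputs are hypothetical structures used for illustration.)
    """
    # Normalize for sample size: convert raw counts to tags per million (TPM).
    tpm = {
        t: tag_counts.get(t, 0) * 1e6 / library_sizes[t]
        for t in library_sizes
    }
    # ASSUMPTION: the cluster total is the sum of the per-tissue TPM values.
    total = sum(tpm.values())
    if total <= min_tpm:
        return False
    # More than half of the normalized tags must come from the given tissue.
    return tpm[tissue] / total > min_fraction


counts = {"hippocampus": 80, "liver": 10}
sizes = {"hippocampus": 1_000_000, "liver": 1_000_000}
print(is_pep(counts, sizes, "hippocampus"))
print(is_pep(counts, sizes, "liver"))
```

In this made-up example the cluster passes the test for hippocampus but not for liver, because liver contributes only a small share of the normalized tags.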
This tar-zipped data directory (it can be opened by any extraction tool, such as WinZip) contains the following subdirectories:
Here we describe in detail the content of the directories and files, from least post-processed to most post-processed.
We start with the preferentially expressed promoters (PEPs), also referred to as tag clusters below, for the different tissues (also available as BED files above). The data directory contains the raw tag-cluster positions in a tab-delimited format. Column one contains a unique ID; column two gives the strand (F for forward, R for reverse); column three gives the chromosome name; columns four and five contain the start and end of the tag cluster, respectively.
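The five-column layout described above can be read with a few lines of Python. This is a minimal sketch, assuming the files have no header line and contain exactly the tab-separated columns listed; the names TagCluster and parse_tag_clusters, and the sample ID in the usage lines, are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class TagCluster(NamedTuple):
    cluster_id: str  # column 1: unique ID
    strand: str      # column 2: "F" (forward) or "R" (reverse)
    chrom: str       # column 3: chromosome name
    start: int       # column 4: start of the tag cluster
    end: int         # column 5: end of the tag cluster


def parse_tag_clusters(lines: Iterable[str]) -> Iterator[TagCluster]:
    """Yield one TagCluster per non-empty, tab-delimited line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        cluster_id, strand, chrom, start, end = line.split("\t")[:5]
        if strand not in ("F", "R"):
            raise ValueError(f"unexpected strand code: {strand!r}")
        yield TagCluster(cluster_id, strand, chrom, int(start), int(end))


# Usage with a made-up record; with a real file, pass the open file handle.
example = ["cluster_0001\tF\tchr1\t5000\t5050\n"]
for cluster in parse_tag_clusters(example):
    print(cluster.cluster_id, cluster.chrom, cluster.start, cluster.end)
```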
The seq/tagSequence directory contains fasta sequences corresponding to the tag clusters described in the raw files. All fasta headers correspond to the raw unique ID. The seq/fullPromoter directory contains expanded sequences relative to the positions given in the raw data files. Each region is expanded from -1000 to +200 relative to the start and end of the tag-cluster sequence. Fasta headers are the ID, the strand, and the new start and end positions after the expansion. Note that the fasta files are plain text files, even though they do not have the suffix .txt; any text editor can be made to open them.
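The expansion described above can be sketched in Python as follows. Several points are assumptions rather than documented behavior: that the expansion is strand-aware (1000 bases upstream and 200 bases downstream in the direction of transcription), that coordinates are clamped at 0, and the function name expand_region itself. Compare against the headers in seq/fullPromoter before relying on it.

```python
def expand_region(strand, start, end, upstream=1000, downstream=200):
    """Return (new_start, new_end) for an expanded promoter region.

    ASSUMPTION: "upstream" follows the strand, so on the reverse strand
    the 1000-base extension is added to the end coordinate instead.
    """
    if strand == "F":
        new_start, new_end = start - upstream, end + downstream
    elif strand == "R":
        new_start, new_end = start - downstream, end + upstream
    else:
        raise ValueError(f"unexpected strand code: {strand!r}")
    # ASSUMPTION: positions never go below 0 near a chromosome start.
    return max(new_start, 0), new_end


print(expand_region("F", 5000, 5050))
print(expand_region("R", 5000, 5050))
```

With these assumptions, a forward-strand cluster at 5000 to 5050 expands to 4000 to 5250, and the same cluster on the reverse strand expands to 4800 to 6050.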
Finally, the result directory contains the p-values for all JASPAR matrices across all full promoter fasta files. Each p-value is calculated by a one-tailed binomial test for the greater (upper) tail. Within each file, rows are sorted by p-value in ascending order; the first column contains the p-value and the second column contains the JASPAR matrix ID.
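For readers who want to see what a one-tailed binomial test for the greater tail computes, here is a minimal Python sketch. The inputs are assumptions, since the exact counts used for the result files are not described here: k is taken to be the number of promoters with a matrix hit, n the number of promoters tested, and p the background hit probability. The function name is hypothetical.

```python
from math import comb


def binomial_greater_pvalue(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the upper-tail p-value."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # Sum the binomial probabilities of observing k, k+1, ..., n successes.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Made-up numbers: 12 of 50 promoters have a hit, background rate 0.1.
print(binomial_greater_pvalue(12, 50, 0.1))
```

A small p-value means that observing at least k hits would be unlikely under the background rate, which is why the most over-represented matrices appear at the top of the ascending-sorted files.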
The following tar-zipped file contains plots of the 50 genes where hippocampus has a PEP downstream of known protein domain(s). The CTSS track (black) shows the tags on each strand, the PEP track (pink) shows the location of the hippocampus PEPs, and the protein domains are shown in dark green. There is also a track showing all mRNAs from the UCSC browser.