SUPPLEMENTARY DATA for
"Genome-wide detection and analysis of hippocampus core promoters using Deep CAGE"

CAGE tag sequences

Tag sequences have been deposited in the DNA Data Bank of Japan (DDBJ). Accession numbers for individual tags are AGAAA0000001-AGAAA0552486.

CAGE tag tracks for use with the UCSC browser

All of these data are presented in the paper "Genome-wide detection and analysis of hippocampus core promoters using Deep CAGE" by E. Valen et al., a joint collaboration between the Omics Center (Riken Yokohama Institute, Japan), the Bioinformatics Centre (Copenhagen University, Denmark), SISSA (Italy) and Griffith University (Australia). All track coordinates refer to the mm8 assembly. All tracks below can also be downloaded in this directory.

How to use Wig and Bed files

The CAGE tracks can be directly used in the UCSC browser at http://genome.ucsc.edu/, either by

first downloading the .bed and .wig files of interest and then, at the main UCSC site, clicking Genomes->add custom track and uploading the files on that page,
or by using the pre-established links shown below. Clicking one of the links adds that track as a custom track to the UCSC browser as above, but without having to download the file, and also opens the browser. To view multiple tracks, close the genome browser window and repeat for the other tracks of interest: all the tracks will then be shown in the genome browser.
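
For readers who want to build such a link themselves, the following is a minimal sketch in Python. It assumes the track file is reachable at a public URL (the example.org address is a hypothetical placeholder, not the location of the real files) and uses the UCSC hgTracks parameters db and hgt.customText. The exact links used on this page may be constructed differently.

```python
from urllib.parse import urlencode

# Entry point of the UCSC genome browser track display.
UCSC_HGTRACKS = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def custom_track_link(track_url, assembly="mm8"):
    """Return a browser link that loads track_url as a custom track.

    track_url is assumed to point at a publicly reachable .wig or .bed file.
    """
    query = urlencode({"db": assembly, "hgt.customText": track_url})
    return f"{UCSC_HGTRACKS}?{query}"


# Hypothetical file location, for illustration only.
link = custom_track_link("http://example.org/cage/hippocampus_plus.wig")
print(link)
```

Opening the printed link in a web browser should have the same effect as uploading the file by hand.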

A tutorial on how to use the genome browser can be found at the UCSC browser help page

Clicking on the links below will upload the track on the mm8 assembly in the UCSC browser.

WIG tracks (single nucleotide resolution barplots).

CAGE tags from:

Cerebellum: plus strand
Cerebellum: minus strand
Embryo: plus strand
Embryo: minus strand
Hippocampus: plus strand
Hippocampus: minus strand
Liver: plus strand
Liver: minus strand
Lung: plus strand
Lung: minus strand
Macrophages: plus strand
Macrophages: minus strand
Som. cortex: plus strand
Som. cortex: minus strand
Vis. cortex: plus strand
Vis. cortex: minus strand

BED tracks (blocks, corresponding to clusters of tags)

Summary tracks:

All CAGE tag clusters (regardless of number of tags)
All CAGE tag clusters (>30 Tags per million (TPM))

Preferentially expressed promoters (PEPs): Subsets of the tag clusters above that have >30 TPM and where >50% of tags come from a particular tissue (normalized for sample size)
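
The PEP criteria can be illustrated with a short Python sketch. This is not the procedure used in the paper, only one plausible reading of it: the cluster's TPM is assumed to be computed from tags pooled over all libraries, and "normalized for sample size" is assumed to mean that each tissue's share is computed from per-tissue TPM values. The function names, tissue names and tag counts are hypothetical.

```python
def tissue_tpm(counts, library_sizes):
    """Convert raw tag counts per tissue to tags per million (TPM)."""
    return {
        tissue: counts.get(tissue, 0) / size * 1_000_000
        for tissue, size in library_sizes.items()
    }


def preferential_tissue(counts, library_sizes, min_tpm=30.0, min_share=0.5):
    """Return the tissue a cluster is preferentially expressed in, or None.

    Assumption: the >30 TPM threshold applies to tags pooled over all
    libraries, and the >50% share is taken over per-tissue TPM values.
    """
    total_tags = sum(counts.get(tissue, 0) for tissue in library_sizes)
    pooled_tpm = total_tags / sum(library_sizes.values()) * 1_000_000
    if pooled_tpm <= min_tpm:
        return None
    tpm = tissue_tpm(counts, library_sizes)
    best = max(tpm, key=tpm.get)
    if tpm[best] / sum(tpm.values()) > min_share:
        return best
    return None


# Made-up library sizes and tag counts for a single tag cluster.
library_sizes = {"hippocampus": 500_000, "liver": 250_000}
result = preferential_tissue({"hippocampus": 40, "liver": 5}, library_sizes)
print(result)
```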

FASTA file for all PEPs and corresponding motif over-representation files

This tar-zipped data directory (which can be opened by any extractor, such as WinZip) contains the following subdirectories:

data
result
seq/fullPromoter
seq/tagSequence

Here we describe in detail the contents of the directories and files, from least post-processed to most post-processed.

We start with the preferentially expressed promoters (PEPs; also referred to as tag clusters below) for the different tissues (also available as BED files above). The data directory contains the raw tag-cluster positions of these in a tab-delimited format. Column one contains a unique ID, column two is F for the forward strand or R for the reverse strand, column three is the chromosome name, and columns four and five contain the start and end of the tag cluster, respectively.
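
As an illustration of this column layout, the following Python sketch reads such a file into simple records. It assumes the files have no header line and that start and end are integers. The record type, function name and example rows are hypothetical and not taken from the actual data.

```python
import csv
import io
from typing import NamedTuple


class TagCluster(NamedTuple):
    cluster_id: str
    strand: str  # "F" (forward) or "R" (reverse)
    chrom: str
    start: int
    end: int


def read_tag_clusters(handle):
    """Parse tab-delimited rows: ID, strand, chromosome, start, end."""
    clusters = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or incomplete lines
        cluster_id, strand, chrom, start, end = row[:5]
        if strand not in ("F", "R"):
            raise ValueError(f"unexpected strand {strand!r} for {cluster_id}")
        clusters.append(TagCluster(cluster_id, strand, chrom, int(start), int(end)))
    return clusters


# Made-up rows in the layout described above.
example = "pep_1\tF\tchr1\t1000\t1040\npep_2\tR\tchr2\t500\t530\n"
clusters = read_tag_clusters(io.StringIO(example))
print(clusters)
```

For a real file, pass an open file handle in place of the io.StringIO object.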

The seq/tagSequence directory contains FASTA sequences corresponding to the tag clusters described in the raw files. All FASTA headers correspond to the raw unique ID. The seq/fullPromoter directory contains expanded sequences relative to the positions given in the raw data files. Each region is expanded -1000 to +200 relative to the start and end of the tag-cluster sequence. FASTA headers are ID, strand, and the new start and end positions after the expansion. Note that the FASTA files are plain text files, even though they do not have the suffix .txt; any text editor can open them.
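
The expansion can be sketched in Python as follows. The strand handling is an assumption: upstream is taken to mean lower coordinates on the forward strand and higher coordinates on the reverse strand, which this description does not state explicitly. Clamping at zero is likewise an assumption, so the coordinates in the FASTA headers remain the authoritative values.

```python
def expand_cluster(start, end, strand, upstream=1000, downstream=200):
    """Expand a tag cluster by `upstream` bp before it and `downstream` bp after it.

    Assumption: "before" and "after" follow the direction of transcription,
    so the offsets are mirrored for reverse-strand ("R") clusters.
    """
    if strand == "F":
        new_start, new_end = start - upstream, end + downstream
    elif strand == "R":
        new_start, new_end = start - downstream, end + upstream
    else:
        raise ValueError(f"strand must be 'F' or 'R', got {strand!r}")
    # Assumption: coordinates cannot be negative, so clamp at zero.
    return max(new_start, 0), new_end


print(expand_cluster(5000, 5040, "F"))
print(expand_cluster(5000, 5040, "R"))
```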

Finally, the result directory contains all p-values for all JASPAR matrices across all full promoter FASTA files. Each p-value is calculated by a one-tailed binomial test for the greater tail. P-values, given in the first column of each file, are sorted in ascending order. The second column contains the JASPAR matrix ID.
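
For reference, a one-tailed binomial test for the greater tail gives the probability of observing at least k successes in n trials when each trial succeeds with probability p. The Python sketch below computes that tail directly. How the counts and the background probability were defined for the JASPAR matrices is not stated here, so the meanings given to k, n and p in the comments are assumptions.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p).

    Assumed meaning, for illustration only: k = promoters with a motif hit,
    n = promoters examined, p = background probability of a hit.
    Direct summation like this is suited to modest n only.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Made-up numbers: 12 hits among 50 promoters with a 10% background rate.
p_value = binomial_upper_tail(12, 50, 0.10)
print(p_value)
```

A small p-value indicates that the motif occurs more often than the assumed background rate would predict.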

Images of genes where hippocampus alternative promoter usage is predicted to give differential protein domain content.

The following tar-zipped file contains plots of the 50 genes where hippocampus has a PEP downstream of known protein domain(s). The CTSS track (black) shows the tags on each strand, the PEP track (pink) shows the location of the hippocampus PEPs, and the protein domains are shown in dark green. There is also a track showing all mRNAs from the UCSC browser.
