IsoformSwitchAnalyzeR

Enabling Identification and Analysis of Isoform Switches with Functional Consequences from RNA-sequencing data

Kristoffer Vitting-Seerup

2017-06-09

Abstract

Recent breakthroughs in bioinformatics now allow us to accurately reconstruct and quantify full-length gene isoforms from RNA-sequencing data. This development opened the possibility of analyzing alternative isoform usage, but unfortunately RNA-sequencing data is still underutilized since such analyses are both hard to perform and only rarely done.

To solve this problem we developed IsoformSwitchAnalyzeR. IsoformSwitchAnalyzeR is an easy to use R package that facilitates statistical identification of isoform switching from RNA-seq derived quantification of both novel and annotated full-length isoforms. IsoformSwitchAnalyzeR furthermore facilitates integration of many sources of annotation, including features such as Open Reading Frame (ORF), protein domains (via Pfam), signal peptides (via SignalP), coding potential (via CPAT) as well as sensitivity to Nonsense Mediated Decay (NMD). The combination of identified isoform switches and their annotation also enables IsoformSwitchAnalyzeR to predict potential functional consequences of the identified isoform switches - such as loss of protein domains or coding potential - thereby identifying isoform switches of particular interest. Lastly, IsoformSwitchAnalyzeR provides article ready visualizations of isoform switches as well as multiple layers of summary statistics describing the global amount and consequences of isoform switching.

In summary IsoformSwitchAnalyzeR enables analysis of RNA-seq data with isoform resolution with a focus on isoform switching (with predicted consequences) thereby expanding the usability of RNA-seq data.

Table of Content

Abstract

Preliminaries

Background and Package Description
Installation
What To Cite
How To Get Help

Quick Start

Workflow Overview
Short Example Workflow (aka the “Too long - didn’t read” section)

Detailed Workflow

Overview
IsoformSwitchAnalyzeR Background Information
Importing Data Into R
Prefiltering
Identifying Isoform Switches
Analyzing Open Reading Frames
Extracting Nucleotide and Amino Acid Sequences
Advise for Running External Sequence Analysis Tools
Importing External Sequences Analysis
Predicting Intron Retentions
Predicting Switch Consequences
Post Analysis of Isoform Switches with Consequences

Other workflows

Augmenting ORF Predictions with Pfam Results
Analyze Small Upstream ORFs
Remove Sequences Stored in SwitchAnalyzeRlist
Adding Uncertain Category to Coding Potential Predictions
Quality control of ORF of known annotation
Analyzing the Biological Mechanisms Behind Isoform Switching

Frequently Asked Questions and Problems

Sessioninfo

Preliminaries

Background and Package Description

The combination of alternative Transcription Start Sites (aTSS), Alternative Splicing (AS) and alternative Transcription Termination Sites (aTTS) is often referred to as alternative transcription and is considered one of the major factors in modifying the pre-RNA and contributing to the complexity of higher organisms. Alternative transcription is widely used, as recently demonstrated by The ENCODE Consortium, which found that an average of 6.3 different transcripts were generated per gene, although the number of transcripts from a single gene has been reported to be anywhere from one to thousands.

The importance of analyzing isoforms instead of genes has been highlighted by many examples showing functionally important changes that cannot be detected at gene level. One of these examples is pyruvate kinase. In normal adult homeostasis, cells use the adult isoform (M1), which supports oxidative phosphorylation. But almost all cancers use the embryonic isoform (M2), which promotes aerobic glycolysis, one of the hallmarks of cancer. Such a shift in isoform usage cannot be detected at the gene level and has therefore been termed isoform switching.
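A toy calculation makes the point concrete. The expression values below are invented for illustration only (they are not measurements of pyruvate kinase). The sketch shows that gene expression, taken here as the sum of its isoforms, can stay unchanged while the share contributed by each isoform flips. The names IF (isoform fraction) and dIF (difference in isoform fraction) follow the naming used for the cutoffs later in this vignette.

```r
# Hypothetical isoform expression values (arbitrary units), for illustration only
isoExp <- data.frame(
    isoform = c("M1", "M2"),
    normal  = c(90, 10),
    cancer  = c(10, 90)
)

# Gene expression taken as the sum of its isoforms: identical in both conditions
geneExp <- colSums(isoExp[, c("normal", "cancer")])

# Isoform fraction (IF): the share of the gene expression coming from each isoform
IF <- sweep(as.matrix(isoExp[, c("normal", "cancer")]), 2, geneExp, "/")
rownames(IF) <- isoExp$isoform

# Difference in isoform fraction (dIF) between the two conditions
dIF <- IF[, "cancer"] - IF[, "normal"]

geneExp
dIF
```

In this made-up case a gene level analysis sees no change at all, whereas the isoform level view shows that usage of each isoform shifts by 80 percentage points.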

In 2010 a breakthrough in bioinformatics came with the emergence of tools such as Cufflinks, which allow researchers to reconstruct and quantify full-length transcripts from RNA-seq data. Such data has the potential to facilitate both genome wide analysis of alternative isoform usage and identification of isoform switching - but unfortunately these types of analyses are still only rarely done and/or reported.

We hypothesize that there are multiple reasons why RNA-seq data is not used to its full potential:

There is still a lack of tools that can identify isoform switches with isoform resolution - thereby identifying the exact isoforms involved in a switch.
Although there are many very good tools to perform sequence analysis, there is no common framework that allows for integration of the analyses provided by these tools.
There is a lack of tools facilitating easy and article ready visualization of isoform switches.

To solve this problem we developed IsoformSwitchAnalyzeR.

IsoformSwitchAnalyzeR is an easy to use R package that enables the user to import the (novel) full-length isoforms derived from an RNA-seq experiment into R. If annotated transcripts are analyzed, IsoformSwitchAnalyzeR offers integration with the multi-layer information stored in a GTF file, including the annotated coding sequences (CDS). If transcript structures were predicted (de novo or guided), IsoformSwitchAnalyzeR offers a highly accurate tool for identifying the dominant ORF of the isoforms. The knowledge of the isoform positions of the CDS/ORF furthermore allows for prediction of sensitivity to Nonsense Mediated Decay (NMD) - the mRNA quality control machinery that degrades isoforms with premature termination codons (PTC).
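As a rough sketch of what a PTC check involves, the snippet below applies the widely used 50 nucleotide rule: a stop codon lying more than 50 nt upstream of the last exon-exon junction is flagged as premature. The helper name isPTC, its arguments and the 50 nt default are assumptions made for this illustration and are not taken from the package's code.

```r
# Hypothetical helper: flag a stop codon as premature using the 50 nt rule.
# exonLengths: exon lengths in transcript order (5' to 3')
# stopPos:     transcript coordinate of the last nucleotide of the stop codon
isPTC <- function(exonLengths, stopPos, minDistance = 50) {
    if (length(exonLengths) < 2) {
        return(FALSE)  # no exon-exon junction, so the rule cannot apply
    }
    # Transcript coordinate of the last exon-exon junction
    lastJunction <- sum(exonLengths[-length(exonLengths)])
    (lastJunction - stopPos) > minDistance
}

isPTC(exonLengths = c(200, 150, 300), stopPos = 250)  # stop 100 nt upstream of the last junction
isPTC(exonLengths = c(200, 150, 300), stopPos = 500)  # stop located in the last exon
```

Working in transcript coordinates keeps the sketch independent of strand and genomic positions; a real analysis would derive both inputs from the exon structure and the CDS/ORF annotation.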

Next, IsoformSwitchAnalyzeR enables identification of isoform switches via a newly developed statistical method that tests each individual isoform for differential usage and thereby identifies the exact isoforms involved in an isoform switch.
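To make "testing each individual isoform for differential usage" concrete, the sketch below computes the isoform fraction per replicate and then compares the two conditions for each isoform separately. The replicate values are invented, and the plain Welch t-test from base R is only a stand-in chosen for illustration; it is not the statistical method implemented in IsoformSwitchAnalyzeR.

```r
# Hypothetical expression of two isoforms from one gene, three replicates per condition
isoExp <- rbind(
    isoA = c(c1_r1 = 80, c1_r2 = 75, c1_r3 = 85, c2_r1 = 20, c2_r2 = 30, c2_r3 = 25),
    isoB = c(c1_r1 = 20, c1_r2 = 25, c1_r3 = 15, c2_r1 = 80, c2_r2 = 70, c2_r3 = 75)
)
condition <- rep(c("cond1", "cond2"), each = 3)

# Isoform fraction per replicate: isoform expression / summed expression of the gene
IF <- sweep(isoExp, 2, colSums(isoExp), "/")

# Test each isoform on its own (stand-in test, see text above)
result <- data.frame(
    isoform = rownames(IF),
    dIF     = rowMeans(IF[, condition == "cond2"]) - rowMeans(IF[, condition == "cond1"]),
    pValue  = apply(IF, 1, function(x) {
        t.test(x[condition == "cond2"], x[condition == "cond1"])$p.value
    })
)
result
```

Because every isoform gets its own effect size and p-value, the output points at the specific isoforms that change usage rather than only at the gene as a whole.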

Since we know the exon structure of the full-length isoform, IsoformSwitchAnalyzeR can extract the underlying nucleotide sequence from a reference genome. This enables integration with the Coding Potential Assessment Tool (CPAT) - which predicts the coding potential of an isoform and can also be used to increase accuracy of ORF predictions. By combining the CDS/ORF isoform positions with the nucleotide sequence we can also extract the (most likely) amino acid (AA) sequence of the CDS/ORF. The AA sequence enables integration of analysis of protein domains (via Pfam) and signal peptides (via SignalP) - both of which are supported by IsoformSwitchAnalyzeR. Lastly since the structures of all (expressed) isoforms from a given gene are known one can also annotate intron retentions (via spliceR).

Combined, this means that IsoformSwitchAnalyzeR enables annotation of isoforms with intron retentions, ORF, NMD sensitivity, coding potential, protein domains as well as signal peptides, all of which enable identification of important functional consequences of the isoform switches, a functionality also implemented in IsoformSwitchAnalyzeR.

IsoformSwitchAnalyzeR contains tools that allow the user to create article ready visualizations of both individual isoform switches and the general consequences of the switches. These tools are easy to use and integrate all the information gathered throughout the workflow. An example of an isoform switch can be found in the Short Example Workflow section.

Lastly, IsoformSwitchAnalyzeR is based on standard Bioconductor classes such as GRanges and BSgenome, which means it supports all species and versions supported by the Bioconductor annotation packages.

Back to Table of Content.

Installation

IsoformSwitchAnalyzeR is part of the Bioconductor repository and community, which means it is distributed with, and dependent on, Bioconductor. Installation of IsoformSwitchAnalyzeR is very easy and can be done from within the R terminal. If this is the first time you use Bioconductor, simply copy/paste the following into your R session to install the basic Bioconductor packages:

source("http://bioconductor.org/biocLite.R")
biocLite()

If you have already installed Bioconductor, running these two commands will check whether updates for installed packages are available.

After you have installed the basic Bioconductor packages you can install IsoformSwitchAnalyzeR by copy/pasting the following two lines into your R session:

source("http://bioconductor.org/biocLite.R")
biocLite("IsoformSwitchAnalyzeR")

This will install the IsoformSwitchAnalyzeR package as well as other R packages that are needed for IsoformSwitchAnalyzeR to work.
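Note that from Bioconductor release 3.8 onward the biocLite() installer has been replaced by the BiocManager package. Assuming such a release, the equivalent installation would look roughly like this:

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# Install IsoformSwitchAnalyzeR and the packages it depends on from Bioconductor
BiocManager::install("IsoformSwitchAnalyzeR")
```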

What To Cite

The IsoformSwitchAnalyzeR tool is only made possible by a string of other tools and scientific discoveries - please read this section thoroughly and cite the appropriate articles. Note that due to the references being divided into sections some references appear more than once.

If you are using the prediction of consequences please cite:

Vitting-Seerup et al. The Landscape of Isoform Switches in Human Cancers. Cancer Res. (2017)

If you are using the isoform switch test implemented in IsoformSwitchAnalyzeR please cite both:

Vitting-Seerup et al. The Landscape of Isoform Switches in Human Cancers. Cancer Res. (2017)
Ferguson et al. P-value calibration for multiple testing problems in genomics. Stat. Appl. Genet. Mol. Biol. 2014, 13:659-673.

If you are using the isoform switch test implemented in the DRIMSeq package please cite both:

Vitting-Seerup et al. The Landscape of Isoform Switches in Human Cancers. Cancer Res. (2017)
Nowicka et al. DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics. F1000Research, 5(0), 1356.

If you are using the visualizations (plots) implemented in the IsoformSwitchAnalyzeR package please cite:

Vitting-Seerup et al. The Landscape of Isoform Switches in Human Cancers. Cancer Res. (2017)

If you are using the Intron Retention analysis please cite:

Vitting-Seerup et al. spliceR: an R package for classification of alternative splicing and prediction of coding potential from RNA-seq data. BMC Bioinformatics 2014, 15:81.

If you are using the prediction of open reading frames (ORF) please cite all of:

Vitting-Seerup et al. The Landscape of Isoform Switches in Human Cancers. Cancer Res. (2017)
Vitting-Seerup et al. spliceR: an R package for classification of alternative splicing and prediction of coding potential from RNA-seq data. BMC Bioinformatics 2014, 15:81.

If you are using the prediction of premature termination codons (PTC) and thereby NMD-sensitivity please cite all of:

Vitting-Seerup et al. spliceR: an R package for classification of alternative splicing and prediction of coding potential from RNA-seq data. BMC Bioinformatics 2014, 15:81.
Weischenfeldt et al. Mammalian tissues defective in nonsense-mediated mRNA decay display highly aberrant splicing patterns. Genome Biol 2012, 13:R35
Huber et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods, 2015, 12:115-121.

If you are using external sequence analysis tools please cite the appropriate ones of:

Wang et al. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 2013, 41:e74.
Finn et al. The Pfam protein families database. Nucleic Acids Research (2014) Database Issue 42:D222-D230
Petersen et al. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods, 8:785-786, 2011

How To Get Help

This R package comes with a lot of documentation. Much information can be found in the R help files (which can easily be accessed by running the following command in R “?functionName”, for example “?isoformSwitchTest”). Furthermore, this vignette contains a lot of information, so make sure to read both sources carefully as they contain the answers to most Frequently Asked Questions and Problems.

If you have unanswered questions or comments regarding IsoformSwitchAnalyzeR please post them on the associated google group: https://groups.google.com/forum/#!forum/isoformswitchanalyzer (after making sure the question has not already been answered there).

If you want to report a bug (found in the newest version of the R package) please make an issue with a reproducible example at github https://github.com/kvittingseerup/IsoformSwitchAnalyzeR - remember to add the appropriate label.

If you have suggestions for improvements, also put them on github (https://github.com/kvittingseerup/IsoformSwitchAnalyzeR). This will allow other people to upvote your idea by reactions, thereby showing us there is wide support for implementing your idea.

Back to Table of Content.

Quick Start

Workflow Overview

The idea behind IsoformSwitchAnalyzeR is to make it very easy to do advanced post analysis of full length, RNA-seq derived transcripts, with a focus on finding, annotating and visualizing isoform switches with functional consequences. IsoformSwitchAnalyzeR therefore performs 3 specific tasks:

Identify isoform switches.
Annotate the transcripts involved in the isoform switches.
Visualize the consequences of the isoform switches, both individually and combined.

A normal workflow for identification and analysis of isoform switches with functional consequences can be divided into two parts (also illustrated in Figure 1).

1) Extract Isoform Switches and Their Sequences. This involves importing the data into R, identifying isoform switches, annotating those switches with open reading frames (ORF) and extracting both the nucleotide and peptide sequences. The latter enables the usage of external sequence analysis tools such as:

CPAT : The Coding-Potential Assessment Tool, which can be run either locally or via their webserver.
Pfam : Prediction of protein domains, which can be run either locally or via their webserver.
SignalP : Prediction of Signal Peptides, which can be run either locally or via their webserver.

All of this can be done using the high level function:

isoformSwitchAnalysisPart1()

See Short Example Workflow for example of usage, and Detailed Workflow for details on the individual steps.

2) Plot All Isoform Switches and Their Annotation. This involves importing and incorporating the external sequence annotation, identifying intron retentions, predicting functional consequences and lastly plotting all genes with isoform switches as well as summarizing general consequences of switching.

All of this can be done using the function:

isoformSwitchAnalysisPart2()

See Short Example Workflow for example of usage, and Detailed Workflow for details on the individual steps.

Alternatively if one does not plan to incorporate external sequence analysis it is possible to run the full workflow by using:

isoformSwitchAnalysisCombined()

This corresponds to running isoformSwitchAnalysisPart1() and isoformSwitchAnalysisPart2() without adding the external results.

Figure 1: Workflow overview. The grey transparent boxes indicate the two parts of a normal workflow for analyzing isoform switches. The individual steps in the two sub-workflows are indicated by arrows. The speech bubble summarizes how this full analysis can be done in a simple two step process using the high level functions (isoformSwitchAnalysisPart1() and isoformSwitchAnalysisPart2()) in IsoformSwitchAnalyzeR.

Back to Table of Content.

Short Example Workflow

A full, but less customizable, analysis of isoform switches can be done by using the two high level functions isoformSwitchAnalysisPart1() and isoformSwitchAnalysisPart2() as described above. This section aims to show how these high-level functions are used as well as serve as a showcase for what IsoformSwitchAnalyzeR can be used for.

Let's start by loading the R package:

library(IsoformSwitchAnalyzeR)

Note that newer versions of RStudio support auto-completion of package names.

Importing the Data

The first step is to obtain/import all the data needed for the analysis and store them in a switchAnalyzeRlist object. IsoformSwitchAnalyzeR contains different functions for importing the relevant data from a couple of known sources and also supports other data sources. Specifically, the wrappers for supported data sources are called import<description>(), and createSwitchAnalyzeRlist() allows users to use data from other sources. See Importing Data Into R for details about these functions. For the purpose of illustrating the data import let's use Cufflinks/Cuffdiff results as an example:

cuffDB <- prepareCuffExample()

This function is just a wrapper for readCufflinks() which makes the sql database from the example Cufflinks/Cuffdiff result data included in the R package “cummeRbund”. If you have not installed cummeRbund it can be done by running the following command:

source("https://bioconductor.org/biocLite.R")
biocLite("cummeRbund")

Once you have a CuffSet (the object type generated by readCufflinks() and prepareCuffExample()), a switchAnalyzeRlist is then created simply by:

aSwitchList <- importCufflinksCummeRbund(cuffDB)

Unfortunately this example dataset is not ideal for illustrating the usability of IsoformSwitchAnalyzeR as it only has two replicates (and the switch test relies on replicates). To illustrate the workflow let's instead use some of the test data from the IsoformSwitchAnalyzeR package:

data("exampleSwitchList")
exampleSwitchList
#> This switchAnalyzeRlist list contains:
#>  259 isoforms from 84 genes
#>  1 comparison from 2 conditions
#> 
#> Switching features:
#>            Comparison switchingIsoforms switchingGenes
#> 1 hESC vs Fibroblasts                 0              0
#> 
#> Feature analyzed:
#> [1] "Isoform Swich Identification"

This data is a small subset of the result obtained from the data described in ?exampleSwitchList. Note that although a switch identification analysis was performed, no genes were significant. This small data subset is ideal for examples due to the short runtimes.

Note that there are a lot of other ways of importing data into R and/or creating the switchAnalyzeRList including wrappers for Cufflinks/Cuffdiff, Kallisto, Salmon and RSEM as well as others (see Importing Data Into R for more details).

Part 1

We can now run the first part of the isoform switch analysis workflow, which filters out non-expressed genes/isoforms, identifies isoform switches, annotates those switches with open reading frames (ORF) and extracts both the nucleotide and peptide (amino acid) sequences.

### isoformSwitchAnalysisPart1 needs the genomic sequence to predict ORFs. 
# These are readily available from Bioconductor as BSgenome objects: 
# http://bioconductor.org/packages/release/BiocViews.html#___BSgenome
# Here we use Hg19 - which can be downloaded by copy/pasting the following two lines into the R terminal:
# source("https://bioconductor.org/biocLite.R")
# biocLite("BSgenome.Hsapiens.UCSC.hg19")

library(BSgenome.Hsapiens.UCSC.hg19)

exampleSwitchList <- isoformSwitchAnalysisPart1(
    input=exampleSwitchList, 
    genomeObject = Hsapiens, 
    dIFcutoff = 0.4,         # Set high for short runtime in example data
    outputSequences = FALSE, # keeps the function from outputting the fasta files from this example
    calibratePvalues = FALSE
)

Note that:

1. It is possible to supply the CuffSet (the object which links to the cufflinks/cuffdiff sql database, created with readCufflinks{cummeRbund}) directly to the input argument instead of creating the switchAnalyzeRlist first.
2. The isoformSwitchAnalysisPart1() function has an argument, overwritePvalues, which overwrites user supplied p-values (such as those imported from cufflinks) with the result of running isoformSwitchTest().
3. The switchAnalyzeRlist produced by isoformSwitchAnalysisPart1() has been subset to only contain genes where an isoform switch (as defined by the alpha and dIFcutoff arguments) was identified. This enables much faster runtimes for the rest of the pipeline as only isoforms from genes with a switch are analyzed.
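The subsetting described in point 3 can be pictured with a small base R sketch. The table below is invented; only the column names (gene_id, dIF, isoform_switch_q_value) mirror those used in the switchAnalyzeRlist, and the two cutoffs play the role of the alpha and dIFcutoff arguments. It illustrates the idea and is not the package's implementation.

```r
# Hypothetical isoform-level results
isoformFeatures <- data.frame(
    isoform_id = c("iso1", "iso2", "iso3", "iso4", "iso5"),
    gene_id    = c("geneA", "geneA", "geneB", "geneB", "geneC"),
    dIF        = c(0.45, -0.45, 0.05, -0.05, 0.50),
    isoform_switch_q_value = c(0.001, 0.001, 0.0001, 0.0001, 0.40),
    stringsAsFactors = FALSE
)
alpha     <- 0.05
dIFcutoff <- 0.4

# An isoform counts as switching only if it passes both cutoffs
isSwitching <- abs(isoformFeatures$dIF) > dIFcutoff &
    isoformFeatures$isoform_switch_q_value < alpha

# Keep every isoform from genes with at least one switching isoform
genesWithSwitch <- unique(isoformFeatures$gene_id[isSwitching])
subsetFeatures  <- isoformFeatures[isoformFeatures$gene_id %in% genesWithSwitch, ]
subsetFeatures
```

In this toy table geneB is dropped despite a very small q-value because its change in isoform usage is negligible, and geneC is dropped despite a large dIF because the change is not significant.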

The number of switching features is easily summarized via:

extractSwitchSummary(exampleSwitchList, dIFcutoff = 0.4) # supply the same cutoff to the summary function
#>            Comparison nrIsoforms nrGenes
#> 1 hESC vs Fibroblasts         12       7

Part 2

The second part of the isoform switch analysis workflow includes importing and incorporating the external sequence annotation, predicting functional consequences and visualizing both the general effects of isoform switches and the individual isoform switches. The combined analysis can be done by:

exampleSwitchList <- isoformSwitchAnalysisPart2(
    switchAnalyzeRlist      = exampleSwitchList, 
    dIFcutoff               = 0.4,   # Set high for short runtime in example data
    n                       = 10,    # if plotting was enabled it would only output the top 10 switches
    removeNoncodinORFs      = TRUE,  # Because ORF was predicted de novo
    pathToCPATresultFile    = system.file("extdata/cpat_results.txt"   , package = "IsoformSwitchAnalyzeR"),
    pathToPFAMresultFile    = system.file("extdata/pfam_results.txt"   , package = "IsoformSwitchAnalyzeR"),
    pathToSignalPresultFile = system.file("extdata/signalP_results.txt", package = "IsoformSwitchAnalyzeR"),
    codingCutoff            = 0.725, # the coding potential cutoff we suggested for human 
    outputPlots             = FALSE  # keeps the function from outputting the plots from this example
)

The numbers of isoform switches with functional consequences are easily extracted:

extractSwitchSummary(exampleSwitchList, filterForConsequences = TRUE, dIFcutoff = 0.4) # supply the same cutoff to the summary function
#>            Comparison nrIsoforms nrGenes
#> 1 hESC vs Fibroblasts         10       5

For each of these genes a switchPlot can be created. The top genes with isoform switches are:

extractTopSwitches(exampleSwitchList, filterForConsequences = TRUE, n=4)$gene_name
#> [1] "KIF1B"    "ARHGEF19" "LDLRAD2"  NA

Examples of visualization

Let's take a look at the isoform switch in the MSTP9 gene via the switchPlot that can easily be made with IsoformSwitchAnalyzeR:

switchPlot(exampleSwitchList, gene='MSTP9')

From this plot we see that the gene expression is changed but also that there is a large significant switch in isoform usage. In the hESC the short isoform is used, but as the cells are differentiated to fibroblasts the long isoform is upregulated. Interestingly, this isoform switch results in the inclusion of several domains (a PAN_1 domain and 5 Kringle domains). Furthermore, it is noteworthy that the lengths of the trypsin domains are very different, which could indicate that the trypsin domain in the long isoform is truncated - but that is a hypothesis that would need to be further explored by other means.

Note that if you want to save this plot as a pdf file via the pdf command you need to specify onefile = FALSE. Here is a code chunk that will generate a nicely sized (almost) article ready figure:

pdf(file = '<outputDirAndFileName>.pdf', onefile = FALSE, height=5, width = 8)
switchPlot(exampleSwitchList, gene='MSTP9')
dev.off()

Furthermore note that if you are analyzing multiple conditions you will also need to specify the ‘condition1’ and ‘condition2’ arguments in the switchPlot() function.

To get an overview of the global consequences of isoform switching in the different comparisons let's take a look at a larger dataset:

data("exampleSwitchListAnalyzed")
exampleSwitchListAnalyzed
#> This switchAnalyzeRlist list contains:
#>  490 isoforms from 120 genes
#>  3 comparison from 3 conditions
#> 
#> Switching features:
#>            Comparison switchingIsoforms switchingGenes
#> 1 hESC vs Fibroblasts               148             79
#> 2  iPS vs Fibroblasts               135             69
#> 3         iPS vs hESC               135             76
#> 4            combined               264            120
#> 
#> Feature analyzed:
#> [1] "Isoform Swich Identification, ORFs, ntSequence, aaSequence, Signal Peptides, Protein Domains, Intron Retentions, Switch Consequences, Coding Potential"

extractConsequenceSummary(exampleSwitchListAnalyzed, asFractionTotal = FALSE)

From this summary plot we can see many things. First of all, the most frequent changes are changes in ORF length. Secondly, from iPS to both Fibroblast and hESC we see a larger number of domain gains than domain losses. Most notable is the uneven distribution of intron retention gain/loss in the comparison of iPS to both Fibroblast and hESC.

Since all the data is saved in the switchAnalyzeRlist we can also make a couple of overview plots:

library(ggplot2) # plotting functions (ggplot(), aes(), geom_point(), ...) used below
data("exampleSwitchListAnalyzed")

### Volcano-like plot:
ggplot(data=exampleSwitchListAnalyzed$isoformFeatures, aes(x=dIF, y=-log10(isoform_switch_q_value))) +
     geom_point(
        aes( color=abs(dIF) > 0.1 & isoform_switch_q_value < 0.05 ), # default cutoff
        size=1
    ) +
    geom_hline(yintercept = -log10(0.05), linetype='dashed') + # default cutoff
    geom_vline(xintercept = c(-0.1, 0.1), linetype='dashed') + # default cutoff
    facet_grid(condition_1 ~ condition_2, scales = 'free') +
    scale_color_manual('Significant\nIsoform Switch', values = c('black','red')) +
    theme_bw()
#> Warning: Removed 116 rows containing missing values (geom_point).

From this it is clearly seen why cutoffs on both the dIF and the q-value are necessary.

Another interesting overview plot can be made as follows:

### Switch vs Gene changes:

ggplot(data=exampleSwitchListAnalyzed$isoformFeatures, aes(x=gene_log2_fold_change, y=dIF)) +
    geom_point(
        aes( color=abs(dIF) > 0.1 & isoform_switch_q_value < 0.05 ), # default cutoff
        size=1
    ) + 
    facet_grid(condition_1 ~ condition_2, scales = 'free') +
    geom_hline(yintercept = 0, linetype='dashed') +
    geom_vline(xintercept = 0, linetype='dashed') +
    scale_color_manual('Significant\nIsoform Switch', values = c('black','red')) +
    labs(x='Gene log2 fold change', y='dIF') +
    theme_bw()
#> Warning: Removed 35 rows containing missing values (geom_point).

From this it is clearly seen that isoform switching is not something that occurs in the absence of changes in gene expression, again highlighting the necessity of analyzing RNA-seq data with isoform resolution.

Back to Table of Content.

Detailed Workflow

Overview

We recommend that before you start on this section you read through the Quick Start section, as it gives the broad overview, introduces some basic concepts and gives a couple of tips which for clarity are not duplicated in this section.

Compared to the workflow presented in Quick Start, a full workflow for analyzing isoform switches naturally has a lot of sub-steps which can each be customized/optimized. In this section we will go into more depth with each of the steps and give tips and shortcuts for working with IsoformSwitchAnalyzeR. More specifically, the main function(s) behind each of these steps will be explained, both how they work and the main parameters for customization. For a comprehensive and detailed description of each individual function please refer to the individual function documentation (via ?functionName).

The first section covers IsoformSwitchAnalyzeR Background Information, then a detailed isoform switch analysis workflow is described and lastly an overview of Other Tools in IsoformSwitchAnalyzeR is provided.
