
Differential Analysis with Cuffdiff

Jeanie Lim edited this page Jul 12, 2016 · 8 revisions

Cuffdiff

Cuffdiff identifies differentially expressed genes and transcripts. Comparing expression levels of genes and transcripts in RNA-Seq experiments is a hard problem. Cuffdiff is designed for these comparisons: it can tell you not only which genes are up- or down-regulated between two or more conditions, but also which genes are differentially spliced or are undergoing other types of isoform-level regulation.

Running Cuffdiff

Basic cuffdiff command:

cuffdiff [options] <merged.gtf> <sample1_rep1.bam,sample1_rep2.bam> <sample2_rep1.bam,sample2_rep2.bam>

Input:

  • merged.gtf : merged GTF annotation for all samples, produced by cuffmerge
  • bam/sam files : comma-separated alignment files for the replicates of sample1, followed by comma-separated alignment files for the replicates of sample2 (further conditions can be added in the same way)
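Typing each comma-separated replicate list by hand is error-prone: a stray space after a comma splits one argument into two, and cuffdiff then sees an extra condition. A minimal sketch of a helper that joins any number of paths with commas; the function name `join_by_comma` is hypothetical and not part of Cufflinks.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cufflinks): join all arguments with commas,
# producing one replicate list such as "rep1.bam,rep2.bam".
join_by_comma() {
    local IFS=','      # "$*" joins the positional parameters with the first character of IFS
    printf '%s\n' "$*"
}
```

Usage, assuming two replicates per condition: `cuffdiff [options] merged.gtf "$(join_by_comma s1_r1.bam s1_r2.bam)" "$(join_by_comma s2_r1.bam s2_r2.bam)"`. Quoting each substitution keeps every condition's list as a single argument.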

Running on MGHPCC

Step 4| Identify differentially expressed genes and transcripts

Run Cuffdiff by using the merged transcriptome assembly along with the BAM files from TopHat2 for each replicate:

cuffdiff -o diff_out -b genome.fa -p 8 \
    -L C1,C2 \
    -u merged_asm/merged.gtf \
    ./C1_R1_thout/accepted_hits.bam,./C1_R2_thout/accepted_hits.bam,./C1_R3_thout/accepted_hits.bam \
    ./C2_R1_thout/accepted_hits.bam,./C2_R2_thout/accepted_hits.bam,./C2_R3_thout/accepted_hits.bam

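-b enables fragment bias correction, using the reference genome sequence in genome.fa. The other options: -o sets the output directory, -p 8 runs Cuffdiff with 8 threads, -L C1,C2 assigns labels to the two conditions, and -u enables multi-read correction, which more accurately weights reads that map to multiple locations in the genome.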
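Cuffdiff can run for a long time before failing on a mistyped path, so it can help to check the inputs first. A minimal sketch, assuming the `<label>_R<n>_thout/accepted_hits.bam` layout used in the command above; the function `replicate_bams` is hypothetical. It builds one condition's comma-separated list and returns an error if any BAM file is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the replicate list for one condition, assuming the
# ./<label>_R<n>_thout/accepted_hits.bam layout, and fail if any file is missing.
replicate_bams() {
    local label=$1 n=$2 i bam list=""
    for ((i = 1; i <= n; i++)); do
        bam="./${label}_R${i}_thout/accepted_hits.bam"
        if [[ ! -f $bam ]]; then
            echo "missing: $bam" >&2
            return 1
        fi
        list+="${list:+,}$bam"   # add a comma only between entries
    done
    printf '%s\n' "$list"
}
```

Usage: `c1=$(replicate_bams C1 3) && c2=$(replicate_bams C2 3) && cuffdiff -o diff_out -b genome.fa -p 8 -L C1,C2 -u merged_asm/merged.gtf "$c1" "$c2"`. The `&&` chain means cuffdiff only starts if every replicate file was found.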

Examine the differential expression analysis results

The cuffdiff output is in a directory called diff_out. We are going to spend some time parsing through this output.

cd ./diff_out
ls -l
# You should see files including the following (descriptions added after the colons):
-rwxr-x--- 1 daras G-801020  2691192 Aug 21 12:20 isoform_exp.diff  : Differential expression testing for transcripts
-rwxr-x--- 1 daras G-801020  1483520 Aug 21 12:20 gene_exp.diff     : Differential expression testing for genes
-rwxr-x--- 1 daras G-801020  1729831 Aug 21 12:20 tss_group_exp.diff: Differential expression testing for primary transcripts
-rwxr-x--- 1 daras G-801020  1369451 Aug 21 12:20 cds_exp.diff      : Differential expression testing for coding sequences
 
-rwxr-x--- 1 daras G-801020  3277177 Aug 21 12:20 isoforms.fpkm_tracking
-rwxr-x--- 1 daras G-801020  1628659 Aug 21 12:20 genes.fpkm_tracking
-rwxr-x--- 1 daras G-801020  1885773 Aug 21 12:20 tss_groups.fpkm_tracking
-rwxr-x--- 1 daras G-801020  1477492 Aug 21 12:20 cds.fpkm_tracking
 
-rwxr-x--- 1 daras G-801020  1349574 Aug 21 12:20 splicing.diff  : Differential splicing tests
-rwxr-x--- 1 daras G-801020  1158560 Aug 21 12:20 promoters.diff : Differential promoter usage
-rwxr-x--- 1 daras G-801020   919690 Aug 21 12:20 cds.diff       : Differential coding output.
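To get a quick overview of the four *_exp.diff files, you can count how many tests each one contains and how many Cuffdiff marked as significant. A minimal sketch, assuming the tab-delimited layout in which column 14 is the `significant` (yes/no) field; the function name `summarize_diffs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each *_exp.diff file in a cuffdiff output directory,
# print the file name, the number of test rows, and the number with significant == "yes".
# Assumes tab-separated columns with "significant" in column 14.
summarize_diffs() {
    local dir=${1:-.} f
    for f in "$dir"/*_exp.diff; do
        [[ -e $f ]] || continue   # skip if the glob matched nothing
        awk -F'\t' -v name="$(basename "$f")" '
            NR > 1 { total++; if ($14 == "yes") sig++ }   # skip the header row
            END { printf "%s\t%d\t%d\n", name, total, sig }
        ' "$f"
    done
}
```

Run it from inside diff_out with `summarize_diffs .` (or pass the directory path).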

Note that Cuffdiff has performed a statistical test on the expression values between our two biological groups. For each gene or transcript it reports the FPKM expression level in each group, the log2 fold change, log2(group 2 FPKM / group 1 FPKM), a p-value, and a q-value (the p-value adjusted for multiple testing, i.e. false discovery rate), along with a yes/no "significant" call and many other helpful data items.
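To see how the fold change relates to the FPKM columns, you can recompute it yourself. A minimal sketch, assuming the *_exp.diff layout where column 8 is value_1 (group 1 FPKM), column 9 is value_2 (group 2 FPKM) and column 10 is the reported log2(fold_change); the function `recompute_log2fc` is hypothetical. Rows with a zero FPKM are skipped because the ratio is then infinite or undefined.

```shell
#!/usr/bin/env bash
# Hypothetical check: recompute log2(fold_change) = log2(value_2 / value_1)
# and print test_id, the recomputed value, and the value Cuffdiff reported.
# awk's log() is the natural log, so log(x)/log(2) gives log2(x).
recompute_log2fc() {
    awk -F'\t' '
        NR > 1 && $8 > 0 && $9 > 0 {
            printf "%s\t%.4f\t%s\n", $1, log($9 / $8) / log(2), $10
        }
    ' "$1"
}
```

For example, `recompute_log2fc gene_exp.diff | head` should show matching values in the last two columns (up to rounding).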

The following command prints the log2 fold change (column 10) and locus (column 4) of every isoform, sorted from the largest to the smallest fold change. It works the same way on gene_exp.diff:

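# Skip the header row; sort -g (general numeric) also orders exponent and inf values correctly
awk -F'\t' 'NR > 1 {print $10 "\t" $4}' isoform_exp.diff | sort -g -r | head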
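Sorting by fold change alone also surfaces changes that are not statistically supported. A minimal sketch that keeps only tests Cuffdiff flagged as significant and orders them by q-value, assuming the standard *_exp.diff columns (3 = gene, 10 = log2(fold_change), 13 = q_value, 14 = significant); the function name `significant_hits` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print gene, log2(fold_change) and q_value for every
# test with significant == "yes", sorted by ascending q-value.
significant_hits() {
    awk -F'\t' 'NR > 1 && $14 == "yes" { print $3 "\t" $10 "\t" $13 }' "$1" |
        sort -t $'\t' -k3,3g
}
```

Usage: `significant_hits gene_exp.diff | head -20` lists the 20 most confident differentially expressed genes.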

Finally, we are going to use cummeRbund to visualize the differential expression data.


| Previous Section | This Section | Next Section |
|:------------------------------------:|:--------------------------:|:--------------------------------------------:|
| Transcript Assembly with Cufflinks | Differential Analysis with Cuffdiff | Visualization with CummeRbund |
