snakemake-RNAseq

This pipeline performs a standard RNAseq analysis, including fastQC, STAR alignment, RSEM & salmon quantification.

Directory structure

.
├── config            # Contains sample sheet (samples.tsv) and config file (config.yaml)
├── rules             # Snakemake rules
├── scripts           # Scripts to run each step of RNAseq
├── README.md
└── Snakefile         # Snakemake workflow

Installation

Option 1: Download the package

Choose "Download ZIP"

The folder named snakemake-RNAseq-main is downloaded.

Transfer the folder to users' working directory on argos.

scp -r path/to/snakemake-RNAseq-main <USER_ID>@argos-stgw2.dfci.harvard.edu:/mnt/storage/home/<USER_ID>/

Log onto argos:
```
ssh USER_ID@argos.dfci.harvard.edu
```

Change the name of the folder to snakemake-RNAseq

mv $HOME/snakemake-RNAseq-main $HOME/snakemake-RNAseq

Option 2: git clone

Clone the pipeline using the following command

git clone https://github.com/SViswanathanLab/snakemake-RNAseq.git

Make sure there is a folder named snakemake-RNAseq in users' working directory.

Usage

Instructions for preparing sample sheet

Paired-end data is assumed.
5 types of RNAseq data formats are accommodated: .bam, .fastq.gz, .fq.gz, .fastq, .fq
The config/samples.csv file is an example sample sheet, modify it as needed so that
- 1st column: sample names
- 2nd column: fq1 file names (if using .bam as input, put the name of .fastq.gz files generated from .bam)
- 3rd column: fq2 file names (if using .bam as input, put the name of .fastq.gz files generated from .bam)
- 4th column: RNA-seq file format - choose between bam and fq (all inputs need to be in the same file format)
Each column is separated by one comma. The fq1 & fq2 file names must contain the full sample names.

For example:

293T-TFE3-1,293T-TFE3-1_R1_001.fastq.gz,293T-TFE3-1_R2_001.fastq.gz,bam
293T-TFE3-2,293T-TFE3-2_R1_001.fastq.gz,293T-TFE3-2_R2_001.fastq.gz,bam

Input files

The RNAseq files for analysis should be copied to data.

cp -r path/to/<fq_files_folder> $HOME/snakemake-RNAseq/

Users should change the name of the folder containing fq files into data.

mv $HOME/snakemake-RNAseq/<fq_files_folder> $HOME/snakemake-RNAseq/data

Run snakemake

Step 1: Initiate a screen to run snakemake pipeline in the background without requiring terminal connection all the time
```
screen -S snakemake-RNAseq
```
Step 2: Change into the directory snakemake-RNAseq
```
cd $HOME/snakemake-RNAseq
```

Step 3: Activate the environment with snakemake installed & install plugin for cluster submission

source /mnt/storage/apps/Mambaforge-23.1.0-1/etc/profile.d/conda.sh
conda activate snakemake

pip install snakemake-executor-plugin-cluster-generic

Step 4: Run snakemake pipeline

snakemake --unlock
snakemake --executor cluster-generic --jobs 50 --latency-wait 60 --cluster-generic-submit-cmd "qsub -l h_vmem=256G, -pe pvm 16 -o $HOME/snakemake-RNAseq/joblogs/ -e $HOME/snakemake-RNAseq/joblogs/"

This step might take long, depending on the sample sizes.

Step 4: Detach the screen as needed
- Press Ctrl+A and then D to detach the screen as needed. Now closing the terminal or losing connection to the cluster should not interrupt the snakemake pipeline.
- Use screen -ls to see all detached screens.
- Example:
  
  numbers like 27120 and 26844 are <SCREEN-ID> used to track the processes running in that screen
Step 5 (Optional): Reattach to a screen
```
screen -r <SCREEN-ID>
```
- If the command execution in the screen is interrupted, users need to rerun Step 4 in the screen to generate all results expected.
- Press Ctrl+A and then D to detach the screen as needed.
Step 6: Delete the screen after done
```
screen -S <SCREEN-ID> -X quit
```

Output

The results are saved in the folder snakemake-RNAseq/results.
- snakemake-RNAseq/results/fastqc_results contains the fastqc results.
- snakemake-RNAseq/results/STAR_results contains the STAR results, and each subfolder is named by the sample name.
- snakemake-RNAseq/results/salmon_results contains the salmon results, and each subfolder is named by the sample name.
- snakemake-RNAseq/results/star_wide_countMatrix.csv and snakemake-RNAseq/results/star_wide_countMatrix.Rds contain the star count matrix, with genes as rows and samples as columns, in both .csv and .Rds format.
- snakemake-RNAseq/results/salmon_wide_TPM_Matrix.csv and snakemake-RNAseq/results/salmon_wide_TPM_Matrix.Rds contain the salmon TPM matrix, with genes as rows and samples as columns, in both .csv and .Rds format.
- snakemake-RNAseq/results/rsem_geneLevel_wide_TPM_Matrix.csv and snakemake-RNAseq/results/rsem_geneLevel_wide_TPM_Matrix.Rds contain the rsem gene-level TPM matrix, with genes as rows and samples as columns, in both .csv and .Rds format.
- snakemake-RNAseq/results/rsem_isoformLevel_wide_TPM_Matrix.csv and snakemake-RNAseq/results/rsem_isoformLevel_wide_TPM_Matrix.Rds contain the rsem transcript-level TPM matrix, with genes as rows and samples as columns, in both .csv and .Rds format.
snakemake-RNAseq/rsem_ref contains the reference files generated for rsem quantification.
snakemake-RNAseq/logs contains the log files for running each step of this analysis, for debugging.
snakemake-RNAseq/joblogs contains the log files for job submission, for debugging.

config

config.yaml contains the information about versions of each tool used, reference file paths

Module versions (latest ones globally installed on argos):
- samtools: 1.9
- fastqc: 0.11.7
- star: 2.7.10a
- rsem: 1.3.1
- salmon: 1.10.1
- snakemake: 8.15.2
- snakemake-executor-plugin-cluster-generic: 1.0.9
Reference file directory: /mnt/storage/labs/sviswanathan/snakemake_RNAseq_2024/Human_genome_2024/
- Can be modified as needed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

snakemake-RNAseq

Directory structure

Installation

Option 1: Download the package

Option 2: git clone

Usage

Instructions for preparing sample sheet

Input files

Run snakemake

Output

config

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 91 Commits
config		config
rules		rules
scripts		scripts
.DS_Store		.DS_Store
README.md		README.md
Snakefile		Snakefile

SViswanathanLab/snakemake-RNAseq

Folders and files

Latest commit

History

Repository files navigation

snakemake-RNAseq

Directory structure

Installation

Option 1: Download the package

Option 2: git clone

Usage

Instructions for preparing sample sheet

Input files

Run snakemake

Output

config

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages