Post Alignment Quality Control
After the reads have been aligned to the genome using a splice-aware alignment algorithm, the data are available in BAM (or SAM) format files. At this point in the analysis pipeline we can begin to ask more RNA-seq specific questions about quality. Again, a number of tools are available for RNA-seq quality control (many with very similar names). In this section of the tutorial, you will perform quality control of the alignments using SAMtools.
Mapping quality (MAPQ) scores quantify the probability that a read is incorrectly aligned. MAPQ is defined as −10 log10 Pr{mapping position is wrong}, rounded to the nearest integer; e.g., a 1% probability that the mapping position is wrong corresponds to MAPQ=20.
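The conversion between an error probability and a MAPQ value can be checked with a few lines of shell. This is only a minimal sketch of the formula above; the helper names `mapq_from_prob` and `prob_from_mapq` are made up for illustration and are not part of SAMtools.

```shell
# Hypothetical helper: error probability -> Phred-scaled MAPQ,
# i.e. -10 * log10(p), rounded to the nearest integer.
mapq_from_prob() {
    awk -v p="$1" 'BEGIN { printf "%d\n", int(-10 * log(p) / log(10) + 0.5) }'
}

# Hypothetical helper: MAPQ -> error probability, i.e. 10^(-MAPQ/10).
prob_from_mapq() {
    awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

mapq_from_prob 0.01     # 1% chance that the mapping position is wrong
mapq_from_prob 0.001    # 0.1% chance
prob_from_mapq 20       # probability implied by a MAPQ=20 cutoff
```

Read this way, a cutoff of MAPQ>=20 keeps alignments whose reported chance of being misplaced is at most 1 in 100.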
SAM/BAM files contain a mapping quality score for each alignment, and alignments can be filtered on it with the -q option of samtools view.
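To make the effect of a MAPQ threshold concrete, the sketch below applies the same rule to SAM-style text with awk, where MAPQ is the fifth tab-separated column. The three alignment lines are invented and truncated to their first six columns, and `keep_mapq_at_least` is a hypothetical helper; header lines (starting with @) are not handled here.

```shell
# Three made-up, truncated SAM records: QNAME FLAG RNAME POS MAPQ CIGAR.
sam_lines=$(printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    read1 0  chr19 100 50 36M \
    read2 0  chr19 200 3  36M \
    read3 16 chr19 300 20 36M)

# Hypothetical helper: keep records whose MAPQ (column 5) is >= the threshold.
keep_mapq_at_least() {
    awk -F '\t' -v min="$1" '$5 >= min'
}

kept=$(printf '%s\n' "$sam_lines" | keep_mapq_at_least 20)
printf '%s\n' "$kept"    # read2 (MAPQ 3) is dropped; read1 and read3 remain
```

On real data, prefer samtools itself for this step: it reads BAM directly and preserves the header.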
Select reads with MAPQ>=20 using SAMtools, then sort the resulting .bam file.
mv ../brain/accepted_hits.bam ../brain/brain_chr19.bam
samtools view -b -q20 ../brain/brain_chr19.bam > ../brain/brain_chr19.q20.bam
samtools sort -o ../brain/brain_chr19.q20.sorted.bam ../brain/brain_chr19.q20.bam
rm -f ../brain/brain_chr19.q20.bam
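Once the filtered, sorted file exists, a quick sanity check is to confirm that no alignments below the threshold remain. The following is a sketch under stated assumptions: samtools is on your PATH and the sorted output is at ../brain/brain_chr19.q20.sorted.bam. If either assumption fails, the check is skipped.

```shell
# Assumed location of the filtered, sorted BAM from the steps above.
bam=../brain/brain_chr19.q20.sorted.bam

if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    # Index the coordinate-sorted BAM so downstream tools can query it by region.
    samtools index "$bam"

    # Count all alignments, then those passing -q 20; the difference should be 0.
    total=$(samtools view -c "$bam")
    passing=$(samtools view -c -q 20 "$bam")
    echo "alignments: $total, below MAPQ 20: $(( total - passing ))"

    # Summary of alignment flags (mapped, properly paired, and so on).
    samtools flagstat "$bam"
else
    echo "samtools or $bam not found; skipping check"
fi
```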