
Post Alignment Quality Control

Jeanie Lim edited this page Jun 22, 2016 · 2 revisions

RNA-seq QC after alignment

After the reads have been aligned to the genome with a splice-aware aligner, the alignments are stored in BAM (or SAM) files. At this point in the analysis pipeline we can begin to ask more RNA-seq-specific questions about quality. Again, a number of tools are available for RNA-seq quality control (many with very similar names). In this section of the tutorial, you will perform quality control of the alignments using SAMtools.

Mapping quality (MAPQ) scores quantify the probability that a read is incorrectly aligned. They are defined as MAPQ = −10 log10 Pr{mapping position is wrong}, rounded to the nearest integer. For example, a 1% probability that the mapping position is wrong corresponds to MAPQ = 20.
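To make the formula concrete, the sketch below inverts it and prints the error probability implied by a few MAPQ values. The helper name `mapq_to_perr` is hypothetical (it is not part of SAMtools), and the calculation assumes the aligner follows the definition above exactly, which not every aligner does.

```shell
# Convert a MAPQ value to the implied probability that the mapping
# position is wrong: P = 10^(-MAPQ/10).
# mapq_to_perr is a hypothetical helper name, not a SAMtools command.
mapq_to_perr() {
  awk -v q="$1" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}

# Print the implied error probability for a few common thresholds.
for q in 10 20 30; do
  printf 'MAPQ=%s  P(wrong)=%s\n' "$q" "$(mapq_to_perr "$q")"
done
```

With this definition, a threshold of MAPQ >= 20 keeps alignments whose reported chance of being misplaced is at most 1%.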

SAM/BAM files contain a mapping quality score for each alignment, and alignments can be filtered on this score with the -q option of samtools view.
Select reads with MAPQ >= 20 using SAMtools, then sort the resulting .bam file.

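mv ../brain/accepted_hits.bam ../brain/brain_chr19.bam  # rename the aligner output  
samtools view -b -q 20 ../brain/brain_chr19.bam > ../brain/brain_chr19.q20.bam  # keep MAPQ >= 20, write BAM  
samtools sort -o ../brain/brain_chr19.q20.sorted.bam ../brain/brain_chr19.q20.bam  # samtools >= 1.3; 0.1.x used: samtools sort in.bam out_prefix  
rm -f ../brain/brain_chr19.q20.bam  # remove the unsorted intermediate file  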
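As a rough illustration of what the -q filter does, the sketch below counts SAM records whose MAPQ field (column 5) meets a threshold. The helper `count_mapq_ge` and the toy records are hypothetical and made up for this example; they are not taken from the tutorial data.

```shell
# Count SAM records whose MAPQ (column 5) is at least the given threshold.
# Header lines (starting with '@') are skipped.
# count_mapq_ge is a hypothetical helper, not a SAMtools command.
count_mapq_ge() {
  awk -v min="$1" -F '\t' '!/^@/ && $5 + 0 >= min + 0 { n++ } END { print n + 0 }'
}

# Toy SAM text with made-up values, standing in for `samtools view -h` output.
toy_sam() {
  printf '@SQ\tSN:chr19\tLN:1000\n'
  printf 'r1\t0\tchr19\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\n'
  printf 'r2\t0\tchr19\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\n'
  printf 'r3\t16\tchr19\t300\t20\t4M\t*\t0\t0\tACGT\tIIII\n'
}

# r1 (MAPQ 50) and r3 (MAPQ 20) pass the threshold; r2 (MAPQ 3) does not.
toy_sam | count_mapq_ge 20
```

On real data the same check could be run as `samtools view ../brain/brain_chr19.q20.sorted.bam | count_mapq_ge 20`; after filtering, the result should match the total record count reported by `samtools view -c` on the same file.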