The definition of macaque full-length gene models

This repository contains pipeline and scripts that were used to generate resutls in paper:

Yumei Li, Qing Sunny Shen, Qi Peng, Wanqiu Ding, Jie Zhang, Xiaoming Zhong, Ni A An, Mingjun Ji, Wei-Zhen Zhou, Chuan-Yun Li, Polyadenylation-related isoform switching in human evolution revealed by full-length transcript structure, Briefings in Bioinformatics, 2021;, bbab157, https://doi.org/10.1093/bib/bbab157

transcriptomeProject.sh: This file contains all the steps to analyze all the data.

/scripts: this directory contains some useful scripts for both NGS and Iso-seq data analysis.

Reads processing:

PAratio.OnGenes.sh: find PA sites based on Iso-seq reads, assign them to genes and calculate the PA usage/ratio (PA supported reads/all reads assigned to the gene) for each gene.
dnaOrInternalPrimingContFilter.pl: remove Iso-seq reads with possible DNA or internal primer comtaminations.
separatePaReadsFromNotFL.pl: separate Iso-seq reads with poly(A) tails from those non-full-length ones.
fqTrimPA.pl: trim the poly(A) tail from Iso-seq reads.
paCluster_revision.pl: output PA clusters based on Iso-seq reads in bed12 format.
pacBioReadAssign.sh: output the pacbio reads number for each input reference transcript.
readGroup_YM.pl: classify Iso-seq reads into unique, ambiguous or novel based on assigned gene number.
readsAssigner.pl: assign Iso-seq reads to genes.

File convert or processing:

geneCollaspeForRSeQC.pl: collapse all transcripts to a single transcript model for each gene based on procedures using by GTEx (https://gtexportal.org/home/documentationPage#staticTextAnalysisMethods)
bed2gpe.pl: convert bed12 format file to genePred format file.
sam2bed.pl: convert sam format file to bed12 format.
filter.pl: filter a file based the input target file.
select.pl: pick out the reference which contain the query.
unique.pl: output unique result based on the values of specific column.
join.pl: join two files based on sepcific columns.
gpeFeature.pl: extract features from genePred format file.
grep_exonfasta_from_gpe.pl: get fa format sequences for exons from genePred file.
Base.faConvertProtein.fa.pl: convert nucleotides sequence fa format file to amino acid sequence fa format file.

Statistics for files:

nucleotide_count_AcrossPoint.pl: count each type of nucleotides flanking the input points. 
rowCor.R: calculate correlation value for the two input files by row.
PAmotifFinding.pl: output 6-bp PA signal motif and its coordinate.
PAmotifFreq.agg.pl: summarize the frequency of the input PA signals in region upstream of PA sites.
findHLongFromSpecific.pl: output species-specific PA events.

Data visualization:

barplot.R: generate barplots.
barplot.PAmotif.R: generate barplot for different PA signal motif.
hist.R: generete histograms.

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
scripts		scripts
README.md		README.md
transcriptomeProject.sh		transcriptomeProject.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

The definition of macaque full-length gene models

About

Uh oh!

Releases

Packages

Uh oh!

Languages

xihuimeijing/Macaque-transcriptome

Folders and files

Latest commit

History

Repository files navigation

The definition of macaque full-length gene models

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages