pangenome

1. VCF operations

Liao W W, Asri M, Ebler J, et al. A draft human pangenome reference[J]. Nature, 2023, 617(7960): 312-324.

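a. Multi-sample VCF files were converted to per-sample VCF files using bcftools view -a -I -s (-a trims ALT alleles not seen in the selected genotypes, -I skips recalculating INFO fields for the subset, -s selects the sample):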

bcftools view -a -I -s <sample name>
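To apply the same command to every sample in a multi-sample file, it can be wrapped in a loop over the sample names. This is a minimal sketch, assuming bcftools is on the PATH; the function name split_by_sample and the output file naming are illustrative and not taken from the cited paper.

```shell
# Hypothetical helper: write one VCF per sample found in a multi-sample VCF.
split_by_sample() {
    in_vcf="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    # `bcftools query -l` prints the sample names, one per line.
    bcftools query -l "$in_vcf" | while IFS= read -r sample; do
        bcftools view -a -I -s "$sample" -o "$out_dir/$sample.vcf" "$in_vcf"
    done
}
```

Example call (file names are placeholders): split_by_sample cohort.vcf.gz per_sample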

b. Left-align and normalize indels, check that REF alleles match the reference, and split multiallelic sites into multiple rows. Left-alignment and normalization are applied only if the --fasta-ref (-f) option is supplied, so the command below, which omits it, only splits multiallelic sites (-m -any).

bcftools norm -m -any -o out.vcf in.vcf
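Because left-alignment needs a reference FASTA, a variant of the command with -f may be what is wanted in practice. The sketch below assumes bcftools is on the PATH; the wrapper name normalize_vcf and the choice of -c w (warn on REF mismatches rather than stop) are assumptions, not settings from the source.

```shell
# Hypothetical wrapper: left-align, check REF alleles, and split multiallelic sites.
normalize_vcf() {
    ref_fa="$1"
    in_vcf="$2"
    out_vcf="$3"
    # -f enables left-alignment and normalization against the reference;
    # -c w only warns when a REF allele does not match the reference;
    # -m -any splits multiallelic records of any type into separate rows.
    bcftools norm -f "$ref_fa" -c w -m -any -o "$out_vcf" "$in_vcf"
}
```

Example call (file names are placeholders): normalize_vcf ref.fa in.vcf out.vcf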

c. Multi-nucleotide polymorphisms and complex indels were further decomposed into SNPs and simple indels with vcfdecompose from RTG Tools: https://github.com/RealTimeGenomics/rtg-tools

vcfdecompose --break-mnps --no-header --no-gzip --break-indels -i out.vcf -o out1.vcf 

d. Select SNPs (-v, --types accepts snps|indels|mnps|other):

bcftools view -v snps out1.format.vcf >out2.snp.vcf

e. Convert VCF to BED with vcf2bed from BEDOPS, which maps the 1-based, closed [start, end] interval to the 0-based, half-open [start-1, start): https://bedops.readthedocs.io/en/latest/index.html

# vcf2bed also offers three options for filtering input by variant type: --snvs, --insertions and --deletions.
vcf2bed < input.vcf > output.bed
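The coordinate shift itself can be illustrated with plain awk. This is a simplified sketch of the [start-1, start) mapping only: real vcf2bed output carries more columns, and the function name vcf_to_bed_coords is made up for this example.

```shell
# Hypothetical illustration: read VCF on stdin, print chrom, start-1, start, ID.
vcf_to_bed_coords() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }                 # skip meta and header lines
         { print $1, $2 - 1, $2, $3 }  # 1-based POS becomes 0-based half-open
    '
}
```

For a record at chr1 position 100 with ID rs1, this prints chr1, 99, 100 and rs1 separated by tabs.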

2. Haplotype Comparison Tools

Illumina/hap.py https://github.com/Illumina/hap.py

yum install -y devtoolset-8-gcc devtoolset-3-binutils devtoolset-8-gcc-c++ devtoolset-8-gcc-gfortran centos-release-scl boost169-devel autoconf automake glibc-static libstdc++-static python-devel ant cmake bzip2 bzip2-devel ncurses-devel zlib-devel
pip install cython numpy scipy biopython matplotlib pandas pysam bx-python pyvcf cyvcf2 nose
wget https://github.com/Illumina/hap.py/archive/refs/tags/v0.3.15.tar.gz
tar xzvf v0.3.15.tar.gz
cd hap.py-0.3.15/
python install.py ~/hap.py-v0.3.15/ --with-rtgtools --no-tests

Platinum Genomes https://github.com/Illumina/PlatinumGenomes/blob/master/files/2017-1.0.files

hap.py NA12878.vcf.gz test.vcf.gz -o test -r hg19.fa --threads 40 -f ConfidentRegions.bed.gz
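With -o test, hap.py writes its summary metrics to test.summary.csv. The sketch below pulls recall and precision out of such a file by header name; it assumes the columns are called Type, Filter, METRIC.Recall and METRIC.Precision, and the helper name happy_summary is hypothetical.

```shell
# Hypothetical helper: print Type, Filter, recall and precision per row
# of a hap.py summary CSV, locating the columns by their header names.
happy_summary() {
    awk -F',' '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            # Stop early if an expected column name is absent.
            if (!("Type" in col) || !("Filter" in col) ||
                !("METRIC.Recall" in col) || !("METRIC.Precision" in col)) {
                print "missing expected column in header" > "/dev/stderr"
                exit 1
            }
            next
        }
        {
            print $(col["Type"]), $(col["Filter"]), $(col["METRIC.Recall"]), $(col["METRIC.Precision"])
        }
    ' "$1"
}
```

Example call: happy_summary test.summary.csv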

3. Reference links

GA4GH benchmarking-tools:Germline Small Variant Benchmarking Tools and Standards

The International Sample Genome Resource (IGSR)

UCSC hg19:giab+platinumGenomes

4. Subsampling BAM files

sambamba-1.0.1-linux-amd64-static view -t 36 -f bam -s 0.1313 -o out.bam in.bam
-s, --subsample=FRACTION
                    subsample reads (read pairs)
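The value passed to -s is the fraction of reads (read pairs) to keep. One way to choose it, assumed here rather than stated in the source, is the ratio of the target mean depth to the current mean depth. The helper name subsample_fraction and the depths in the example are hypothetical.

```shell
# Hypothetical helper: fraction of reads to keep to go from the current
# mean depth to a lower target mean depth.
subsample_fraction() {
    awk -v cur="$1" -v tgt="$2" 'BEGIN {
        if (cur <= 0 || tgt <= 0 || tgt > cur) {
            print "depths must be positive and target must not exceed current" > "/dev/stderr"
            exit 1
        }
        printf "%.4f\n", tgt / cur
    }'
}
```

For example, subsample_fraction 40 10 prints 0.2500, which could then be passed to sambamba view -s.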

5. SAM files

SAM format specification: https://www.samformat.info/sam-format-flag

SAM file
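The FLAG column of a SAM record is a bit field, so a value can be decoded with shell arithmetic. This is a minimal sketch; the bit values follow the SAM specification, while the short labels and the function name decode_sam_flag are chosen for this example.

```shell
# Decode a SAM FLAG integer into the names of the bits that are set.
decode_sam_flag() {
    flag="$1"
    out=""
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                16:reverse 32:mate_reverse 64:read1 128:read2 \
                256:secondary 512:qc_fail 1024:duplicate 2048:supplementary; do
        bit="${pair%%:*}"    # numeric bit value before the colon
        name="${pair#*:}"    # label after the colon
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    printf '%s\n' "$out"
}
```

For example, decode_sam_flag 99 prints paired,proper_pair,mate_reverse,read1.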

6. References

Deng L, Xie B, Wang Y, et al. A protocol for applying a population-specific reference genome assembly to population genetics and medical studies[J]. STAR protocols, 2022, 3(2): 101440.

Gao Y, Yang X, Chen H, et al. A pangenome reference of 36 Chinese populations[J]. Nature, 2023: 1-10.

Liao W W, Asri M, Ebler J, et al. A draft human pangenome reference[J]. Nature, 2023, 617(7960): 312-324.