diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..379c2e2
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,34 @@
+Copyright (c) 2011-2019, Pacific Biosciences of California, Inc.
+
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted (subject to the limitations in the
+disclaimer below) provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright
+   notice, this list of conditions and the following disclaimer.
+
+ * Redistributions in binary form must reproduce the above
+   copyright notice, this list of conditions and the following
+   disclaimer in the documentation and/or other materials provided
+   with the distribution.
+
+ * Neither the name of Pacific Biosciences nor the names of its
+   contributors may be used to endorse or promote products derived
+   from this software without specific prior written permission.
+
+NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE
+GRANTED BY THIS LICENSE. THIS SOFTWARE IS PROVIDED BY PACIFIC
+BIOSCIENCES AND ITS CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
+WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL PACIFIC BIOSCIENCES OR ITS
+CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGE.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..51e8585
--- /dev/null
+++ b/README.md
@@ -0,0 +1,82 @@
+

+ CCS logo +

+

pbmarkdup

+

Mark duplicate reads from PacBio sequencing of an amplified library

+
+***
+
+_pbmarkdup_ takes one or multiple sequencing chips of an amplified library as
+HiFi reads and marks or removes duplicates.
+
+## Availability
+The latest `pbmarkdup` can be installed via the bioconda package `pbmarkdup`.
+
+Please refer to our [official pbbioconda page](https://github.com/PacificBiosciences/pbbioconda)
+for information on Installation, Support, License, Copyright, and Disclaimer.
+
+## Latest Version
+Version **0.2.0**: [Full changelog here](#changelog)
+
+## Execution
+**Input**: HiFi reads from one or multiple movies in PacBio BAM (`.ccs.bam`),
+PacBio dataset (`.consensusreadset.xml`), file of file names (`.fofn`),
+FASTQ (optionally gzipped), or FASTA (optionally gzipped) format.
+
+**Output**: HiFi reads with duplicates marked, in a format inferred from the
+file extension: HiFi BAM (`.bam`); FASTQ (`.fastq`); FASTA (`.fasta`);
+bgzipped FASTQ (`.fastq.gz`); bgzipped FASTA (`.fasta.gz`); or SMRT Link XML
+(`.consensusreadset.xml`), which also generates a corresponding BAM file.
+
+Run on a full movie:
+
+    pbmarkdup movie.ccs.bam output.bam
+
+Run on multiple movies:
+
+    pbmarkdup movie1.fasta movie2.fasta output.fasta
+
+Run on multiple movies and output duplicates in a separate file:
+
+    pbmarkdup movie1.ccs.bam movie2.fastq uniq.fastq --dup-file dups.fasta
+
+## FAQ
+
+### Why are input files parsed twice?
+To keep the memory footprint to a minimum, we read the input files twice
+instead of storing all reads in memory. The goal is to support processing
+multiple movies on a standard server.
+
+### What input / output combinations are allowed?
+
+Inputs as rows, outputs as columns:
+
+| IN/OUT  | BAM | DATASET | FASTQ | FASTA |
+| ------- | :-: | :-----: | :---: | :---: |
+| BAM     |  x  |    x    |   x   |   x   |
+| DATASET |  x  |    x    |   x   |   x   |
+| FASTQ   |     |         |   x   |   x   |
+| FASTA   |     |         |       |   x   |
+
+Allowed combination example:
+
+    pbmarkdup movie1.ccs.bam movie2.fastq movie3.fasta out.fasta
+
+Forbidden combination example:
+
+    pbmarkdup movie2.fastq movie3.fasta out.fastq
+
+### Is there a progress report?
+Yes. With `--log-level INFO`, _pbmarkdup_ reports its progress to `stderr`.
+
+## Licenses
+PacBio® tool _pbmarkdup_, distributed via Bioconda, is licensed under
+[BSD-3-Clause-Clear](https://spdx.org/licenses/BSD-3-Clause-Clear.html).
+
+## Changelog
+
+ * **0.2.0**:
+   * Initial release
+
+## DISCLAIMER
+THIS WEBSITE AND CONTENT AND ALL SITE-RELATED SERVICES, INCLUDING ANY DATA, ARE PROVIDED "AS IS," WITH ALL FAULTS, WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. YOU ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THIS SITE, ALL SITE-RELATED SERVICES, AND ANY THIRD PARTY WEBSITES OR APPLICATIONS. NO ORAL OR WRITTEN INFORMATION OR ADVICE SHALL CREATE A WARRANTY OF ANY KIND. ANY REFERENCES TO SPECIFIC PRODUCTS OR SERVICES ON THE WEBSITES DO NOT CONSTITUTE OR IMPLY A RECOMMENDATION OR ENDORSEMENT BY PACIFIC BIOSCIENCES.
diff --git a/img/pbmarkdup-logo.pdf b/img/pbmarkdup-logo.pdf
new file mode 100644
index 0000000..b890e4e
Binary files /dev/null and b/img/pbmarkdup-logo.pdf differ
diff --git a/img/pbmarkdup-logo.png b/img/pbmarkdup-logo.png
new file mode 100644
index 0000000..eeabeac
Binary files /dev/null and b/img/pbmarkdup-logo.png differ
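The input/output table in the README appears to follow one rule: an output format may not need more per-read information than the least informative input provides. FASTA holds sequence only, FASTQ adds base qualities, and PacBio BAM and datasets carry the full PacBio records. Below is a minimal Python sketch of that rule. It assumes the format is inferred purely from the file extensions listed under **Input** and **Output**. `infer_format`, `is_allowed`, and the ranking are hypothetical illustrations, not part of pbmarkdup. A `.fofn` input would first have to be expanded into the files it lists.

```python
"""Hypothetical sketch of the pbmarkdup input/output compatibility rule.

Not pbmarkdup code: it only restates the README table, assuming formats
are ranked by how much per-read information they carry.
"""

# Assumed ranking: FASTA (sequence) < FASTQ (+ qualities) < BAM / DATASET.
RANK = {"FASTA": 0, "FASTQ": 1, "BAM": 2, "DATASET": 2}

# Extension-to-format mapping taken from the README's Input/Output sections.
# The dataset suffix is checked first because it is the most specific.
SUFFIXES = [
    (".consensusreadset.xml", "DATASET"),
    (".fastq.gz", "FASTQ"),
    (".fasta.gz", "FASTA"),
    (".fastq", "FASTQ"),
    (".fasta", "FASTA"),
    (".bam", "BAM"),
]


def infer_format(path: str) -> str:
    """Return the format implied by a file name's extension."""
    for suffix, fmt in SUFFIXES:
        if path.endswith(suffix):
            return fmt
    raise ValueError(f"unrecognized extension: {path}")


def is_allowed(inputs: list[str], output: str) -> bool:
    """True if the output needs no information missing from any input."""
    poorest = min(RANK[infer_format(p)] for p in inputs)
    return RANK[infer_format(output)] <= poorest


if __name__ == "__main__":
    # The allowed and forbidden examples from the FAQ.
    print(is_allowed(["movie1.ccs.bam", "movie2.fastq", "movie3.fasta"], "out.fasta"))  # True
    print(is_allowed(["movie2.fastq", "movie3.fasta"], "out.fastq"))  # False
```

Taking the minimum over all inputs is what makes mixed runs work. A single FASTA input is enough to restrict the output to FASTA, which matches the forbidden FAQ example.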