Initial commit
armintoepfer committed Dec 23, 2019
0 parents commit 72cb0c1
Showing 4 changed files with 116 additions and 0 deletions.
34 changes: 34 additions & 0 deletions LICENSE
@@ -0,0 +1,34 @@
Copyright (c) 2011-2019, Pacific Biosciences of California, Inc.

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted (subject to the limitations in the
disclaimer below) provided that the following conditions are met:

* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials provided
with the distribution.

* Neither the name of Pacific Biosciences nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE
GRANTED BY THIS LICENSE. THIS SOFTWARE IS PROVIDED BY PACIFIC
BIOSCIENCES AND ITS CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL PACIFIC BIOSCIENCES OR ITS
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
82 changes: 82 additions & 0 deletions README.md
@@ -0,0 +1,82 @@
<p align="center">
<img src="img/pbmarkdup-logo.png" alt="pbmarkdup logo" width="200px"/>
</p>
<h1 align="center">pbmarkdup</h1>
<p align="center">Mark duplicate reads from PacBio sequencing of an amplified library</p>

***

_pbmarkdup_ takes HiFi reads from one or more sequencing chips of an amplified
library and marks or removes duplicates.

## Availability
The latest `pbmarkdup` can be installed via the bioconda package `pbmarkdup`.

Please refer to our [official pbbioconda page](https://github.com/PacificBiosciences/pbbioconda)
for information on Installation, Support, License, Copyright, and Disclaimer.

## Latest Version
Version **0.2.0**: [Full changelog here](#changelog)

## Execution
**Input**: HiFi reads from one or multiple movies in PacBio BAM (`.ccs.bam`),
PacBio dataset (`.consensusreadset.xml`), file of file names (`.fofn`),
FASTQ (optionally gzipped), or FASTA (optionally gzipped) format.

**Output**: HiFi reads with duplicates marked in a format inferred from the
file extension: HiFi BAM (`.bam`); FASTQ (`.fastq`); FASTA (`.fasta`);
bgzipped FASTQ (`.fastq.gz`); bgzipped FASTA (`.fasta.gz`); or SMRT Link XML
(`.consensusreadset.xml`), which also generates a corresponding BAM file.
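
The extension-to-format mapping described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this README, not _pbmarkdup_'s own code; the names `SUFFIX_TO_FORMAT` and `infer_format` are hypothetical, and matching is assumed to be case-sensitive.

```python
# Illustrative sketch only: suffix -> format label, transcribed from the
# output description above. Names here are hypothetical, not pbmarkdup's API.
# Longer suffixes are listed first so ".fastq.gz" is matched as a whole.
SUFFIX_TO_FORMAT = [
    (".consensusreadset.xml", "DATASET"),
    (".fastq.gz", "FASTQ"),
    (".fasta.gz", "FASTA"),
    (".fastq", "FASTQ"),
    (".fasta", "FASTA"),
    (".bam", "BAM"),
]


def infer_format(filename: str) -> str:
    """Return the format label implied by the file extension."""
    for suffix, fmt in SUFFIX_TO_FORMAT:
        if filename.endswith(suffix):
            return fmt
    raise ValueError(f"unrecognized output extension: {filename}")


if __name__ == "__main__":
    for name in ("output.bam", "uniq.fastq.gz", "out.consensusreadset.xml"):
        print(name, "->", infer_format(name))
```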

Run on a full movie:

    pbmarkdup movie.ccs.bam output.bam

Run on multiple movies:

    pbmarkdup movie1.fasta movie2.fasta output.fasta

Run on multiple movies and write duplicates to a separate file:

    pbmarkdup movie1.ccs.bam movie2.fastq uniq.fastq --dup-file dups.fasta

## FAQ

### Why are input files parsed twice?
To keep the memory footprint to a minimum, _pbmarkdup_ reads the input files
twice instead of storing everything in memory. The goal was to support
processing multiple movies on a standard server.
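
The trade-off can be illustrated with a generic two-pass sketch. It does not reproduce _pbmarkdup_'s actual duplicate detection, which this README does not describe: the duplicate criterion (exact sequence identity via a digest), the FASTA-only parser, and all function names are assumptions made for the example. The first pass keeps only a small digest per distinct sequence; the second pass streams the reads again and flags them.

```python
import hashlib


def read_fasta(lines):
    """Yield (name, sequence) records from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fingerprint(seq):
    # Assumed duplicate key for this sketch: digest of the exact sequence.
    return hashlib.sha1(seq.encode()).digest()


def mark_duplicates(open_input):
    """Yield (name, is_duplicate, cluster_size) for each read.

    open_input() must return a fresh iterable of lines on every call,
    because the input is read twice.
    """
    # Pass 1: store only digests, first positions, and counts, not sequences.
    first_index, counts = {}, {}
    for i, (_, seq) in enumerate(read_fasta(open_input())):
        key = fingerprint(seq)
        first_index.setdefault(key, i)
        counts[key] = counts.get(key, 0) + 1
    # Pass 2: stream the reads again and annotate each one.
    for i, (name, seq) in enumerate(read_fasta(open_input())):
        key = fingerprint(seq)
        yield name, first_index[key] != i, counts[key]


if __name__ == "__main__":
    demo = [">a", "ACGT", ">b", "GGCC", ">c", "ACGT"]
    for record in mark_duplicates(lambda: iter(demo)):
        print(record)
```

The cluster size is only known after the whole input has been seen, which is why the reads are streamed a second time rather than held in memory.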

### What input / output combinations are allowed?

Input as rows, outputs as columns:

| IN/OUT | BAM | DATASET | FASTQ | FASTA |
| ------- | :-: | :-----: | :---: | :---: |
| BAM | x | x | x | x |
| DATASET | x | x | x | x |
| FASTQ | | | x | x |
| FASTA | | | | x |

Allowed combination example:

    pbmarkdup movie1.ccs.bam movie2.fastq movie3.fasta out.fasta

Forbidden combination example:

    pbmarkdup movie2.fastq movie3.fasta out.fastq
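
The table can also be read as a lookup, sketched below in Python. The dictionary is transcribed from the table; the rule that a mixed set of inputs requires the output format to be allowed for every input is inferred from the two examples and is an assumption, as are the names `ALLOWED_OUTPUTS` and `is_allowed`.

```python
# Allowed output formats per input format, transcribed from the table above.
# Names are hypothetical; this is not part of pbmarkdup itself.
ALLOWED_OUTPUTS = {
    "BAM": {"BAM", "DATASET", "FASTQ", "FASTA"},
    "DATASET": {"BAM", "DATASET", "FASTQ", "FASTA"},
    "FASTQ": {"FASTQ", "FASTA"},
    "FASTA": {"FASTA"},
}


def is_allowed(input_formats, output_format):
    """Return True if output_format is allowed for every input format.

    Assumption: with mixed inputs, the most restrictive input decides.
    """
    return all(output_format in ALLOWED_OUTPUTS[fmt] for fmt in input_formats)


if __name__ == "__main__":
    # Allowed example: BAM + FASTQ + FASTA inputs written as FASTA.
    print(is_allowed(["BAM", "FASTQ", "FASTA"], "FASTA"))
    # Forbidden example: FASTQ + FASTA inputs written as FASTQ.
    print(is_allowed(["FASTQ", "FASTA"], "FASTQ"))
```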

### Is there a progress report?
Yes. With `--log-level INFO`, _pbmarkdup_ provides status to `stderr`.

## Licenses
PacBio® tool _pbmarkdup_, distributed via Bioconda, is licensed under
[BSD-3-Clause-Clear](https://spdx.org/licenses/BSD-3-Clause-Clear.html).

## Changelog

* **0.2.0**:
  * Initial release

## DISCLAIMER
THIS WEBSITE AND CONTENT AND ALL SITE-RELATED SERVICES, INCLUDING ANY DATA, ARE PROVIDED "AS IS," WITH ALL FAULTS, WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. YOU ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THIS SITE, ALL SITE-RELATED SERVICES, AND ANY THIRD PARTY WEBSITES OR APPLICATIONS. NO ORAL OR WRITTEN INFORMATION OR ADVICE SHALL CREATE A WARRANTY OF ANY KIND. ANY REFERENCES TO SPECIFIC PRODUCTS OR SERVICES ON THE WEBSITES DO NOT CONSTITUTE OR IMPLY A RECOMMENDATION OR ENDORSEMENT BY PACIFIC BIOSCIENCES.
Binary file added img/pbmarkdup-logo.pdf
Binary file not shown.
Binary file added img/pbmarkdup-logo.png