All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- include GIT commit in logs
- script to generate report on closed and unclosed gaps
- script `mask2bed` that converts Dazzler masks to BED files
- JSON schema for config file
- debugging output to track down issue #31
- preserve original scaffold headers in output FASTA
- ensure unique scaffold IDs in `output`
- output FASTA names/coords in AGP
- parallelized alignment filters
- fail-early measure against bug in `libmaus2`
- added missing Python installation to Singularity image
- treat long FASTA lines gracefully
- fixed rule `validate_dentist_config`
- fixed install instructions for Snakemake profile
- fixed bug with newer versions of Snakemake
- include Python files in GIT repo
- close open LAS file asap
- fixed JSON conversion of `AlignmentChain`
- workaround for Phobos v2.099.0 bug
- fixed compiler error
- treat compiler warning
- Conda packages `dentist` and `dentist-core`
- DENTIST's configuration may be in YAML format
- print summary of all commands with `dentist --commands`
- user may select the maximum alignment error rate
- note on a known bug that prohibits using `::` in FASTA headers
- online API documentation
- included the demo example into the main repo
- included JQ in the container for easy inspections
- minimal integration tests that cover the whole pipeline
- substantially extended code documentation
- improved documentation of `read-coverage` and friends
- improved error message if no pile ups have been found
- using a fixed version for Containers to avoid caching issues
- renamed workflow parameter `max_threads` → `threads_per_process`
- keep assertions in production code
- allow empty LAS files for masking
- improved pre-push hook to reduce accidental errors
- Docker container; now building the Singularity image directly
- outdated integration tests
- deprecated and unused code
- obsolete testing command `translocate-gaps`
- improved compatibility of pre-compiled binaries by using Conda package
- make alignments with more than 2^^32 local alignments work
- minor compatibility fixes in the container
- broken links in README
- replaced definition list by simple list in README
- list of all commandline options
- example for a greedy DENTIST configuration
- guide on how to release a new version of DENTIST (work in progress)
- release v1.0.1 contained breaking changes so this release updates to v2.0.0: the changes to the workflow make it incompatible with old configuration files
- moved Docker image to Ubuntu and reduced size
- improved compatibility of pre-compiled binaries by compiling on Ubuntu 16.04
- sort read IDs in `insertions.db` to make AGP and BED files comparable
- allow `--min-*-coverage` in `dentist mask-repetitive-regions` to be zero
- avoid confusing message about pre-fetching the Singularity image if possible
- updated README
- unused argument for `process-pile-ups`
- replaced `Dockerfile.build-release` by regular `Dockerfile`
- fixed `ProtectedOutputException` bug that was listed in the Troubleshooting section of the README
- sort LAS files for daccord without chaining
- buffer overflow in `propagate-mask`
- adjust `read-coverage` in example configuration to actual coverage in the example dataset
- provide pre-built binaries of DENTIST and all dependencies in release tarball
- included unit tests in Docker build
- github.io page
- Improved README a lot
- Updated dependencies
- Removed `LAcheck` from the workflow because it is useless (see issue 14)
- Compiler error and deprecation warnings
- A wonderful logo :-)
- Updated README and other docs
- Some jobs in the workflow are grouped to reduce the number of cluster jobs
- Workflow requires a minimum Snakemake version
- Ignoring unused parameter in `process-pile-ups`; will be removed in next major release
- Disentangled workflow configuration for better usability and less build time for Snakemake's DAG
- Old documentation parts/details
- Sporadically lost masked regions in mask homogenization
- Handling of cyclic scaffolds
- Overly strict handling of types in DENTIST's config file
- Several minor bugs
- A Docker container! This means you can just `--use-singularity` with Snakemake.
- Workflow rule to just produce all the repeat masks (this is used in the paper to calculate the repeat content of the assemblies)
- Automatic validation of the closed gaps with an alignment of the reads against a preliminary gap-closed assembly:
    - Added command `bed2mask`
    - Optionally write a BED file of closed gaps
    - Added command `validate-regions`
    - Added interface for reading/writing Dazzler track extras which is utilized to communicate the contig and read IDs between `output` and `validate-regions`
- Extensively documented the example workflow config `./snakemake/snakemake.yml`
- Local alignment chaining via command `chain-local-alignments` and internally
- Using chaining to filter/improve pile up alignments
- Added possibility to revert CLI options via `--revert`
- All multi-valued CLI options take their value from a comma-separated list and/or by giving the same option multiple times
- Added `full_validation` flag to workflow to keep the preliminary assembly and validation results
- Added `no_purge_output` flag to workflow to prevent the automatic skipping of invalid gaps; this also will not trigger the validation if not requested explicitly
- Possibility to lazily read local alignments from `.las` file
- Greatly improved performance of reading `.las` files by switching to binary interface
- Possibility to manually skip filling of gaps
- `DBdust` for improved sensitivity in alignments
- Homogenized masks implemented via new command `propagate-mask` which translates a given mask via an alignment from one DB/DAM to another. The masks are propagated from the assembly to the reads and back to gain sensitivity.
- `dentist --dependencies` now reports the availability of the listed tools and exits non-zero if some dependency is missing
- Avoid loading the full alignment into memory when masking
- Refactored `getAlignments` and `getFlatLocalAlignments` such that the caller has full control over the buffering strategy
- Streamlined option passing for `daligner`, `damapper`, `datander`, `DBdust` and `daccord`. Also, removed `--reference-error` and `--reads-error` in favor of default value `-e.7` in all cases. Basically, one must not modify `-e` without adjusting the rest of the options accordingly. Setting it to the minimum value just ensures no alignments are discarded for no good reason.
- Masks can be created on a "block-level" and later merged with `merged-masks` (this is incompatible with Dazzler's `Catrack`)
- Avoid cryptic error message if alignment is not a valid `.las` file
- Distributed tandem repeat masking
- Renamed `workdir` → `tmpdir` to avoid confusion
- Raised default value of `--max-insertion-error` (experiments show a small drop in correctness but large gain in contiguity)
- Replaced obscure `damapper` argument from `block_alignments` by `block_a=FULL_DB` or `block_b=FULL_DB`
- Behavior of environment substitution in workflow config files:
    - The config file may contain `default_env` and/or `override_env`. This allows creating "template" config files and "instantiation" config files because Snakemake allows the user to specify more than one config file.
    - If just the placeholder string is given (e.g. `$FOO`) then substitute the exact value and type in the current `env` dictionary. This means the type of a value given via `default_env` or `override_env` is copied, which in turn prevents type errors in DENTIST's config file.
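
The environment substitution described in the entry above can be sketched in Python roughly as follows. This is a minimal illustration and not DENTIST's actual workflow code: the helper names `build_env` and `substitute` are hypothetical, and the precedence order (`default_env`, then the process environment, then `override_env`) is an assumption inferred from the key names.

```python
import os
import re
from string import Template

# Matches a value that consists of nothing but one placeholder: $FOO or ${FOO}.
_SOLE_PLACEHOLDER = re.compile(r"\$(?:\{(\w+)\}|(\w+))")


def build_env(config, process_env=None):
    """Merge the env sources; later sources win (assumed precedence)."""
    env = dict(config.get("default_env") or {})
    env.update(os.environ if process_env is None else process_env)
    env.update(config.get("override_env") or {})
    return env


def substitute(value, env):
    """Recursively replace placeholders in a parsed config value."""
    if isinstance(value, dict):
        return {key: substitute(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute(item, env) for item in value]
    if not isinstance(value, str):
        return value
    match = _SOLE_PLACEHOLDER.fullmatch(value)
    if match:
        # Bare placeholder: copy the value together with its type.
        name = match.group(1) or match.group(2)
        return env.get(name, value)
    # Placeholder embedded in a longer string: plain textual substitution.
    return Template(value).safe_substitute({k: str(v) for k, v in env.items()})


config = {
    "default_env": {"THREADS": 8, "WORKDIR": "/tmp/work"},
    "override_env": {"THREADS": 4},
}
env = build_env(config, process_env={})
resolved = substitute({"threads": "$THREADS", "out": "${WORKDIR}/out"}, env)
print(resolved)
```

In this sketch a bare `$THREADS` resolves to an integer rather than a string, which is the type-preserving behavior the entry describes, while a placeholder embedded in a longer string is always rendered as text.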
- Included `filter-mask` into the standard commands because it can be useful for adjusting repeat masks
- Removed some dead code
- Dependency on `LAdump` and `dumpLA`
- Added `properAlignmentAllowance` to `completelyCovers`
- Fixed workflow for non-PacBio reads
- Snakemake can now run in single pass; before a separate call was required to create DENTIST's config file.
- Deactivated `daligner`'s bridging (`-B`) when self-alignments are requested (`-I`) to avoid a bug.
- Many more fixes to workflow and related files under `./snakemake/`
- Details in commands in README
- Remedied syntax highlighting errors
- Many technical bugs/errors
- Compilation with `ldc2`
- Always skip file locking with environment variable `SKIP_FILE_LOCKING=1`
- Allow use of environment variables in Snakemake workflow config
- Avoid appending to DBs by design
- Improved README:
- Advice on how to choose parameters
- Advice on how to run DENTIST with different read types
- Version information to dependencies
- Log level information to log messages
- More logging on failed gap closing
- Simplified usage of `--workdir`: no need to manually create the designated directory
- Improvements to close more gaps:
- Custom pre-consensus alignment filtering
- Add support sequence to cropped reads to ensure daligner finds alignments
- Allow cropping in masked region if necessary
- Selectively ignore repeat mask to allow post consensus alignments
- Increased sensitivity in pileup alignments by adding the bridging option of `daligner`
- Select reference read for consensus by intrinsic QVs → better consensus quality
- Moved flag `--max-insertion-error` from `process` to `output` stage so trying different values becomes much faster
- Automatically deduce trace point spacing in all places
- Faster check if `.las` files are empty → faster CLI options checking
- Naming of temporary files for easier inspection
- Use `DBdust` for post consensus alignment
- Produce `.db` for cropped pileups (temporary files) to make `DAScover` and `DASqv` work
- Removed `-I` option from `daligner` calls (avoid useless alignment)
- Several bugs in Snakemake workflow
- Significantly improved number of closed gaps
- Coordinates in AGP output
- Bug in procedure that identifies a good cropping position
- Error that caused `--proper-alignment-allowance` to have no effect by default
- post-consensus alignment and validation with new parameter `--max-insertion-error`
- inserted sequences are highlighted by upper-case letters which can be turned off with `--no-highlight-insertions`
- batch ranges may end with a `$` indicating the end of the pileup DB
- some mechanisms for early error detection
- write duplicate contig IDs to contig alignment cache for easier debugging
- added support for complementary contig alignments in `check-results`
- allow `.db` databases as reference
- improved version reporting
- updated README with additional instructions
- integrated Snakemake workflow into a single file and removed "testing" workflow
- cropping and splicing of insertions:
- existing sequence is completely retained
- moved from `process-pile-ups` to `output`
- binary format of insertions DBs (breaking change) to gain more freedom in later steps
- splice sites are chosen based on the post-consensus alignments
- ambiguities in the alignment of reads are now detected globally
- weakly anchored alignments are discarded early in the filtering pipeline
- the self- and read-alignment-based masks are now computed separately
- coverage values may now be fractional
- improved README by adhering to Standard Readme
- better (error) reporting
- temporary files have more informative names
- many minor refactorings and extensions
- combined self- and read-alignment-based masking: old behaviour can be reproduced by using the `--mask` parameter and supplying both masks to all commands
- trying all possible reference reads for consensus in order to find a non-failing reference
- corrected insertion splicing in case of reverse-complement alignment of the consensus
- bug that caused `check-results` to discard all alignments in certain loci
- added missing logic for cropped contigs in `getGapState` in `check-results`
- work-around for `damapper` bug
- histograms generated by `check-results` include a column for `.999` sequence identity
- `check-results` optionally writes a detailed gap report
- simplified the coverage bounds interface of `mask-repetitive-regions`: only max-values and/or the read coverage are required
- improve consensus quality by using `lasfilteralignments` to remove deteriorating local alignments
- reduced value of `--min-reads-per-pile-up` to `--min-spanning-reads` (default: 3) to better work with extremely low coverage
- reduced the default minimum anchor length to 500
- fix & simplify score function for read alignments
- suppress generation of two las files for reads alignment in `generate-dazzler-options`
- insertion DBs do not include information about existing contigs anymore which makes validation of results easier
- renamed `debug-graph` → `debug-scaffold` for clarity
- added logging for discarded pile ups
- `check-results` counts complementary alignments (inversions) as errors
- remove leading/trailing gaps from all checks by `check-results`
- improved quality of documentation
- implemented routine to generate exact alignments from `.las` alignment chains
- better conflict handling for scaffold graph (see `--min-spanning-reads` and `--best-pile-up-margin`)
- resolve conflicting cropping information attached to contigs
- accept both `.db` and `.dam` files
- new pipeline command `process-pile-ups` to compute consensus in parallel
- introduced build config `testing` which produces additional commands for evaluation of results:
    - `translocate-gaps`
    - `find-closable-gaps`
    - `translate-coords`
    - `check-results`
- improved logging in general
- output of assembly graph using `--debug-graph`
- sorted CLI options alphabetically
- inspect Dazzler masks using `show-mask` command
- separate command `mask-repetitive-regions` for masking
- `show-*` commands have `--json` switch
- CLI switch `--usage`
- runtime improvements
- improved type definitions for better debugging experience
- many small bug fixes