Release 1.13
valeriuo committed Jul 7, 2021
2 parents bd133ac + 09255e6 commit 911cb8e
Showing 84 changed files with 2,155 additions and 428 deletions.
17 changes: 13 additions & 4 deletions INSTALL
@@ -49,15 +49,15 @@ Storage is enabled.

Amazon S3 support requires an HMAC function to calculate a message
authentication code. On MacOS, the CCHmac function from the standard
-library is used. Systems that do not have CChmac will get this from
+library is used. Systems that do not have CCHmac will get this from
libcrypto. libcrypto is part of OpenSSL or one of its derivatives (LibreSSL
or BoringSSL).

On Microsoft Windows we recommend use of Mingw64/Msys2. Note that
currently for the test harness to work you will need to override the
test temporary directory with e.g.: make check TEST_OPTS="-t C:/msys64/tmp/_"
Whilst the code may work on Windows with other environments, these have
-not be verified.
+not been verified.

Update htscodecs submodule
==========================
@@ -103,7 +103,7 @@ configure and just type 'make; make install' as for previous versions
of HTSlib. However if the build fails you should run './configure' as
it can diagnose the common reasons for build failures.

-The 'make' command builds the HTSlib library and and various useful
+The 'make' command builds the HTSlib library and various useful
utilities: bgzip, htsfile, and tabix. If compilation fails you should
run './configure' as it can diagnose problems with your build environment
that cause build failures.
@@ -150,7 +150,10 @@ various features and specify further optional external requirements:

--enable-libcurl
Use libcurl (<http://curl.se/>) to implement network access to
-remote files via FTP, HTTP, HTTPS, etc.
+remote files via FTP, HTTP, HTTPS, etc. By default or with
+--enable-libcurl=check, configure will probe for libcurl and include
+this functionality if libcurl is available. Use --disable-libcurl
+to prevent this.

--enable-gcs
Implement network access to Google Cloud Storage. By default or with
@@ -176,6 +179,12 @@ various features and specify further optional external requirements:
By default, ./configure will probe for libdeflate and use it if
available. To prevent this, use --without-libdeflate.

+Each --enable-FEATURE/--disable-FEATURE/--with-PACKAGE/--without-PACKAGE
+option listed also has an opposite, e.g., --without-external-htscodecs
+or --disable-plugins. However, apart from those options for which the
+default is to probe for related facilities, using these opposite options
+is mostly unnecessary as they just select the default configure behaviour.
+
The configure script also accepts the usual options and environment variables
for tuning installation locations and compilers: type './configure --help'
for details. For example,
2 changes: 1 addition & 1 deletion LICENSE
@@ -3,7 +3,7 @@ according to the terms of the following MIT/Expat license.]

The MIT/Expat License

-Copyright (C) 2012-2020 Genome Research Ltd.
+Copyright (C) 2012-2021 Genome Research Ltd.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
37 changes: 22 additions & 15 deletions Makefile
@@ -113,26 +113,28 @@ htscodecs.mk:
echo '# Default htscodecs.mk generated by Makefile' > $@
echo 'include $$(HTSPREFIX)htscodecs_bundled.mk' >> $@

+srcdir = .
+srcprefix =
HTSPREFIX =
include htslib_vars.mk
include htscodecs.mk

# If not using GNU make, you need to copy the version number from version.sh
# into here.
-PACKAGE_VERSION := $(shell ./version.sh)
+PACKAGE_VERSION := $(shell $(srcdir)/version.sh)

LIBHTS_SOVERSION = 3

# Version numbers for the Mac dynamic library. Note that the leading 3
# is not strictly necessary and should be removed the next time
# LIBHTS_SOVERSION is bumped (see #1144 and
# https://developer.apple.com/library/archive/documentation/DeveloperTools/Conceptual/DynamicLibraries/100-Articles/DynamicLibraryDesignGuidelines.html#//apple_ref/doc/uid/TP40002013-SW23)
-MACH_O_COMPATIBILITY_VERSION = 3.1.12
-MACH_O_CURRENT_VERSION = 3.1.12
+MACH_O_COMPATIBILITY_VERSION = 3.1.13
+MACH_O_CURRENT_VERSION = 3.1.13

# $(NUMERIC_VERSION) is for items that must have a numeric X.Y.Z string
# even if this is a dirty or untagged Git working tree.
-NUMERIC_VERSION := $(shell ./version.sh numeric)
+NUMERIC_VERSION := $(shell $(srcdir)/version.sh numeric)

# Force version.h to be remade if $(PACKAGE_VERSION) has changed.
version.h: $(if $(wildcard version.h),$(if $(findstring "$(PACKAGE_VERSION)",$(shell cat version.h)),,force))
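The MACH_O_COMPATIBILITY_VERSION and MACH_O_CURRENT_VERSION values above are plain X.Y.Z strings because Mach-O records each dylib version as one 32-bit number, with X in the top 16 bits and Y and Z in 8 bits each. The C sketch below shows that packing. It is illustrative only and not part of HTSlib's build, and `pack_macho_version` is a hypothetical name. It rejects strings such as "1.13-dirty", which are the kind of non-numeric version that the $(NUMERIC_VERSION) comment refers to.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: pack a plain numeric "X.Y.Z" string into the 32-bit
 * Mach-O dylib version format (X in the top 16 bits, Y and Z in 8 bits each).
 * Returns 0 on success, or -1 if the string is not exactly three numbers or
 * a component is out of range (e.g. "1.13-dirty" or "3.1").
 */
static int pack_macho_version(const char *s, uint32_t *out)
{
    unsigned x, y, z;
    char extra;

    /* A 4th successful conversion means trailing text such as "-dirty". */
    if (sscanf(s, "%u.%u.%u%c", &x, &y, &z, &extra) != 3)
        return -1;
    if (x > 0xFFFF || y > 0xFF || z > 0xFF)
        return -1;

    *out = ((uint32_t)x << 16) | ((uint32_t)y << 8) | (uint32_t)z;
    return 0;
}
```

For example, "3.1.13" packs to 0x0003010D.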
@@ -254,7 +256,7 @@ config.h:
# on htslib.pc.in listed, as if that file is newer the usual way to regenerate
# this target is via configure or config.status rather than this rule.
htslib.pc.tmp:
-sed -e '/^static_libs=/s/@static_LIBS@/$(htslib_default_libs)/;s#@[^-][^@]*@##g' htslib.pc.in > $@
+sed -e '/^static_libs=/s/@static_LIBS@/$(htslib_default_libs)/;s#@[^-][^@]*@##g' $(srcprefix)htslib.pc.in > $@

# Create a makefile fragment listing the libraries and LDFLAGS needed for
# static linking. This can be included by projects that want to build
@@ -449,16 +451,15 @@ htscodecs/htscodecs:

# Build the htscodecs/htscodecs/version.h file if necessary
htscodecs/htscodecs/version.h: force
-@if test -e htscodecs/.git && test -e htscodecs/configure.ac ; then \
-cd htscodecs && \
-vers=`git describe --always --dirty --match 'v[0-9]\.[0-9]*'` && \
+@if test -e $(srcdir)/htscodecs/.git && test -e $(srcdir)/htscodecs/configure.ac ; then \
+vers=`cd $(srcdir)/htscodecs && git describe --always --dirty --match 'v[0-9]\.[0-9]*'` && \
case "$$vers" in \
v*) vers=$${vers#v} ;; \
-*) iv=`awk '/^AC_INIT/ { match($$0, /^AC_INIT\(htscodecs, *([0-9](\.[0-9])*)\)/, m); print substr($$0, m[1, "start"], m[1, "length"]) }' configure.ac` ; vers="$$iv$${vers:+-g$$vers}" ;; \
+*) iv=`awk '/^AC_INIT/ { match($$0, /^AC_INIT\(htscodecs, *([0-9](\.[0-9])*)\)/, m); print substr($$0, m[1, "start"], m[1, "length"]) }' $(srcdir)/htscodecs/configure.ac` ; vers="$$iv$${vers:+-g$$vers}" ;; \
esac ; \
-if ! grep -s -q '"'"$$vers"'"' htscodecs/version.h ; then \
+if ! grep -s -q '"'"$$vers"'"' $@ ; then \
echo 'Updating $@ : #define HTSCODECS_VERSION_TEXT "'"$$vers"'"' ; \
-echo '#define HTSCODECS_VERSION_TEXT "'"$$vers"'"' > htscodecs/version.h ; \
+echo '#define HTSCODECS_VERSION_TEXT "'"$$vers"'"' > $@ ; \
fi ; \
fi
endif
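The htscodecs/htscodecs/version.h rule above builds the version text from `git describe` output. A matching tag such as v1.1.1 has its leading v stripped. Any other output, for example a bare commit hash produced by --always, is appended as -g<hash> to the version taken from the AC_INIT line in configure.ac. The following is a minimal C restatement of that `case` statement, for illustration only. `htscodecs_version_text` is a hypothetical name and is not part of the build.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical C restatement of the shell `case` above.
 *   describe    - output of git describe --always --dirty --match 'v[0-9]\.[0-9]*'
 *   ac_init_ver - version parsed from AC_INIT(htscodecs, X.Y.Z) in configure.ac
 * The result is written to out (at most outsz bytes, truncated like snprintf).
 */
static void htscodecs_version_text(const char *describe, const char *ac_init_ver,
                                   char *out, size_t outsz)
{
    if (describe[0] == 'v')
        /* A matching tag, e.g. "v1.1.1" or "v1.1.1-3-gabc1234": drop the 'v'. */
        snprintf(out, outsz, "%s", describe + 1);
    else if (describe[0] != '\0')
        /* No tag matched, so --always gave a bare hash: "X.Y.Z-g<hash>". */
        snprintf(out, outsz, "%s-g%s", ac_init_ver, describe);
    else
        /* Empty output: fall back to the AC_INIT version alone. */
        snprintf(out, outsz, "%s", ac_init_ver);
}
```

As in the rule, the text is only written to version.h when the file does not already contain that exact string. The file's timestamp therefore changes only when the version really changes.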
@@ -470,6 +471,11 @@ maintainer-check:
test/maintainer/check_copyright.pl .
test/maintainer/check_spaces.pl .

+# Create a shorthand. We use $(SRC) or $(srcprefix) rather than $(srcdir)/
+# for brevity in test and install rules, and so that build logs do not have
+# ./ sprinkled throughout.
+SRC = $(srcprefix)
+
# For tests that might use it, set $REF_PATH explicitly to use only reference
# areas within the test suite (or set it to ':' to use no reference areas).
#
@@ -490,6 +496,7 @@ check test: $(BUILT_PROGRAMS) $(BUILT_TEST_PROGRAMS) $(BUILT_PLUGINS) $(HTSCODEC
cd test/sam_filter && ./filter.sh filter.tst
cd test/tabix && ./test-tabix.sh tabix.tst
cd test/mpileup && ./test-pileup.sh mpileup.tst
+cd test/fastq && ./test-fastq.sh
REF_PATH=: test/sam test/ce.fa test/faidx.fa test/fastqs.fq
test/test-regidx
cd test && REF_PATH=: ./test.pl $${TEST_OPTS:-}
@@ -686,11 +693,11 @@ shlib-exports-dll.txt: hts.dll.a
install: libhts.a $(BUILT_PROGRAMS) $(BUILT_PLUGINS) installdirs install-$(SHLIB_FLAVOUR) install-pkgconfig
$(INSTALL_PROGRAM) $(BUILT_PROGRAMS) $(DESTDIR)$(bindir)
if test -n "$(BUILT_PLUGINS)"; then $(INSTALL_PROGRAM) $(BUILT_PLUGINS) $(DESTDIR)$(plugindir); fi
-$(INSTALL_DATA) htslib/*.h $(DESTDIR)$(includedir)/htslib
+$(INSTALL_DATA) $(SRC)htslib/*.h $(DESTDIR)$(includedir)/htslib
$(INSTALL_DATA) libhts.a $(DESTDIR)$(libdir)/libhts.a
-$(INSTALL_MAN) bgzip.1 htsfile.1 tabix.1 $(DESTDIR)$(man1dir)
-$(INSTALL_MAN) faidx.5 sam.5 vcf.5 $(DESTDIR)$(man5dir)
-$(INSTALL_MAN) htslib-s3-plugin.7 $(DESTDIR)$(man7dir)
+$(INSTALL_MAN) $(SRC)bgzip.1 $(SRC)htsfile.1 $(SRC)tabix.1 $(DESTDIR)$(man1dir)
+$(INSTALL_MAN) $(SRC)faidx.5 $(SRC)sam.5 $(SRC)vcf.5 $(DESTDIR)$(man5dir)
+$(INSTALL_MAN) $(SRC)htslib-s3-plugin.7 $(DESTDIR)$(man7dir)

installdirs:
$(INSTALL_DIR) $(DESTDIR)$(bindir) $(DESTDIR)$(includedir) $(DESTDIR)$(includedir)/htslib $(DESTDIR)$(libdir) $(DESTDIR)$(man1dir) $(DESTDIR)$(man5dir) $(DESTDIR)$(man7dir) $(DESTDIR)$(pkgconfigdir)
126 changes: 126 additions & 0 deletions NEWS
@@ -1,3 +1,129 @@
Noteworthy changes in release 1.13 (7th July 2021)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Features and Updates
--------------------

* When a PG header line has multiple ID tags supplied by other applications,
the header API now selects the first one encountered as the identifying tag
and issues a warning when it detects subsequent ID tags.
(#1256; fixes samtools/samtools#1393)

* VCF header reading function (vcf_hdr_read) no longer tries to download a
remote index file by default.
(#1266; fixes #380)

* Support reading and writing FASTQ format in the same way as SAM, BAM or CRAM.
Records read from a FASTQ file will be treated as unmapped data.
(#1156)

* Added GCP requester pays bucket access. Thanks to @indraniel.
(#1255)

* Made mpileup's overlap removal choose which copy to remove at random instead
of always removing the second one. This avoids strand bias in experiments
where the +ve and -ve strand reads always appear in the same order.
(#1273; fixes samtools/bcftools#1459)

* It is now possible to use platform-specific BAQ parameters. This also
selects long-read parameters for reads longer than 1 kb, which helps
bcftools mpileup call SNPs on PacBio CCS reads.
(#1275)

* Improved bcf_remove_allele_set. This fixes a bug that stopped iteration over
alleles prematurely, marks removed alleles as 'missing' and does automatic
lazy unpacking.
(#1288; fixes #1259)

* Improved compression metrics for unsorted CRAM files. This improves the
choice of codecs when handling unsorted data.
(#1291)

* Linear index entries for empty intervals are now initialised with the file
offset in the next non-empty interval instead of the previous one. This
may reduce the amount of data iterators have to discard before reaching
the desired region, when the starting location is in a sequence gap.
Thanks to @carsonh for reporting the issue.
(#1286; fixes #486)

* A new hts_bin_level API function has been added to compute the level of a
given bin in the binning index.
(#1286)

* Related to the above, a new API function, hts_idx_nseq, returns the total
number of contigs in an index.
(#1295 and #1299)

* Added bracket handling to bcf_hdr_parse_line, for use with ##META lines.
Thanks to Alberto Casas Ortiz.
(#1240)
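The @PG rule in the first Features item can be illustrated with a small standalone parser: the first ID tag identifies the line, and any later ID tags only trigger a warning. This is a hypothetical sketch, not HTSlib's header API. `pg_first_id` and its warning text are invented for the example.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: scan a tab-separated @PG line, keep the value of the
 * FIRST "ID:" field in id (truncated to idsz - 1 chars), warn about any later
 * ID fields, and return how many ID fields were seen (0 if none).
 */
static int pg_first_id(const char *line, char *id, size_t idsz)
{
    int n_ids = 0;
    const char *p = line;

    while (p && *p) {
        const char *end = strchr(p, '\t');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len >= 3 && strncmp(p, "ID:", 3) == 0) {
            if (n_ids == 0) {
                size_t vlen = len - 3;          /* copy the first ID value */
                if (vlen >= idsz)
                    vlen = idsz - 1;
                memcpy(id, p + 3, vlen);
                id[vlen] = '\0';
            } else {
                fprintf(stderr, "Warning: ignoring extra ID tag \"%.*s\" on @PG line\n",
                        (int)(len - 3), p + 3);
            }
            n_ids++;
        }
        p = end ? end + 1 : NULL;               /* advance to the next field */
    }
    return n_ids;
}
```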
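The renumbering implied by the bcf_remove_allele_set item can be sketched as follows. Kept alleles are renumbered consecutively, and genotype entries that referred to a removed allele become missing. This is only an illustration. It uses plain 0-based allele indices with -1 for missing rather than BCF's packed GT encoding, and `build_allele_map` and `remap_genotypes` are hypothetical names, not HTSlib functions.

```c
#include <stdbool.h>

#define ALLELE_MISSING (-1)   /* hypothetical "missing" marker for this sketch */

/*
 * Build an old->new allele index map. remove[i] is true for alleles to drop.
 * Kept alleles get consecutive new indices; removed ones map to
 * ALLELE_MISSING. Returns the number of alleles kept.
 */
static int build_allele_map(const bool *remove, int n_allele, int *map)
{
    int next = 0;
    for (int i = 0; i < n_allele; i++)
        map[i] = remove[i] ? ALLELE_MISSING : next++;
    return next;
}

/* Rewrite genotype allele indices in place; missing or invalid stay missing. */
static void remap_genotypes(int *gt, int n_gt, const int *map, int n_allele)
{
    for (int i = 0; i < n_gt; i++) {
        if (gt[i] >= 0 && gt[i] < n_allele)
            gt[i] = map[gt[i]];
        else
            gt[i] = ALLELE_MISSING;
    }
}
```

For example, removing allele 2 from alleles 0..3 maps 0→0, 1→1, 2→missing and 3→2.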
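The linear-index change can be pictured as a flat array of per-window file offsets (BAI uses 16 kbp windows). Filling each empty window from the next non-empty one means that a query starting in a sequence gap begins reading at the first alignment overlapping the next non-empty window. Otherwise it would start at data before the gap that it then has to discard. The sketch below assumes a hypothetical LOFF_EMPTY sentinel for empty windows, and HTSlib's own index structures differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for a linear-index window with no overlapping reads. */
#define LOFF_EMPTY UINT64_MAX

/*
 * Fill each empty window with the offset of the NEXT non-empty window by
 * scanning backwards. Empty windows after the last non-empty one are left
 * as LOFF_EMPTY.
 */
static void fill_linear_index(uint64_t *loff, size_t n)
{
    uint64_t next = LOFF_EMPTY;
    for (size_t i = n; i-- > 0; ) {
        if (loff[i] == LOFF_EMPTY)
            loff[i] = next;        /* borrow from the following window */
        else
            next = loff[i];        /* remember the nearest non-empty offset */
    }
}
```

For example, {10, empty, empty, 40, empty} becomes {10, 40, 40, 40, empty}.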
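The hts_bin_level item refers to the hierarchical binning index used by BAI, and by CSI with its default parameters. In that scheme, bin 0 covers the whole 512 Mbp range, each bin has eight children, and the smallest bins cover 16 kbp. A bin's level is the number of parent steps, with parent(b) = (b - 1) >> 3, needed to reach bin 0. The standalone sketch below restates this together with the reg2bin function given in the SAM specification. Real code should call HTSlib's own hts_bin_level, and `bin_level` here is a hypothetical name.

```c
/*
 * reg2bin as in the SAM specification: the smallest bin that fully contains
 * the 0-based, half-open region [beg, end), for 6 levels and 16 kbp leaves.
 */
static int reg2bin(int beg, int end)
{
    --end;
    if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14);
    if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17);
    if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);
    if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);
    if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);
    return 0;
}

/* Level of a bin: count parent steps until the root bin 0 is reached. */
static int bin_level(int bin)
{
    int level = 0;
    for (int b = bin; b > 0; b = (b - 1) >> 3)
        level++;
    return level;
}
```

Levels 0 to 5 start at bins 0, 1, 9, 73, 585 and 4681 respectively.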
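The bracket handling added to bcf_hdr_parse_line matters for structured lines such as ##META=<ID=Assay,Type=String,Number=.,Values=[WholeGenome, Exome]>. There, the comma inside [...] must not start a new key=value field. Below is a minimal splitting sketch. It assumes double quotes and square brackets are the only grouping characters and does not handle escaped quotes. `split_header_fields` is a hypothetical name, and this is not HTSlib's parser.

```c
#include <stddef.h>

/*
 * Split the inside of a <...> header value on top-level commas only.
 * Commas inside [...] or "..." do not end a field. Records up to
 * max_fields (start, length) pairs and returns the total number of fields.
 */
static int split_header_fields(const char *s, const char **start, size_t *len,
                               int max_fields)
{
    int n = 0, depth = 0, in_quotes = 0;
    const char *field = s;

    for (const char *p = s; ; p++) {
        if (*p == '"') {
            in_quotes = !in_quotes;                 /* toggle quoted region */
        } else if (!in_quotes && *p == '[') {
            depth++;                                /* enter a [...] list */
        } else if (!in_quotes && *p == ']' && depth > 0) {
            depth--;                                /* leave a [...] list */
        } else if (*p == '\0' || (*p == ',' && depth == 0 && !in_quotes)) {
            /* End of a top-level field: record it if there is room. */
            if (n < max_fields) {
                start[n] = field;
                len[n] = (size_t)(p - field);
            }
            n++;
            if (*p == '\0')
                break;
            field = p + 1;
        }
    }
    return n;
}
```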

Build changes
-------------

These are compiler, configuration and makefile based changes.

* HTSlib now uses libhtscodecs release 1.1.1.

* Added a curl/curl.h check to configure and improved INSTALL documentation on
build options. Thanks to Melanie Kirsche and John Marshall.
(#1265; fixes #1261)

* Some fixes to address GCC 11.1 warnings.
(#1280, #1284, #1285; fixes #1283)

* Supports building HTSlib in a separate directory. Thanks to John Marshall.
(#1277; fixes #231)

* Supports building HTSlib on MinGW 32-bit environments. Thanks to
John Marshall.
(#1301)

Bug fixes
---------

* Fixed a bug in hts_itr_query() et al. region queries, introduced in
HTSlib 1.12, which led to iterators producing very few reads for some
queries (especially for larger target regions) when unmapped reads were
present. HTSlib 1.11 had a related problem in which iterators would omit
a few unmapped reads that should have been produced; cf. #1142.
Thanks to Daniel Cooke for reporting the issue.
(#1281; fixes #1279)

* Removed compressBound assertions on opening bgzf files. Thanks to
Gurt Hulselmans for reporting the issue.
(#1258; fixes #1257)

* The duplicate sample name error message for a VCF file now displays only
the duplicated name rather than the entire sample name list.
(#1262; fixes samtools/bcftools#1451)

* Fix to make samtools cat work on CRAMs again.
(#1276; fixes samtools/samtools#1420)

* Fix for a double memory free in SAM header creation. Thanks to @ihsineme.
(#1274)

* Prevent assert in bcf_sr_set_regions. Thanks to Dr K D Murray.
(#1270)

* Fixed a crash in the knet_open() etc. stubs. Thanks to John Marshall.
(#1289)

* Fixed filter expression "cigar" on unmapped reads. Stop treating an empty
CIGAR string as an error. Thanks to Chang Y for reporting the issue.
(#1298; fixes samtools/samtools#1445)

* Bug fixes in the bundled copy of htscodecs:

- Fixed an uninitialized access in the name tokeniser decoder.
(samtools/htscodecs#23)

- Fixed a bug with name tokeniser and variable number of names per slice,
causing it to incorrectly report an error on certain valid inputs.
(samtools/htscodecs#24)
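The cigar filter fix concerns unmapped reads, whose CIGAR is empty. The SAM text convention writes an unavailable CIGAR as *, so a formatter can treat the empty case as valid rather than as an error. The sketch below works on BAM's packed CIGAR encoding (length << 4 | op, with op codes 0 to 8 meaning MIDNSHP=X). `cigar_to_string` is a hypothetical helper and is not the code HTSlib's filter uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Render a BAM-style packed CIGAR as SAM text. An empty CIGAR, as on an
 * unmapped read, becomes "*" instead of being rejected. Returns 0 on
 * success, -1 if an op code is invalid or out is too small.
 */
static int cigar_to_string(const uint32_t *cigar, int n_cigar,
                           char *out, size_t outsz)
{
    static const char ops[] = "MIDNSHP=X";   /* op codes 0..8 */
    size_t used = 0;

    if (n_cigar == 0) {
        if (outsz < 2)
            return -1;
        out[0] = '*';
        out[1] = '\0';
        return 0;
    }

    for (int i = 0; i < n_cigar; i++) {
        uint32_t op = cigar[i] & 0xF, oplen = cigar[i] >> 4;
        if (op >= sizeof ops - 1)
            return -1;                        /* unknown op code */
        int w = snprintf(out + used, outsz - used, "%u%c", (unsigned)oplen, ops[op]);
        if (w < 0 || (size_t)w >= outsz - used)
            return -1;                        /* would not fit */
        used += (size_t)w;
    }
    return 0;
}
```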


Noteworthy changes in release 1.12 (17th March 2021)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

22 changes: 22 additions & 0 deletions README
@@ -3,3 +3,25 @@ formats, such as SAM, CRAM, VCF, and BCF, used for high-throughput sequencing
data. It is the core library used by samtools and bcftools.

See INSTALL for building and installation instructions.

Please cite this paper when using HTSlib for your publications:

HTSlib: C library for reading/writing high-throughput sequencing data
James K Bonfield, John Marshall, Petr Danecek, Heng Li, Valeriu Ohan, Andrew Whitwham, Thomas Keane, Robert M Davies
GigaScience, Volume 10, Issue 2, February 2021, giab007, https://doi.org/10.1093/gigascience/giab007

@article{10.1093/gigascience/giab007,
author = {Bonfield, James K and Marshall, John and Danecek, Petr and Li, Heng and Ohan, Valeriu and Whitwham, Andrew and Keane, Thomas and Davies, Robert M},
title = "{HTSlib: C library for reading/writing high-throughput sequencing data}",
journal = {GigaScience},
volume = {10},
number = {2},
year = {2021},
month = {02},
abstract = "{Since the original publication of the VCF and SAM formats, an explosion of software tools have been created to process these data files. To facilitate this a library was produced out of the original SAMtools implementation, with a focus on performance and robustness. The file formats themselves have become international standards under the jurisdiction of the Global Alliance for Genomics and Health. We present a software library for providing programmatic access to sequencing alignment and variant formats. It was born out of the widely used SAMtools and BCFtools applications. Considerable improvements have been made to the original code plus many new features including newer access protocols, the addition of the CRAM file format, better indexing and iterators, and better use of threading. Since the original Samtools release, performance has been considerably improved, with a BAM read-write loop running 5 times faster and BAM to SAM conversion 13 times faster (both using 16 threads, compared to Samtools 0.1.19). Widespread adoption has seen HTSlib downloaded $>$1 million times from GitHub and conda. The C library has been used directly by an estimated 900 GitHub projects and has been incorporated into Perl, Python, Rust, and R, significantly expanding the number of uses via other languages. HTSlib is open source and is freely available from htslib.org under MIT/BSD license.}",
issn = {2047-217X},
doi = {10.1093/gigascience/giab007},
url = {https://doi.org/10.1093/gigascience/giab007},
note = {giab007},
eprint = {https://academic.oup.com/gigascience/article-pdf/10/2/giab007/36332285/giab007.pdf},
}
26 changes: 26 additions & 0 deletions README.md
@@ -35,3 +35,29 @@ make install
```

[download]: http://www.htslib.org/download/

### Citing

Please cite this paper when using HTSlib for your publications.

> HTSlib: C library for reading/writing high-throughput sequencing data </br>
> James K Bonfield, John Marshall, Petr Danecek, Heng Li, Valeriu Ohan, Andrew Whitwham, Thomas Keane, Robert M Davies </br>
> _GigaScience_, Volume 10, Issue 2, February 2021, giab007, https://doi.org/10.1093/gigascience/giab007
```
@article{10.1093/gigascience/giab007,
author = {Bonfield, James K and Marshall, John and Danecek, Petr and Li, Heng and Ohan, Valeriu and Whitwham, Andrew and Keane, Thomas and Davies, Robert M},
title = "{HTSlib: C library for reading/writing high-throughput sequencing data}",
journal = {GigaScience},
volume = {10},
number = {2},
year = {2021},
month = {02},
abstract = "{Since the original publication of the VCF and SAM formats, an explosion of software tools have been created to process these data files. To facilitate this a library was produced out of the original SAMtools implementation, with a focus on performance and robustness. The file formats themselves have become international standards under the jurisdiction of the Global Alliance for Genomics and Health. We present a software library for providing programmatic access to sequencing alignment and variant formats. It was born out of the widely used SAMtools and BCFtools applications. Considerable improvements have been made to the original code plus many new features including newer access protocols, the addition of the CRAM file format, better indexing and iterators, and better use of threading. Since the original Samtools release, performance has been considerably improved, with a BAM read-write loop running 5 times faster and BAM to SAM conversion 13 times faster (both using 16 threads, compared to Samtools 0.1.19). Widespread adoption has seen HTSlib downloaded $>$1 million times from GitHub and conda. The C library has been used directly by an estimated 900 GitHub projects and has been incorporated into Perl, Python, Rust, and R, significantly expanding the number of uses via other languages. HTSlib is open source and is freely available from htslib.org under MIT/BSD license.}",
issn = {2047-217X},
doi = {10.1093/gigascience/giab007},
url = {https://doi.org/10.1093/gigascience/giab007},
note = {giab007},
eprint = {https://academic.oup.com/gigascience/article-pdf/10/2/giab007/36332285/giab007.pdf},
}
```