My program wants to compress some large cached strings and decompress them later. I have no particular requirements on the form of the compressed data, so I used ZstdCompressionChunker for compression, to avoid repeated reallocation of the output buffer. I would like to process the decompressed data in chunks to reduce peak memory usage. However, there is no obvious, efficient way to decompress chunks to chunks:
The ZstdCompressionChunker round-trip tests all concatenate the chunks with bytes.join for one-shot decompression. (Fine, they're tests.)
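Concretely, the round trip I have in mind looks roughly like this (a sketch only; the cached strings, the chunk_size, and the max_output_size handling are illustrative, not load-bearing):

import zstandard as zstd

cached_strings = ["large cached string one", "large cached string two"]  # illustrative

cctx = zstd.ZstdCompressor()
chunker = cctx.chunker(chunk_size=16384)  # emit compressed output in fixed-size chunks

chunks = []
for s in cached_strings:
    chunks.extend(chunker.compress(s.encode("utf-8")))
chunks.extend(chunker.finish())

# One-shot decompression, as in the round-trip tests: concatenate everything
# and decompress in one go. max_output_size may be needed because the frame
# produced by the chunker does not necessarily record its content size.
dctx = zstd.ZstdDecompressor()
everything = dctx.decompress(b''.join(chunks), max_output_size=1 << 30)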
I tried chain.from_iterable(dctx.read_to_iter(c) for c in chunks). This doesn't work because each read_to_iter iterator expects to process a full stream. (I expected it to hold state in the ZstdDecompressor it was obtained from.)
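Spelled out (with chunks as in the sketch above), the attempt was roughly:

from itertools import chain
import zstandard as zstd

dctx = zstd.ZstdDecompressor()
# Doesn't work: each read_to_iter() call treats its argument as a standalone
# stream, so no decompression state carries over from one chunk to the next,
# and chunks after the first are not valid streams on their own.
pieces = chain.from_iterable(dctx.read_to_iter(c) for c in chunks)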
ZstdDecompressionObj's documentation says it isn't efficient:
Because calls to decompress() may need to perform multiple memory (re)allocations, this streaming decompression API isn’t as efficient as other APIs.
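For reference, this is the closest thing I found to chunks-in/chunks-out, modulo the caveat quoted above (a sketch; handle_decompressed is a stand-in for whatever consumes the data):

import zstandard as zstd

dctx = zstd.ZstdDecompressor()
dobj = dctx.decompressobj()
for chunk in chunks:              # compressed chunks from the sketch above
    out = dobj.decompress(chunk)  # may be b'' if zstd buffered the input
    if out:
        handle_decompressed(out)  # stand-in consumer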
read_to_iter's documentation says:
read_to_iter() accepts an object with a read(size) method that will return compressed bytes or an object conforming to the buffer protocol.
So I wrote a class with a read method that returns memoryviews over the chunks (to avoid copying slices). The sentence is grammatically ambiguous (does "an object conforming to the buffer protocol" describe what read(size) may return, or a second kind of argument that read_to_iter accepts?), and it turns out that read_to_iter segfaults (!) when read() returns a buffer-protocol object that is not exactly bytes (reduced test case below).
My feature request is to provide an efficient way to decompress a sequence of chunks compressed with ZstdCompressionChunker (or to document an existing method as the efficient way, if there is one).
import zstandard as zstd

b = b'AB' * 1000
d = zstd.compress(b)
assert zstd.decompress(memoryview(d)) == b  # passes

class Whatever:
    def __init__(self, data):
        self.data = data

    def read(self, size):
        assert len(self.data) <= size
        return memoryview(self.data)  # buffer-protocol object, not bytes

dctx = zstd.ZstdDecompressor()
assert b''.join(dctx.read_to_iter(Whatever(d))) == b  # segfault
Segfaults using Arch Linux's python 3.13.2-1 and python-zstandard 0.23.0-2.
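For now, the only workaround I see is to give read_to_iter a wrapper whose read() always returns a bytes object, i.e. to accept the copies the memoryviews were meant to avoid. A sketch (ChunkReader is my own name, not part of the library):

import zstandard as zstd

class ChunkReader:
    """File-like adapter over a sequence of compressed chunks.

    read() always returns bytes (copying/slicing as needed), which stays on
    the code path that does not crash.
    """

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buf = b''

    def read(self, size):
        while len(self._buf) < size:
            chunk = next(self._chunks, None)
            if chunk is None:
                break
            self._buf += chunk
        out, self._buf = self._buf[:size], self._buf[size:]
        return out

dctx = zstd.ZstdDecompressor()
for piece in dctx.read_to_iter(ChunkReader(chunks)):
    handle_decompressed(piece)  # stand-in consumer, as above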