Compression for photon-noise-limited images that keeps losses within the Poisson noise envelope.
PYMECompress consists of four parts:
- a fork of the Basic Compression Library, originally by Marcus Geelnard, modified to include a heavily optimized Huffman coder (the BCL license is available at pymecompress/bcl/doc/manual.pdf and appears to be BSD compatible)
- a fast, AVX-optimized quantizer to perform "within noise level" quantization of photon-limited images (see the sketch below)
- a Python wrapper of the above; note that, at this point, only Huffman coding and quantization are exposed to Python
- numcodecs codecs (experimental) to permit simple usage with other IO packages, e.g. Zarr
Together they offer a single-core throughput of ~500-600 MB/s.
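The quantizer relies on the fact that photon (Poisson) noise has a standard deviation of roughly sqrt(signal), so a uniform quantization step in square-root space stays a fixed fraction of the local noise level. The following is a minimal sketch of that principle only, not the library's internal implementation; `quantize_sqrt` and `dequantize_sqrt` are hypothetical helper names:

```python
import numpy as np

def quantize_sqrt(data, offset=0.0, scale=1.0):
    # variance-stabilizing square-root transform: for Poisson data,
    # sigma ~ sqrt(signal), so a fixed step in sqrt-space is a fixed
    # fraction of the local noise level; uint16 input fits in uint8
    # output for scale >= 1
    q = np.round(np.sqrt(np.maximum(data - offset, 0)) / scale)
    return np.clip(q, 0, 255).astype('uint8')

def dequantize_sqrt(q, offset=0.0, scale=1.0):
    # inverse mapping (matches the dequantization formula in the
    # low-level example further down)
    return (scale * q.astype('f'))**2 + offset
```

Rounding perturbs q by at most 0.5, so the reconstruction error is roughly bounded by scale * sqrt(data - offset), i.e. within one photon-noise standard deviation for scale <= 1.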
Prebuilt binaries of pymecompress are available as a conda package (`pymecompress`) on the `david_baddeley` conda channel for Python 2.7, 3.6 & 3.7, and are installable via:

```bash
conda install -c david_baddeley pymecompress
```
If you want to modify or contribute to the package, you will have to build from source.
Because we use gcc compiler extensions for the AVX opcodes, gcc/clang must be used for compilation, regardless of platform.
On OSX/Linux, a standard `python setup.py install` (or `python setup.py develop`) should work.
On Windows, you need to install MinGW and run the build step separately so that you can pass the compiler flag, i.e.:

```bash
python setup.py build --compiler=mingw32
python setup.py install
```
A suitable environment for building pymecompress can be created using the following conda command:

```bash
conda create -n <name> python=x.x numpy cython libpython m2w64-toolchain
```
Installation via pip is also available:

```bash
pip install pymecompress
```

although binary wheels are not available for all platforms, so you may need to set up a build environment first (gcc/MinGW, as described above for the source installation).
Basic usage from Python:

```python
import numpy as np
from pymecompress import codecs

# vanilla Huffman coding (lossless). NB - the input buffer must be bytes/uint8
huff = codecs.Huffman()
d = np.ones(1000)
assert np.allclose(huff.decode(huff.encode(d.view('uint8'))).view(d.dtype), d)

# with quantization. NB: the input data type MUST be uint16
huffq = codecs.HuffmanQuant16(offset=0, scale=1.0)
ds = np.linspace(1, 2**15).astype('uint16')
assert np.all((huffq.decode(huffq.encode(ds)) - ds.astype('f')) < np.sqrt(ds))
```
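As a quick, illustrative sanity check of the size reduction (assuming, per the usual numcodecs convention, that `encode` returns a bytes-like object; the exact ratio depends on the data):

```python
raw = np.random.poisson(100, 10**6).astype('uint16')  # synthetic photon-limited data
compressed = huffq.encode(raw)
print(len(compressed) / raw.nbytes)  # compressed fraction of the original size
```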
VERY EXPERIMENTAL! The numcodecs integration is not yet well tested, but should work as described in https://zarr.readthedocs.io/en/stable/api/codecs.html. In brief:
```python
import zarr
from pymecompress import codecs

z = zarr.zeros(1000000, dtype='uint16', compressor=codecs.HuffmanQuant16(offset=0, scale=1))
```
NB: to be able to read/open files saved using the pymecompress codecs, you will probably need to run `from pymecompress import codecs` to register the codecs with numcodecs before trying to open the file.
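For example, reopening a previously written store might look like the following, where 'photon_data.zarr' is a hypothetical path:

```python
from pymecompress import codecs  # noqa: F401 - importing registers the codecs with numcodecs
import zarr

z = zarr.open('photon_data.zarr', mode='r')
data = z[:]
```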
The low-level compression functions can also be called directly. As you need to supply the original size to the decompression function, these are most suitable when putting the compressed data in an external wrapper, e.g. PYME's PZFFormat, which keeps track of the original data dimensions and dtype (this saves a couple of bytes and a seek over using the codec versions above).
```python
import numpy as np
import pymecompress

quantizationOffset = 0.0  # illustrative camera offset
quantizationScale = 1.0   # illustrative quantization step size

data = np.linspace(1, 2**15).astype('uint16')
nbytes = data.nbytes
# compress, then serialize the compressed uint8 array to bytes
c = pymecompress.HuffmanCompressQuant(data, quantizationOffset, quantizationScale).tobytes()
# decompress - the original (uncompressed) size must be supplied
decompressed = pymecompress.HuffmanDecompress(np.frombuffer(c, 'u1'), nbytes)
# invert the square-root quantization
dequantized = (quantizationScale * decompressed)**2 + quantizationOffset
```
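Since data ~ (quantizationScale * q)**2 + quantizationOffset and rounding changes q by at most 0.5, the reconstruction error is roughly bounded by quantizationScale * sqrt(data - quantizationOffset), i.e. within one photon-noise standard deviation for scale <= 1. A quick check, assuming HuffmanDecompress returns one quantized value per input pixel:

```python
# mirror of the assertion in the codec example above
assert np.all(np.abs(dequantized - data) < np.sqrt(data.astype('f')) + 0.5)
```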