The dolomite-matrix package is the Python counterpart to the alabaster.matrix R package, providing methods for saving and reading arrays and matrices within the dolomite framework. Dense arrays are stored as a standard HDF5 dataset, while sparse matrices are saved inside an HDF5 file in compressed sparse format.
Let's save a dense matrix to an HDF5 file with some accompanying metadata:
import numpy
x = numpy.random.rand(1000, 200)
import os
import tempfile
dir = os.path.join(tempfile.mkdtemp(), "whee")
import dolomite_base
import dolomite_matrix
dolomite_base.save_object(x, dir)
Now we can transfer the directory and reload the matrix in a new session. This constructs an HDF5-backed dense array that can be used for block processing or realized into the usual NumPy array.
import dolomite_base
obj = dolomite_base.read_object(dir)
## <1000 x 200> ReloadedArray object of type 'float64'
## [[0.58444226, 0.82595149, 0.7214525 , ..., 0.32493652, 0.58206044,
## 0.73770346],
## [0.96398317, 0.73200292, 0.16410134, ..., 0.31626547, 0.11499628,
## 0.19768697],
## [0.82350911, 0.48012452, 0.65221052, ..., 0.94989611, 0.15422992,
## 0.77173718],
## ...,
## [0.71715436, 0.19266116, 0.52316388, ..., 0.23104537, 0.935654 ,
## 0.51663007],
## [0.38585049, 0.26709808, 0.70358993, ..., 0.91822795, 0.66144925,
## 0.42465112],
## [0.08535589, 0.00144712, 0.51411921, ..., 0.84546122, 0.35001404,
## 0.53644868]]
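As a rough illustration of what block processing means here, the sketch below computes column sums by visiting a few rows at a time, so that only one block has to be held in memory. It uses plain Python lists as a stand-in for the file-backed array. The names `iter_row_blocks`, `column_sums_by_block` and `get_rows` are hypothetical helpers for this example and are not part of dolomite_matrix.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def iter_row_blocks(nrow: int, block_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) row ranges that cover rows 0..nrow in order."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, nrow, block_size):
        yield start, min(start + block_size, nrow)


def column_sums_by_block(
    get_rows: Callable[[int, int], Sequence[Sequence[float]]],
    shape: Tuple[int, int],
    block_size: int,
) -> List[float]:
    """Accumulate column sums while loading one block of rows at a time."""
    nrow, ncol = shape
    totals = [0.0] * ncol
    for start, end in iter_row_blocks(nrow, block_size):
        block = get_rows(start, end)  # only this slice is loaded
        for row in block:
            for j, value in enumerate(row):
                totals[j] += value
    return totals


# Stand-in for a file-backed matrix: a small in-memory list of rows.
matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
sums = column_sums_by_block(lambda s, e: matrix[s:e], (5, 2), block_size=2)
print(sums)
```

With a real file-backed array, `get_rows` would be whatever operation reads a slice of rows from disk; the accumulation logic stays the same.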
We can also save and load a sparse matrix from an HDF5 file:
import scipy.sparse
import numpy
x = scipy.sparse.random(1000, 200, 0.2, dtype=numpy.int16, format="csc")
import os
import tempfile
dir = os.path.join(tempfile.mkdtemp(), "stuff")
import dolomite_base
import dolomite_matrix
dolomite_base.save_object(x, dir)
Again, we can load it back in a new session. This constructs an HDF5-backed sparse array that can be used for block processing or realized into the usual NumPy array.
import dolomite_base
obj = dolomite_base.read_object(dir)
## <1000 x 200> sparse ReloadedArray object of type 'int16'
## [[ 0, 0, -28638, ..., 0, 0, 26194],
## [ 0, 0, 0, ..., 0, -30829, 0],
## [ 0, 0, 0, ..., 0, 0, 0],
## ...,
## [ 10895, 0, 0, ..., 0, 0, 0],
## [ 0, 32539, 0, ..., 0, 2780, -12106],
## [ 0, 0, 0, ..., 1452, 0, -26314]]
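For readers unfamiliar with the compressed sparse layout, here is a minimal pure-Python sketch of the compressed sparse column (CSC) idea: only the non-zero values are kept, together with their row indices and a pointer array marking where each column starts. This only illustrates the general technique. The function names are hypothetical, and the exact layout that dolomite_matrix writes to the HDF5 file is not shown here.

```python
from typing import List, Tuple


def dense_to_csc(rows: List[List[int]]) -> Tuple[List[int], List[int], List[int]]:
    """Convert a dense row-major matrix into CSC (data, indices, indptr) lists."""
    nrow = len(rows)
    ncol = len(rows[0]) if nrow else 0
    data: List[int] = []      # non-zero values, column by column
    indices: List[int] = []   # row index of each non-zero value
    indptr: List[int] = [0]   # indptr[j]:indptr[j + 1] spans column j
    for j in range(ncol):
        for i in range(nrow):
            value = rows[i][j]
            if value != 0:
                data.append(value)
                indices.append(i)
        indptr.append(len(data))
    return data, indices, indptr


def csc_to_dense(
    data: List[int], indices: List[int], indptr: List[int], nrow: int
) -> List[List[int]]:
    """Rebuild the dense row-major matrix from its CSC components."""
    ncol = len(indptr) - 1
    rows = [[0] * ncol for _ in range(nrow)]
    for j in range(ncol):
        for k in range(indptr[j], indptr[j + 1]):
            rows[indices[k]][j] = data[k]
    return rows


dense = [
    [0, 0, 5],
    [7, 0, 0],
    [0, 0, 9],
]
data, indices, indptr = dense_to_csc(dense)
print(data, indices, indptr)
```

A matrix created with `format="csc"` in SciPy exposes the same three components as its `data`, `indices` and `indptr` attributes.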