Save delayed operations to HDF5 using the chihaya specification.
This extracts operations out of a DelayedArray
and stores them in a HDF5 file,
where they can be used to reconstitute the same DelayedArray
in a new R session - or indeed, in a different analysis framework altogether.
The idea is to save the operations, which is usually cheap;
rather than the results of the operations, which may be expensive for large datasets or when sparsity is broken.
If we make a DelayedArray
with arbitrary operations:
library(DelayedArray)
x <- DelayedArray(matrix(runif(1000), ncol=10))
x <- x[11:15,] / runif(5)
x <- log2(x + 1)
x
## <5 x 10> matrix of class DelayedMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.318228112 1.789374232 1.854133153 . 1.10085064 1.22825033
## [2,] 0.340258109 0.598988926 0.005719794 . 0.05900444 0.19562976
## [3,] 0.205758979 0.624928389 0.574661104 . 0.96990885 0.31573385
## [4,] 0.129171362 1.149253865 0.091821910 . 0.10878614 0.45618400
## [5,] 1.317402933 1.753933055 1.857993438 . 1.83012744 2.11469960
We can save it to file with the chihaya R package:
library(chihaya)
fpath <- tempfile(fileext=".h5")
saveDelayed(x, fpath, "my_delayed_array")
rhdf5::h5ls(fpath)
## group name otype dclass dim
## 0 / my_delayed_array H5I_GROUP
## 1 /my_delayed_array base H5I_DATASET FLOAT ( 0 )
## 2 /my_delayed_array method H5I_DATASET STRING ( 0 )
## 3 /my_delayed_array seed H5I_GROUP
## 4 /my_delayed_array/seed method H5I_DATASET STRING ( 0 )
## 5 /my_delayed_array/seed seed H5I_GROUP
## 6 /my_delayed_array/seed/seed along H5I_DATASET INTEGER ( 0 )
## 7 /my_delayed_array/seed/seed method H5I_DATASET STRING ( 0 )
## 8 /my_delayed_array/seed/seed seed H5I_GROUP
## 9 /my_delayed_array/seed/seed/seed index H5I_GROUP
## 10 /my_delayed_array/seed/seed/seed/index 0 H5I_DATASET INTEGER 5
## 11 /my_delayed_array/seed/seed/seed seed H5I_GROUP
## 12 /my_delayed_array/seed/seed/seed/seed data H5I_DATASET FLOAT 100 x 10
## 13 /my_delayed_array/seed/seed/seed/seed native H5I_DATASET INTEGER ( 0 )
## 14 /my_delayed_array/seed/seed side H5I_DATASET STRING ( 0 )
## 15 /my_delayed_array/seed/seed value H5I_DATASET FLOAT 5
## 16 /my_delayed_array/seed side H5I_DATASET STRING ( 0 )
## 17 /my_delayed_array/seed value H5I_DATASET FLOAT ( 0 )
And then reload it in a separate session:
y <- loadDelayed(fpath, "my_delayed_array")
y
## <5 x 10> matrix of class DelayedMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.318228112 1.789374232 1.854133153 . 1.10085064 1.22825033
## [2,] 0.340258109 0.598988926 0.005719794 . 0.05900444 0.19562976
## [3,] 0.205758979 0.624928389 0.574661104 . 0.96990885 0.31573385
## [4,] 0.129171362 1.149253865 0.091821910 . 0.10878614 0.45618400
## [5,] 1.317402933 1.753933055 1.857993438 . 1.83012744 2.11469960
The file at fpath
follows the specification described here.
This provides cross-language portability and ensures that the serialization process is robust to changes in the DelayedArray class structure.
Many of the basic operations in DelayedArray are supported. However, there are a few operations that are not described by the chihaya specification. An incomplete list is provided below:
is.na
. This is missing as there is no accepted standard definition of missing-ness. (In comparison,is.nan
is well-defined and is supported by the chihaya specification.)- All distribution functions, e.g.,
dpois
,qunif
and so on. These were omitted from the specification as they do not have native implementations in many frameworks.