Reading dataset of compound type containing uint8 #21

Open
kchaliki opened this issue Feb 4, 2015 · 2 comments

kchaliki commented Feb 4, 2015

Hello all, a question/remark regarding compound types.

If I create a compound type from h5py that looks like, say:

total size: 13 bytes

timestamp: NATIVE_UINT64
market: NATIVE_UINT8
price: NATIVE_FLOAT

and create a dataset of this type and write a few values out (from a numpy recarray), I expect those records to be stored on disk in the packed layout described by the datatype, i.e. using 13 bytes each.
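
(Just to spell out "packed": here is a small standalone Go snippet, independent of go-hdf5 and of the repro below, that decodes one such 13-byte record by hand, assuming the data is little-endian on disk, which is what the NATIVE_* types give on x86.)

package main

import (
    "encoding/binary"
    "fmt"
    "math"
)

// decodeTick interprets one packed 13-byte record laid out as
// timestamp at offset 0 (8 bytes), market at 8 (1 byte), price at 9 (4 bytes).
func decodeTick(rec []byte) (timestamp uint64, market uint8, price float32) {
    timestamp = binary.LittleEndian.Uint64(rec[0:8])
    market = rec[8]
    price = math.Float32frombits(binary.LittleEndian.Uint32(rec[9:13]))
    return
}

func main() {
    // Build one packed record by hand: timestamp=1234567890, market=1, price=1.45.
    rec := make([]byte, 13)
    binary.LittleEndian.PutUint64(rec[0:8], 1234567890)
    rec[8] = 1
    binary.LittleEndian.PutUint32(rec[9:13], math.Float32bits(1.45))

    ts, mkt, px := decodeTick(rec)
    fmt.Println(ts, mkt, px) // 1234567890 1 1.45
}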

If I then try to read this back into Go using go-hdf5, I can declare a struct that looks like

type tick struct {
    timestamp uint64
    market    uint8
    price     float32
}

which generally will not be 13 bytes but a few more, say 16, because of alignment of the members. So if I am going to read back into such a struct, I need to create a compound datatype with the same offsets/sizes as the struct in memory, so that HDF5 knows how to map the values from disk to memory.
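
To make the mismatch concrete, here is a small standalone snippet (plain Go, no go-hdf5) that just prints the compiler-chosen layout of the struct; on a typical 64-bit platform it reports a size of 16 with price at offset 12, whereas the file layout above has price at offset 9 and a total size of 13:

package main

import (
    "fmt"
    "unsafe"
)

type tick struct {
    timestamp uint64
    market    uint8
    price     float32
}

func main() {
    var t tick
    // The compiler inserts 3 bytes of padding after market to align price.
    fmt.Println("sizeof(tick):    ", unsafe.Sizeof(t))            // 16, not 13
    fmt.Println("offset timestamp:", unsafe.Offsetof(t.timestamp)) // 0
    fmt.Println("offset market:   ", unsafe.Offsetof(t.market))    // 8
    fmt.Println("offset price:    ", unsafe.Offsetof(t.price))     // 12 (the file has 9)
}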

However, the API retrieves the datatype of the on-file dataset and passes that into the read call along with the memory address of the beginning of my slice of structs. This ends up mapping the values incorrectly.

I can modify the h5py dataset-creation side to use, say, only 4- or 8-byte datatypes, and with matching types on the Go side the read works again, but only because the two layouts happen to coincide.

Is my understanding wrong or does the API need some refining?

Thanks!!

sbinet commented Feb 6, 2015

(apologies for the belated answer)

go-hdf5 should definitely handle this better.

could you attach a little h5py test script as a reproducer?

sbinet self-assigned this Feb 6, 2015

kchaliki commented Feb 6, 2015

Hello, thanks for coming back. Here is a repro: the Python script that creates the HDF5 file and the Go program that reads it. Obviously you need to change the path if you are not on Windows.

python

import h5py
import pandas as pd

data = [
    {
        'timestamp': 1234567890,
        'market': 1,
        'price': 1.45
    },
    {
        'timestamp': 1234567891,
        'market': 2,
        'price': 1.55
    },
]

h5_file = h5py.File('C:\\temp\\repro.h5', 'w')

# Packed compound type: 13 bytes total, no padding between members.
tick_type = h5py.h5t.create(h5py.h5t.COMPOUND, 13)
tick_type.insert('timestamp', 0, h5py.h5t.NATIVE_UINT64)
tick_type.insert('market', 8, h5py.h5t.NATIVE_UINT8)
tick_type.insert('price', 9, h5py.h5t.NATIVE_FLOAT)

records = pd.DataFrame(data).to_records()
_dataset = h5_file.create_dataset('foo', None, tick_type, records)
h5_file.close()

golang

package main

import (
    "fmt"
    "unsafe"

    "github.com/sbinet/go-hdf5"
)

type tick struct {
    timestamp uint64
    market    uint8
    price     float32
}

func main() {
    fname := "C:/temp/repro.h5"
    f, err := hdf5.OpenFile(fname, hdf5.F_ACC_RDONLY)
    if err != nil {
        fmt.Printf("could not open data file %s\n", fname)
        panic(err)
    }
    defer f.Close()

    dataset, err := f.OpenDataset("foo")
    if err != nil {
        panic(err)
    }

    numTicks := dataset.Space().SimpleExtentNPoints()
    fmt.Printf("Reading %d ticks into struct of size %d\n", numTicks, unsafe.Sizeof(tick{}))

    ticks := make([]tick, numTicks)
    if err := dataset.Read(&ticks); err != nil {
        panic(err)
    }

    // display the fields
    fmt.Printf("%+v\n", ticks)
}
