We discussed during our meeting today some possibilities for supporting user-defined types. We discussed two options: a bytes[n] type, which would allow users to create arrays whose elements are arbitrarily sized collections of bytes, and more robust support for user-defined types that would be backed by a third-party library like HDF5 or cffi. The bytes[n] solution, while simple, puts the onus on the user to handle portability across platforms with different endianness, padding, or alignment. True cross-platform support for user-defined types will likely require users to declare the layouts of their custom data types.
We settled on exploring a declarative JSON description for user-defined data types. This JSON description must have a one-to-one correspondence with user-defined data types as supported by libraries like HDF5 or cffi. The idea is that implementations would then be free to store the user-defined data type using the mechanism supported by the binary container.
On my system, this struct has a size of 16 bytes. Of course, the padding and alignment of this struct are implementation-defined in C. To compile such a struct, it will need to be declared, and each member will need to have a name. I imagine that in HDF5 this would be provided by the user upon registration of the custom data type.
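To make the size and padding point concrete, here is a minimal sketch of how one might inspect the layout on a given platform. The member names v1 through v4 and the helper names prefixed with sketch_ are assumptions for illustration only; the results depend on the compiler and ABI, so no particular output is claimed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical struct for the four-member strawman type.
   The member names v1..v4 are illustrative, not part of any spec. */
typedef struct {
  float v1;
  int32_t v2;
  uint32_t v3;
  uint8_t v4;
} my_cool_struct;

/* Bytes in the struct that hold no member data (interior + trailing padding). */
size_t sketch_padding_bytes(void) {
  size_t payload =
      sizeof(float) + sizeof(int32_t) + sizeof(uint32_t) + sizeof(uint8_t);
  assert(sizeof(my_cool_struct) >= payload);
  return sizeof(my_cool_struct) - payload;
}

/* Print the size and member offsets chosen by this compiler and ABI. */
void sketch_print_layout(void) {
  printf("sizeof  = %zu\n", sizeof(my_cool_struct));
  printf("v1 at   = %zu\n", offsetof(my_cool_struct, v1));
  printf("v2 at   = %zu\n", offsetof(my_cool_struct, v2));
  printf("v3 at   = %zu\n", offsetof(my_cool_struct, v3));
  printf("v4 at   = %zu\n", offsetof(my_cool_struct, v4));
  printf("padding = %zu\n", sketch_padding_bytes());
}
```

Calling sketch_print_layout() from main on two different platforms is a quick way to see whether their layouts agree. The offsetof values are the same numbers a user would supply when describing the struct to a container library.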
Just the HDF5 part of this code, written by the user, would look something like the following:
I imagine the process for the user would basically be this:
The user registers a custom HDF5 type (with something like bsp_register_hdf5_type(my_struct_hid, "my_cool_struct")). They input an HDF5 custom type (the hid_t) as well as the new type's name.
Based on the registered type, the backend will create and return to the user a new ID for the bsp_type_t enum, perhaps based on hashing the name.
The user reads in a file that uses the newly defined custom data type. This will return the standard bsp_matrix_t, which will itself contain a values array whose type is equal to the newly created bsp_type_t. The implementation can perhaps check that the registered type matches the file's HDF5 type by looking at the types of the corresponding elements.
The user has their data. When they look at the values array, they can see that its type corresponds to the newly created type for my_cool_struct. They can safely cast its data pointer to a pointer to my_cool_struct.
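The steps above can be sketched without any HDF5 dependency. Everything prefixed with sketch_ below is a hypothetical stand-in rather than the Binsparse API: the registry keeps only a name and a size instead of an hid_t, the ID is derived from an FNV-1a hash of the name (one possible reading of "perhaps based on hashing the name"), and the built-in ID range of 64 and the modulus are arbitrary. Hash collisions are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the built-in type ID range; the value is illustrative. */
enum { SKETCH_BUILTIN_TYPE_COUNT = 64 };

/* What a registration might hand back to the user. */
typedef struct {
  const char* name; /* registered type name */
  size_t size;      /* sizeof the user's struct */
  int type_id;      /* ID used to tag arrays of this type */
} sketch_custom_type;

/* Stand-in for a values array: raw data plus a type tag. */
typedef struct {
  void* data;
  size_t count;
  int type_id;
} sketch_array;

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t sketch_fnv1a(const char* s) {
  uint32_t h = 2166136261u;
  for (; *s != '\0'; ++s) {
    h ^= (uint8_t)*s;
    h *= 16777619u;
  }
  return h;
}

/* Derive an ID above the built-in range from the type name.
   Two different names could collide; a real design would need to detect that. */
int sketch_type_id_from_name(const char* name) {
  assert(name != NULL);
  return SKETCH_BUILTIN_TYPE_COUNT + (int)(sketch_fnv1a(name) % 1000000u);
}

/* Steps 1 and 2: register a type by name and size, get an ID back. */
sketch_custom_type sketch_register_type(const char* name, size_t size) {
  sketch_custom_type t = {name, size, sketch_type_id_from_name(name)};
  return t;
}

/* Step 4: return the data pointer only if the array carries the registered
   type's ID; otherwise return NULL so the caller does not cast blindly. */
void* sketch_checked_data(const sketch_array* a, const sketch_custom_type* t) {
  return (a->type_id == t->type_id) ? a->data : NULL;
}
```

A caller would register "my_cool_struct" once, read the file, and cast the result of sketch_checked_data to a pointer to my_cool_struct only when it is non-NULL. Comparing IDs alone does not verify the member layout, which is why the thread suggests also comparing the element types against the file's HDF5 type.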
Open Questions
I still have a few open questions:
Should we name custom data type members in the Binsparse JSON description? e.g., instead of storing an array of strings containing the data types, store a tuple: [("v1", "float"), ("v2", "int32"), ("v3", "uint32"), ("v4", "bint8")].
Should we or could we ever attempt to read in a custom data type without a user-registered custom data type? For example, we could have a function that, given a JSON declaration, creates an HDF5 custom data type hid_t. The big challenge here is picking the offsets, since we would need an algorithm to compute them. These offsets would need to be reproducible, reliable, and correspond to the user's offsets for this to work. If the computed offsets ever differ from the ones the user's compiler chose, reading the data through the user's struct would give wrong values.
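As one illustration of what such an offset-picking algorithm could look like, the sketch below lays members out in declaration order and aligns each one to its own size, then rounds the total up to the largest alignment. That "alignment equals size" rule is an assumption that matches many mainstream C ABIs for these primitive types but is not guaranteed by the C standard, which is exactly the risk described above. The function names and the type-name table are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size in bytes for a handful of type names; 0 means "unknown".
   The table is illustrative and not a complete list of supported types. */
size_t sketch_type_size(const char* name) {
  if (strcmp(name, "int8") == 0 || strcmp(name, "uint8") == 0 ||
      strcmp(name, "bint8") == 0) return 1;
  if (strcmp(name, "int16") == 0 || strcmp(name, "uint16") == 0) return 2;
  if (strcmp(name, "int32") == 0 || strcmp(name, "uint32") == 0 ||
      strcmp(name, "float") == 0 || strcmp(name, "float32") == 0) return 4;
  if (strcmp(name, "int64") == 0 || strcmp(name, "uint64") == 0 ||
      strcmp(name, "float64") == 0) return 8;
  return 0;
}

/* Compute member offsets for an ordered list of type names.
   Assumption: each member is aligned to its own size.
   Writes n offsets and returns the padded struct size, or 0 on an unknown type. */
size_t sketch_layout(const char* const* types, size_t n, size_t* offsets) {
  assert(n == 0 || (types != NULL && offsets != NULL));
  size_t cursor = 0;
  size_t max_align = 1;
  for (size_t i = 0; i < n; ++i) {
    size_t size = sketch_type_size(types[i]);
    if (size == 0) return 0;
    cursor = (cursor + size - 1) / size * size; /* round up to the alignment */
    offsets[i] = cursor;
    cursor += size;
    if (size > max_align) max_align = size;
  }
  /* Trailing padding so that arrays of the struct stay aligned. */
  return (cursor + max_align - 1) / max_align * max_align;
}
```

For the type list float, int32, uint32, bint8 this rule yields offsets 0, 4, 8, 12 and a total size of 16 bytes, which agrees with the 16-byte size reported earlier in the thread. A packed struct or an unusual ABI would break that agreement, so a reader relying on computed offsets would still want to verify them against offsetof on the user's side.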
I opted for a list of types, which I think is necessary, since a JSON dict is unordered. However, there might be some tweaks we could make to improve the syntax of custom data types.
Strawman Example
Here we have one custom type.
This would correspond to a C struct with four members of type float, int32_t, uint32_t, and uint8_t. The struct might look like this: