-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set name of UDT when serializing #146
Comments
The reference to GxB_serialize_free is a mistake. It should just read "free(blob)", or more precisely, it should be freed by the same free method passed to GxB_Init. I'll fix the comment, and explain in more detail in the user guide. Changing the name of a user-defined type would be fine for now, but it would destabilize a future version of GraphBLAS. I will be using that name to create GPU and CPU kernels with a jit, and if the name changes, the jitified kernel will break. So GxB_Type_rename is not a good idea. |
Understood, thanks. There are a few reasons why being able to set the name of the UDT when serializing would be nice for us:
The last one can come up in In the future, will UDT names need to be valid identifiers? I sure hope not. What would be the cost of changing the name in the future with JIT? Would things need to be rejitted? Why not give it an internal unique identifier for such a purpose (would |
The user-defined type names should be a valid identifier that can be used as in the string "wildtype" for this:
For that type, the name should be the string 'wildtype'. I need a valid identifier to name the type if I am to create a valid kernel that uses the type internally. So the type can't include any spaces, and can't start with a digit, for example. No semicolons, dots, or whatever. Constructing my own internal typedef name would be really hard. I need a consistent name to communicate with the user, to return to GxB_Matrix_type for example, and to manage types in JIT'ed kernels and such. |
I see. My bad again for not reading the docs thoroughly. I guess a more general goal would be to allow users to include extra metadata (perhaps as a string) when serializing an object, and to be able to retrieve this extra metadata when deserializing. This would give me a place--as part of your file format--to include a description of the datatype, which can be used to create the new type in Python if necessary. |
Requiring the name to be a valid identifier really is kind of a pain. In Python, making new UDTs is as easy as: # Simple struct
>>> gb.dtypes.lookup_dtype({'x': int, 'y': float}).name
"{'x': INT64, 'y': FP64}"
# More complicated (datetime, fixed-length array, and fixed-length string)
>>> gb.dtypes.lookup_dtype({'x': np.datetime64, 'y': np.dtype((int, 3)), 'z': 'S5'}).name
"{'x': dtype('<M8'), 'y': INT64[3], 'z': dtype('S5')}" This shows default names that we use. Being flexible with dtypes allows us to easily interoperate with other libraries in the Python ecosystem. It would be a burden to require users to give a name that's a valid identifier. If we are to use the JIT with UDTs in a future version of SuiteSparse:GraphBLAS, I think we'll need to generate our own automatic names to give to SuiteSparse, such as |
I'll keep this in mind. The first place where I'll introduce a JIT is for the CUDA kernels, for built-in types, and then user-defined types after that. The typedef, as an entire string, and a valid type identifier, will both be required:
where the "typedef" and trailing identifier "wildtype ;" might not be strictly required. I haven't written this yet but that is my plan. I will need both strings: the typedef (as a proper C typedef not a python typedef), and a valid identifier, so that I can inject them into a file, in C / C++, which I can then compile as a CUDA kernel or as a C-based OpenMP kernel. |
Thanks for the consideration. I'm looking forward to a JIT, although I'm a little paranoid about struct alignment. imho, it may be reasonable to distinguish between a name, which is flexible, and an identity, which must be a valid identifier. The former is user-facing, the latter compiler-facing. The name can be the identifier by default. For example, if one is naming an array dtype, it would be natural to name it something like |
That might be feasible. Currently, my GxB_Type_new method has 2 parameters: the type name (a valid identified) and the type definition (a string with the typedef in C syntax:
Internally, I hold this object as follows:
The Type->defn field is currently unused, but I'll use in the future GPU and CPU JITs. The Type->name is fixed in size, to 128 characters max (including the NULL terminator). The Type->name field is returned to the caller via GxB_Type_name, and it will also be used in the JITs. The Type->defn is unbounded in size (I malloc it and keep a copy). I do not yet have a method for returning the Type->defn but I will add that once I start using it. Perhaps GxB_Type_defn could be a method that returns the full definition of the type, as a string. There are 2 possible solutions for your case. One approach would be to include a comment in the Type->defn, so the string passed in could be
That would solve 2 problems: the same Type->defn could be parsed (by the python interface) to extract the user-preferred name (which need not be an identifier), and I can still use this Type->defn in my JIT'd kernels. It would not change my API. Would that work? You could even include more python-specific details in the C comments of the typedef string. There would be no limit to its size, since I just scan the string for its string length, and malloc accordingly. My JIT will just treat this string as a black-box, by just handing it over to the compiler after inserting it into the JIT'd C or CUDA kernel. You could even pass in the following as a single string for my defn parameter to GxB_Type_new, with the exact python definition given to me as a C comment:
The other approach would be to add another parameter to GxB_Type_new, so I would have a type name (any string) a type identifier (a valid C identifier) and a type definition (a string with a valid C typedef). That would be an API-breaking change, however, since I already have GxB_Type_new defined in its current form, for my future JIT. |
Regarding struct alignment: I will simply be trusting the C compiler to do whatever it normally does for a struct. I can't write a compiler to parse the typedef so I have to trust the C compiler to define it, and also to compile the C functions given to GxB_BinaryOp_new, etc. |
Thanks again for your time. Yeah, both of those options probably work for me. Does or will your serialization include the If users don't give me a valid identifier to name a UDT, I'll just give it one myself. |
My serialization does not yet include the type_defn string, but I could add it. The blob does include a version number. I currently don't check it since any of my serialized blobs written by any of my GrB library versions can be read/written by any other GrB library version. If I were to add the type_defn string, I could do so in an upper-ward compatible way, where (say) SS:GraphBLAS v7.2 would write the blob with the type_defn included. That blob could be read by any of my SS:GraphBLAS versions; they would just ignore the added type_defn. But v7.2.0 could then read back the type_defn. I do include the type identifier, as an uncompressed string. |
The GrB_get/set method can set an arbitrary string as the GrB_NAME of a type. It can be the entire type definition if you like. I don't need the name for the JIT, just for get/set. Will this resolve the issue? I've added both the matrix name and the GrB_ELTYPE_STRING (the GrB_NAME of the type of the matrix) to the serialized blob, in the draft v8.1.0. |
When using
GxB_*_serialize
, I would like the option to set the name of a user-defined type to be used in the output blob. This is for maximum portability.A sufficient solution for me would be to have the ability to change the name of a UDT after it's been created. Maybe add
GxB_Type_rename
? This doesn't require changing the signature of the serialize functions,p.s. What is
GxB_serialize_free
mentioned here?GraphBLAS/Include/GraphBLAS.h
Line 11281 in 1bf1dde
The text was updated successfully, but these errors were encountered: