I tried to run the following straightforward-looking blaze code:
```
In [6]: s = bz.symbol('s', 'var * int')
In [7]: bz.compute(s + s, {s: arange(5)})
```
This results in a big scary traceback that terminates in the blaze numba backend with:
TypeError: ufunc '<lambda>' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Fortunately for me, I sit across from @llllllllll at work, and he informed me that int means int32 in datashape, which triggers this error on the numba backend because I'm attempting to compute an expression of type int32 against data of type int64, which numba rightfully considers unsafe. (It'd be nice if numba told me this information, but that's a separate issue.)
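The casting rule at play can be checked directly in NumPy. This is a minimal sketch of the general "safe" casting behaviour, not a reproduction of the numba backend's code path: narrowing int64 to int32 is rejected, while widening int32 to int64 is allowed.

```python
import numpy as np

# int64 -> int32 can lose information, so the "safe" casting rule rejects it.
narrowing_ok = np.can_cast(np.int64, np.int32, casting="safe")

# int32 -> int64 is a widening conversion, which "safe" casting permits.
widening_ok = np.can_cast(np.int32, np.int64, casting="safe")

print(narrowing_ok)  # False
print(widening_ok)   # True
```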
Looking through type_symbol_table.py, the interpretation of int is just hard-coded to int32. Interestingly, intptr is interpreted as "the size of the system int".
Always interpreting int as int32 seems incorrect to me, given the fact that np.arange(N, dtype=int) returns int64s on 64-bit machines. There are, I think, two reasonable alternatives:
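To see what the default integer looks like on a given machine, something like the following can be run. It is only a sketch: the variable names are illustrative, and the values printed depend on the platform and NumPy version, so no expected output is shown.

```python
import struct

import numpy as np

# Width of the dtype NumPy picks when asked for the Python ``int`` type.
default_int_bits = np.dtype(int).itemsize * 8

# Width of NumPy's pointer-sized integer, the closest analogue of intptr.
intp_bits = np.dtype(np.intp).itemsize * 8

# Width of a native pointer according to the standard library.
pointer_bits = struct.calcsize("P") * 8

# The dtype actually produced by the call discussed above.
arange_dtype = np.arange(5, dtype=int).dtype

print(default_int_bits, intp_bits, pointer_bits, arange_dtype)
```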
1. Make int mean "system int", i.e., int means int64 on 64-bit machines, and int32 on 32-bit machines.
2. Disallow int entirely in datashape strings in favor of explicitly requiring a size.
While option 1 may seem initially appealing, I'd argue that in the long run it would lead to subtle bugs as people write code assuming that int is 32 or 64-bit, only to encounter failures on other machines. (We've encountered such issues in zipline.)
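As a hypothetical illustration of the kind of bug a platform-dependent int invites (this is not the actual zipline failure, just a generic example), the same arithmetic gives different answers depending on which width the data ends up with:

```python
import numpy as np

# Largest value representable in a signed 32-bit integer.
big = 2**31 - 1

# With 64-bit integers the addition behaves as most code would expect.
as_int64 = np.array([big], dtype=np.int64) + np.array([1], dtype=np.int64)

# With 32-bit integers the same array addition silently wraps around.
as_int32 = np.array([big], dtype=np.int32) + np.array([1], dtype=np.int32)

print(as_int64[0])  # 2147483648
print(as_int32[0])  # -2147483648
```

Code written and tested where int resolves to 64 bits would never see the second result until it runs somewhere int resolves to 32 bits.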
I'd argue that option 2 is the better solution in the long run. Many datashape users will initially stumble when var * int is rejected, but if the parser is made to fail with a clean error indicating that the user should specify int32 or int64, I don't think many people will struggle to adapt their code accordingly.
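A rough sketch of what such a clean failure could look like is below. The names `AmbiguousIntError` and `reject_bare_int` are hypothetical and are not part of datashape; the check is a simple regular expression rather than a real parser, so it would also flag a record field that happens to be named int.

```python
import re


class AmbiguousIntError(ValueError):
    """Hypothetical error for a datashape string that uses the unsized name ``int``."""


# Matches ``int`` as a whole word, so int32, int64, uint8 and intptr are not flagged.
_BARE_INT = re.compile(r"\bint\b")


def reject_bare_int(ds: str) -> str:
    """Return ``ds`` unchanged, or raise if it contains a bare ``int``."""
    if _BARE_INT.search(ds):
        raise AmbiguousIntError(
            f"ambiguous type 'int' in {ds!r}; specify int32 or int64 explicitly"
        )
    return ds


print(reject_bare_int("var * int64"))

try:
    reject_bare_int("var * int")
except AmbiguousIntError as exc:
    print(exc)
```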
Additional evidence in favor of deprecating int is the fact that float and uint always require explicit size modifiers (though, interestingly, real and complex have entries).
I am +1 on killing the defaults. This causes issues in numpy for our 32-bit versions. The bigger issue is that in odo, resource(some_table, dshape='var * {a: int}') will actually make a different sqltype depending on the bit width of the client.
ssanderson changed the title from "Consider removing default intepretation of int as int32" to "Consider removing default interpretation of int as int32" on Nov 24, 2015