Consider removing default interpretation of int as int32 #196

Open
ssanderson opened this issue Nov 24, 2015 · 1 comment

@ssanderson

I tried to run the following straightforward-looking blaze code:

In [4]: import blaze as bz
In [5]: from numpy import arange
In [6]: s = bz.symbol('s', 'var * int')
In [7]: bz.compute(s + s, {s: arange(5)})

This results in a big scary traceback that terminates in the blaze numba backend with:

TypeError: ufunc '<lambda>' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Fortunately for me, I sit across from @llllllllll at work, and he informed me that int means int32 in datashape, which triggers this error on the numba backend because I'm attempting to compute an expression of type int32 against data of type int64, which numba rightfully considers unsafe. (It'd be nice if numba told me this information, but that's a separate issue.)
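
The coercion failure itself is easy to reproduce with numpy's casting rules, which is the check numba is applying here (a minimal sketch, independent of blaze):

import numpy as np

# Widening int32 -> int64 is a safe cast, but the reverse is not,
# which is exactly what the numba backend trips over:
np.can_cast(np.int32, np.int64, casting='safe')   # True
np.can_cast(np.int64, np.int32, casting='safe')   # False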

Looking through type_symbol_table.py, the interpretation of int is just hard-coded to int32. Interestingly, intptr is interpreted as "the size of the system int":

no_constructor_types = [
    ...
    ('int32', ct.int32),
    ('int64', ct.int64),
    ('intptr', ct.int64 if _is_64bit else ct.int32),
    ('int', ct.int32),
    ...
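
You can see the alias in action at the REPL (a quick sketch; given the table above, I'd expect this to hold):

import datashape

# 'int' silently resolves to the fixed-width int32 entry:
datashape.dshape('int') == datashape.dshape('int32')      # True
datashape.dshape('intptr') == datashape.dshape('int64')   # True on a 64-bit machine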

Always interpreting int as int32 seems incorrect to me, given that np.arange(N, dtype=int) returns int64s on 64-bit machines. There are, I think, two reasonable alternatives (sketched in code after the list):

  1. Make int mean "system int", i.e., int means int64 on 64-bit machines, and int32 on 32-bit machines.
  2. Disallow int entirely in datashape strings in favor of explicitly requiring a size.
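
In terms of the no_constructor_types table above, the two options would look roughly like this (a sketch against the snippet shown earlier, not a tested patch):

no_constructor_types = [
    ...
    # Option 1: make 'int' track the platform, exactly as 'intptr' does:
    ('int', ct.int64 if _is_64bit else ct.int32),
    # Option 2: drop the ('int', ct.int32) entry entirely, so that
    # 'var * int' fails to parse and the error can point users at
    # int32/int64 instead.
    ...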

While option 1 may seem initially appealing, I'd argue that in the long run it would lead to subtle bugs as people write code assuming that int is 32 or 64-bit, only to encounter failures on other machines. (We've encountered such issues in zipline.)

I'd argue that option 2 is the better solution in the long run. Many datashape users will initially stumble when var * int is rejected, but if the parser is made to fail with a clean error indicating that the user should specify int32 or int64, I don't think many people will struggle to adapt their code accordingly.

Additional evidence in favor of deprecating int is the fact that float and uint always require explicit size modifiers (though, interestingly, real and complex have entries).
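
As a quick check of that asymmetry (hedged; the exact error text will differ):

import datashape

datashape.dshape('var * float64')   # fine: explicit width
datashape.dshape('var * real')      # fine: alias for float64
datashape.dshape('var * float')     # raises: 'float' has no default width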

@llllllllll
Copy link
Member

I am +1 on killing the defaults. This causes issues in numpy for our 32-bit versions. The big issue is also that this means that, in odo, resource(some_table, dshape='var * {a: int}') will actually make a different sqltype depending on the bit width of the client.
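
To make the odo concern concrete (a hedged sketch; the SQL types named in the comment are illustrative, not verified output):

from odo import resource

# Under a platform-dependent 'int', the same dshape string would map
# 'a' to a 32-bit column on a 32-bit client and a 64-bit column on a
# 64-bit client -- e.g. INTEGER vs. BIGINT on the SQL side:
t = resource('sqlite:///example.db::t', dshape='var * {a: int}')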

@ssanderson changed the title from "Consider removing default intepretation of int as int32" to "Consider removing default interpretation of int as int32" on Nov 24, 2015