Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] iadd does not do in-place add #1142

Open
suranap opened this issue Jun 20, 2024 · 1 comment
Open

[BUG] iadd does not do in-place add #1142

suranap opened this issue Jun 20, 2024 · 1 comment

Comments

@suranap
Copy link

suranap commented Jun 20, 2024

Software versions

On Sapling2 at Stanford

$ legate-issue
Python : 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
Platform : Linux-5.4.0-169-generic-x86_64-with-glibc2.31
Legion : legion-24.03.0
Legate : 24.01.00.dev+38.g90944d7
Cunumeric : 24.01.00.dev+32.g364e95dc.dirty
Numpy : 1.26.4
Scipy : 1.13.1
Numba : 0.59.1
CTK package : cuda-version-11.7-h67201e3_3 (conda-forge)
GPU driver : 535.54.03
GPU devices :
GPU 0: Tesla P100-SXM2-16GB
GPU 1: Tesla P100-SXM2-16GB
GPU 2: Tesla P100-SXM2-16GB
GPU 3: Tesla P100-SXM2-16GB

Jupyter notebook / Jupyter Lab version

No response

Expected behavior

Both iadd and add(x,y,out=x) should reuse the space occupied by a and not make a new copy. In the stack trace below, it appears the code is doing this:

t = copy(a)
t = task(a + b)

Observed behavior

This code gets to DeferredArray.binary_op. Here's a code snippet:

        lhs = self.base
        src1 = src1._copy_if_overlapping(self).  # Is this line right? 
        rhs1 = src1._broadcast(lhs.shape)
        src2 = src2._copy_if_overlapping(self)
        rhs2 = src2._broadcast(lhs.shape)

The 2nd line compares a to a, confirms they are overlapping and copies the array into a new thunk.

Example code or instructions

import cunumeric as np
a = np.random.randint(0, 1000, size=(50257, 768))
b = np.random.randint(0, 1000, size=(50257, 768))
a += b

Stack traceback or browser console output

>>> pdb.run("np.add(a, b, out=a)", globals=globals())
> <string>(1)<module>()
(Pdb) c
> ~/projects/legate/cunumeric/cunumeric/deferred.py(119)wrapper()
-> self = args[0]
(Pdb) n
> ~/projects/legate/cunumeric/cunumeric/deferred.py(120)wrapper()
-> args = tuple(
(Pdb)
> ~/projects/legate/cunumeric/cunumeric/deferred.py(124)wrapper()
-> for (idx, arg) in enumerate(args)
(Pdb)
> ~/projects/legate/cunumeric/cunumeric/deferred.py(120)wrapper()
-> args = tuple(
(Pdb)
> ~/projects/legate/cunumeric/cunumeric/deferred.py(126)wrapper()
-> for k, v in kwargs.items():
(Pdb)
> ~/projects/legate/cunumeric/cunumeric/deferred.py(130)wrapper()
-> return func(*args, **kwargs)
(Pdb) p args
(<cunumeric.deferred.DeferredArray object at 0x7f94556d4820>, <BinaryOpCode.ADD: 1>, <cunumeric.deferred.DeferredArray object at 0x7f94556d4820>, <cunumeric.deferred.DeferredArray object at 0x7f94556d5750>, True, ())
(Pdb) s
--Call--
> ~/projects/legate/cunumeric/cunumeric/deferred.py(3301)binary_op()
-> @auto_convert("src1", "src2")
(Pdb) n
> ~/projects/legate/cunumeric/cunumeric/deferred.py(3310)binary_op()
-> lhs = self.base
(Pdb)
> ~/projects/legate/cunumeric/cunumeric/deferred.py(3311)binary_op()
-> src1 = src1._copy_if_overlapping(self)
(Pdb) s
--Call--
> ~/projects/legate/cunumeric/cunumeric/deferred.py(291)_copy_if_overlapping()
-> def _copy_if_overlapping(self, other: DeferredArray) -> DeferredArray:
(Pdb) n
> ~/projects/legate/cunumeric/cunumeric/deferred.py(292)_copy_if_overlapping()
-> if not self.base.overlaps(other.base):
(Pdb) self.base.overlaps(other.base)
True
(Pdb) n
> ~/projects/legate/cunumeric/cunumeric/deferred.py(294)_copy_if_overlapping()
-> copy = cast(
(Pdb)
> ~/projects/legate/cunumeric/cunumeric/deferred.py(295)_copy_if_overlapping()
-> DeferredArray,
(Pdb)
> ~/projects/legate/cunumeric/cunumeric/deferred.py(296)_copy_if_overlapping()
-> self.runtime.create_empty_thunk(
(Pdb)
> ~/projects/legate/cunumeric/cunumeric/deferred.py(297)_copy_if_overlapping()
-> self.shape,
(Pdb)
> ~/projects/legate/cunumeric/cunumeric/deferred.py(298)_copy_if_overlapping()
-> self.base.type,
(Pdb)
> ~/projects/legate/cunumeric/cunumeric/deferred.py(299)_copy_if_overlapping()
-> inputs=[self],
(Pdb)
> ~/projects/legate/cunumeric/cunumeric/deferred.py(296)_copy_if_overlapping()
-> self.runtime.create_empty_thunk(
(Pdb)
> ~/projects/legate/cunumeric/cunumeric/deferred.py(294)_copy_if_overlapping()
-> copy = cast(
(Pdb)
> ~/projects/legate/cunumeric/cunumeric/deferred.py(302)_copy_if_overlapping()
-> copy.copy(self, deep=True)
(Pdb)
> ~/projects/legate/cunumeric/cunumeric/deferred.py(119)wrapper()
-> self = args[0]
(Pdb)

It's the create_empty_thunk that looks strange to me. And the copy.copy is just more unneeded work.

@manopapad
Copy link
Contributor

I was able to reproduce this issue, and @bryevdv will be investigating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants