asarray: Add support for NumPy scalars (pytorch#90914)
Follow up from: Quansight-Labs/numpy_pytorch_interop#3

This PR adds support for NumPy scalars for `torch.asarray`.

**Before:** treats the scalar as an object that implements the buffer protocol, so its raw bytes are reinterpreted as values of the default data type (`float32`):

```python
>>> torch.asarray(numpy.float64(0.5))
tensor([0.0000, 1.7500])
```
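(Those values are just `float64(0.5)`'s eight bytes read back as two `float32`s. A quick illustrative check, not part of the PR:)

```python
import numpy

raw = numpy.float64(0.5).tobytes()                 # b'\x00\x00\x00\x00\x00\x00\xe0?'
# Reinterpreting the 8 bytes as two float32 values reproduces the bogus output.
print(numpy.frombuffer(raw, dtype=numpy.float32))  # [0.   1.75]
```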

**After:** identifies the NumPy scalar and does the right thing, i.e. creates a 0-dimensional tensor from it that doesn't share its memory:

```python
>>> torch.asarray(numpy.float64(0.5))
tensor(0.5000, dtype=torch.float64)
```
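Conceptually, the new path is equivalent to going through a 0-d array and copying; a minimal sketch (illustrative only, the real implementation is the C++ change below):

```python
import numpy
import torch

scalar = numpy.float64(0.5)
# scalar -> 0-d ndarray -> tensor sharing the temporary's memory -> owning copy
t = torch.from_numpy(numpy.asarray(scalar)).clone()
print(t)  # tensor(0.5000, dtype=torch.float64)
```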
Pull Request resolved: pytorch#90914
Approved by: https://github.com/lezcano, https://github.com/mruberry
ysiraichi authored and pytorchmergebot committed Jan 24, 2023
1 parent cc4fbd1 commit 3f64c96
Showing 3 changed files with 56 additions and 7 deletions.
test/test_tensor_creation_ops.py (12 additions, 0 deletions)
```diff
@@ -3936,6 +3936,18 @@ def test_astensor_consistency(self, device):
             t = torch.asarray(e)
             self.assertEqual(t, original)
 
+    @onlyCPU
+    def test_numpy_scalars(self, device):
+        scalar = np.float64(0.5)
+
+        with self.assertRaisesRegex(RuntimeError, "can't alias NumPy scalars."):
+            torch.asarray(scalar, copy=False)
+
+        tensor = torch.asarray(scalar)
+        self.assertEqual(tensor.dim(), 0)
+        self.assertEqual(tensor.item(), scalar.item())
+        self.assertEqual(tensor.dtype, torch.float64)
+
 instantiate_device_type_tests(TestTensorCreation, globals())
 instantiate_device_type_tests(TestRandomTensorCreation, globals())
 instantiate_device_type_tests(TestLikeTensorCreation, globals())
```
torch/_torch_docs.py (11 additions, 3 deletions)
```diff
@@ -1230,7 +1230,7 @@ def merge_dicts(*dicts):
 :attr:`obj` can be one of:
 
 1. a tensor
-2. a NumPy array
+2. a NumPy array or a NumPy scalar
 3. a DLPack capsule
 4. an object that implements Python's buffer protocol
 5. a scalar
@@ -1245,14 +1245,18 @@ def merge_dicts(*dicts):
 is ``True`` then the returned tensor will require a gradient, and if :attr:`obj` is
 also a tensor with an autograd history then the returned tensor will have the same history.
 
-When :attr:`obj` is not a tensor, NumPy Array, or DLPack capsule but implements Python's
+When :attr:`obj` is not a tensor, NumPy array, or DLPack capsule but implements Python's
 buffer protocol then the buffer is interpreted as an array of bytes grouped according to
 the size of the datatype passed to the :attr:`dtype` keyword argument. (If no datatype is
 passed then the default floating point datatype is used, instead.) The returned tensor
 will have the specified datatype (or default floating point datatype if none is specified)
 and, by default, be on the CPU device and share memory with the buffer.
 
-When :attr:`obj` is none of the above but a scalar or sequence of scalars then the
+When :attr:`obj` is a NumPy scalar, the returned tensor will be a 0-dimensional tensor on
+the CPU that doesn't share its memory (i.e. ``copy=True``). By default, its datatype will
+be the PyTorch datatype corresponding to the NumPy scalar's datatype.
+
+When :attr:`obj` is none of the above but a scalar, or a sequence of scalars, then the
 returned tensor will, by default, infer its datatype from the scalar values, be on the
 CPU device, and not share its memory.
@@ -1320,6 +1324,10 @@ def merge_dicts(*dicts):
 >>> t2 = torch.asarray(array, dtype=torch.float32)
 >>> array.__array_interface__['data'][0] == t1.data_ptr()
 False
+
+>>> scalar = numpy.float64(0.5)
+>>> torch.asarray(scalar)
+tensor(0.5000, dtype=torch.float64)
 
 """,
 )
```
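A quick interactive check of the documented behavior (the explicit-`dtype` case is an assumption based on `asarray`'s usual conversion rules, not something this PR tests):

```python
import numpy
import torch

scalar = numpy.float64(0.5)

# Default: 0-dim tensor whose dtype follows the NumPy scalar's dtype.
t = torch.asarray(scalar)
assert t.dim() == 0 and t.dtype == torch.float64

# An explicit dtype should still take precedence over the scalar's dtype.
t32 = torch.asarray(scalar, dtype=torch.float32)
assert t32.dtype == torch.float32
```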
torch/csrc/utils/tensor_new.cpp (33 additions, 4 deletions)
```diff
@@ -1603,10 +1603,39 @@ Tensor asarray(
   }
 
 #ifdef USE_NUMPY
-  // Check whether 'obj' is a NumPy Array
-  if (is_numpy_available() && PyArray_Check(obj)) {
-    tensor = tensor_from_numpy(obj, /*warn_if_not_writeable=*/false);
-    should_warn_numpy_not_writable = !PyArray_ISWRITEABLE((PyArrayObject*)obj);
+  if (is_numpy_available()) {
+    // Check whether 'obj' is a NumPy array or scalar.
+    bool is_numpy_array = PyArray_Check(obj);
+    bool is_numpy_scalar = PyArray_CheckScalar(obj);
+
+    if (is_numpy_array || is_numpy_scalar) {
+      THPObjectPtr ptr;
+      auto arr = obj;
+
+      if (is_numpy_scalar) {
+        TORCH_CHECK(
+            !force_alias,
+            "can't alias NumPy scalars. ",
+            "Either remove copy=False or convert the scalar to an ndarray.")
+
+        ptr = PyArray_FromScalar(obj, nullptr);
+        arr = ptr.get();
+      }
+
+      tensor = tensor_from_numpy(arr, /*warn_if_not_writeable=*/false);
+      should_warn_numpy_not_writable =
+          !PyArray_ISWRITEABLE((PyArrayObject*)arr);
+
+      if (is_numpy_scalar) {
+        // Use a newly cloned storage instead of the shared one: the
+        // THPObjectPtr deletes the temporary array (and its storage) at
+        // the end of this scope.
+        tensor = tensor.clone();
+
+        // No need to clone again later.
+        force_copy = false;
+      }
+    }
   }
 #endif
```
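From Python, the two branches behave roughly as sketched below (expected outcomes taken from the test and docs above; the aliasing assertion assumes the usual `asarray` sharing semantics for CPU ndarrays):

```python
import numpy
import torch

# ndarray branch: aliasing the buffer is still allowed.
arr = numpy.array([0.5])
t = torch.asarray(arr, copy=False)
assert t.data_ptr() == arr.__array_interface__["data"][0]

# scalar branch: copy=False now hits the TORCH_CHECK above.
try:
    torch.asarray(numpy.float64(0.5), copy=False)
except RuntimeError as e:
    print(e)  # can't alias NumPy scalars. ...

# The default path goes through PyArray_FromScalar and then clone().
t = torch.asarray(numpy.float64(0.5))
assert t.dim() == 0 and t.dtype == torch.float64
```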
