Rapidly degrading adjoint test in 3D #124

Open
grizzuti opened this issue Jan 14, 2022 · 4 comments

grizzuti commented Jan 14, 2022

Dear all,

I was testing cufinufft via the Python wrapper on some 3D problems, and I stumbled upon accuracy issues when performing the adjoint test. That is, given the NUFFT linear operator F,

dot(F*x, y) ≈ dot(x, F'*y).
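
Just to be explicit about what I mean by the adjoint test: for a plain dense matrix, where the adjoint is simply the conjugate transpose, the same identity holds to machine precision.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 30)) + 1j * rng.standard_normal((20, 30))
x = rng.standard_normal(30) + 1j * rng.standard_normal(30)
y = rng.standard_normal(20) + 1j * rng.standard_normal(20)

a = np.vdot(A @ x, y)           # dot(F*x, y)
b = np.vdot(x, A.conj().T @ y)  # dot(x, F'*y)
print(abs(a - b) / abs(a))      # machine precision, around 1e-16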

The adjoint test passes for small problems of size (32, 32, 32) (relative error around 1e-7 on my machine), but it quickly becomes unacceptable for size (128, 128, 128) (relative error around 0.5 in the worst case). You can find the code I was running below.

Please let me know if this is expected, or if it can somehow be mitigated.

Regards,
Gabrio

# Module import
import numpy as np
import pycuda.autoinit
from pycuda.gpuarray import GPUArray, to_gpu
from cufinufft import cufinufft

# NFFT eval
def cufinufft3d_eval(u, kx, ky, kz, eps=1e-6):
     
     # Allocate memory on the GPU for output variable
     n_transf = 1
     U_gpu = GPUArray((n_transf, len(kx)), dtype=u.dtype)
     
     # Initialize the plan and set the points
     plan = cufinufft(2, u.shape, n_transf, eps=eps, dtype=kx.dtype)
     plan.set_pts(to_gpu(kx), to_gpu(ky), to_gpu(kz))
     
     # Execute the plan, reading from the array u and storing the result in U_gpu
     plan.execute(U_gpu, to_gpu(u))
     
     # Retrieve the result from the GPU
     return U_gpu.get()[0,:]

def cufinufft3d_adjeval(U, kx, ky, kz, shape, eps=1e-6):
     
     # Allocate memory on the GPU for output variable
     n_transf = 1
     u_gpu = GPUArray((n_transf, shape[0], shape[1], shape[2]), dtype=U.dtype)
     
     # Initialize the plan and set the points
     plan = cufinufft(1, shape, n_transf, eps=eps, dtype=kx.dtype)
     plan.set_pts(to_gpu(kx), to_gpu(ky), to_gpu(kz))
     
     # Execute the plan, reading from the array U and storing the result in u_gpu
     plan.execute(to_gpu(U), u_gpu)
     
     # Retrieve the result from the GPU
     return u_gpu.get()[0,:,:,:]

# 3D settings
# shape = (32,32,32)
# shape = (64,64,64)
shape = (128,128,128)
eps = 1e-7
u = (np.random.randn(*shape) + 1j*np.random.randn(*shape)).astype(np.complex64)
M = 10
kx = np.linspace(-np.pi,np.pi, num=M).astype(np.float32)
ky = np.linspace(-np.pi,np.pi, num=M).astype(np.float32)
kz = np.linspace(-np.pi,np.pi, num=M).astype(np.float32)

# Adjoint test
U  = cufinufft3d_eval(u, kx.flatten(), ky.flatten(), kz.flatten(), eps=eps)
V = (np.random.randn(*U.shape) + 1j*np.random.randn(*U.shape)).astype(np.complex64)
v = cufinufft3d_adjeval(V, kx.flatten(), ky.flatten(), kz.flatten(), shape, eps=eps)
a = np.vdot(U.flatten(),V.flatten())
b = np.vdot(u.flatten(),v.flatten())
print(np.abs(a - b) / np.abs(a))  # relative error

janden commented Feb 4, 2022

Thanks for bringing this up. Have you observed this with other NUFFT implementations, such as Finufft?

Will see if I can reproduce this.

janden commented Feb 4, 2022

So I'm seeing errors of order one for all shapes when I run this locally. Will try on another machine. I can't see any real issue with your script though, so it's not clear to me what could be happening.

grizzuti commented Feb 4, 2022

Hi Joakim, I ran the same experiment with the CPU implementation of finufft; no issues there!
GPU 2D is also fine. On my machine, the inaccuracies become more pronounced for the larger 3D experiments.
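
For reference, here is roughly the CPU check I ran. This is only a sketch: it assumes the finufft Python package with nufft3d2/nufft3d1 under their default isign conventions (so the type-1 transform is the exact adjoint of the type-2 transform) and with the precision inferred from the input dtypes.

import numpy as np
import finufft

shape = (128, 128, 128)
eps = 1e-7
M = 10

u = (np.random.randn(*shape) + 1j*np.random.randn(*shape)).astype(np.complex64)
kx = np.linspace(-np.pi, np.pi, num=M).astype(np.float32)
ky = np.linspace(-np.pi, np.pi, num=M).astype(np.float32)
kz = np.linspace(-np.pi, np.pi, num=M).astype(np.float32)

U = finufft.nufft3d2(kx, ky, kz, u, eps=eps)         # F*x  (type 2: uniform -> nonuniform)
V = (np.random.randn(*U.shape) + 1j*np.random.randn(*U.shape)).astype(np.complex64)
v = finufft.nufft3d1(kx, ky, kz, V, shape, eps=eps)  # F'*y (type 1: nonuniform -> uniform)

a = np.vdot(U, V)   # np.vdot flattens its arguments and conjugates the first
b = np.vdot(u, v)
print(np.abs(a - b) / np.abs(a))  # stays small here, unlike the 3D GPU runs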

janden commented Feb 4, 2022

So the problem seems to be hardware dependent. On my laptop (GeForce 940MX), all three sizes (32, 64, and 128) give an error of order one. Same when I run this on a P100. However, when I run on the V100/A100, I get an error of order 1e-7, 1e-6, and 1e-5, respectively.

My first guess was that this is memory related, but the P100 and V100 both have 16 GB of memory, so that's not it. Will dig deeper into this.
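
For anyone else trying to reproduce this, it may help to record exactly which device the script ran on. A quick sketch using the pycuda device API (the reproduction script already imports pycuda.autoinit):

import pycuda.autoinit

# pycuda.autoinit exposes the device it initialized
dev = pycuda.autoinit.device
print("GPU:", dev.name())
print("compute capability:", dev.compute_capability())
print("total memory (GiB):", dev.total_memory() / 2**30)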
