
CUDA-BEVFusion: difference in the result of onnx and pytorch model #283

Jayden9912 opened this issue Sep 12, 2024 · 5 comments


Hi.

I have converted the BEVFusion model from PyTorch to ONNX, but it fails np.testing.assert_allclose:

Mismatched elements: 17068436 / 100663296 (17%)
Max absolute difference: 0.00343999
Max relative difference: 17035.555
 x: array([[[[0.401343, 0.140742, 0.282421, ..., 1.049651, 1.047727,
          0.911273],
         [0.486538, 0.307632, 0.1848  , ..., 0.661901, 0.757838,...
 y: array([[[[0.401524, 0.140862, 0.282235, ..., 1.049505, 1.047449,
          0.910834],
         [0.486407, 0.307709, 0.18471 , ..., 0.661574, 0.757754,..

Is this normal?

hopef (Collaborator) commented Sep 12, 2024

This is normal. We evaluate the difference between the PyTorch and TensorRT models using mAP, not absolute difference.

hopef (Collaborator) commented Sep 12, 2024

This is because of nondeterministic implementations in the BEVFusion pipeline, such as Voxelization and Spconv.
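
As a toy illustration (this is not the actual kernel code), merely accumulating the same float32 values in a different order already shifts the result, which is what nondeterministic scatter/voxelization kernels effectively do:

    import numpy as np

    rng = np.random.default_rng(0)
    vals = rng.standard_normal(1_000_000).astype(np.float32)

    s1 = vals.sum()                    # one accumulation order
    s2 = rng.permutation(vals).sum()   # identical values, different order
    print(s1, s2, abs(s1 - s2))        # the sums typically differ in the last digits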

Jayden9912 (Author) commented Sep 12, 2024

Hi. Thanks for the reply.

    def forward(self, img, depth):
        B, N, C, H, W = img.size()
        img = img.view(B * N, C, H, W)

        feat = self.model.encoders.camera.backbone(img)
        # feat = self.model.encoders.camera.neck(feat)
        if not isinstance(feat, torch.Tensor):
            feat = feat[0]

        BN, C, H, W = map(int, feat.size())
        feat = feat.view(B, int(BN / B), C, H, W)

        # def get_cam_feats(self, x, d):
        def get_cam_feats(self, x):
            B, N, C, fH, fW = map(int, x.shape)
            # d = d.view(B * N, *d.shape[2:])
            x = x.view(B * N, C, fH, fW)

            # d = self.dtransform(d)
            # x = torch.cat([d, x], dim=1)
            x = self.depthnet(x)

            depth = x[:, : self.D].softmax(dim=1)
            # feat  = x[:, self.D : (self.D + self.C)].permute(0, 2, 3, 1)
            feat = depth.unsqueeze(1) * x[:, self.D : (self.D + self.C)].unsqueeze(2)

            feat = feat.view(B, N, self.C, self.D, fH, fW)
            feat = feat.permute(0, 1, 3, 4, 5, 2)
            return feat
        
        # return get_cam_feats(self.model.encoders.camera.vtransform, feat, depth)
        # return get_cam_feats(self.model.encoders.camera.vtransform, feat)
        return feat

This is my forward function for the ResNet camera backbone. I found this problem for all of the ONNX files, so I only exported the camera backbone to ONNX and compared it with the torch output.

Based on my previous experience, the output from torch and onnx should pass the allclose test, which is not the case here.

np.testing.assert_allclose(to_numpy(torch_out[0]), ort_outs[0][0], rtol=1e-03, atol=1e-05)
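
For context, ort_outs comes from running the exported backbone with ONNX Runtime, roughly like this (the file name and the dummy input shape are placeholders for my actual setup):

    import torch
    import onnxruntime as ort

    def to_numpy(t):
        return t.detach().cpu().numpy()

    # Placeholder dummy input: B, N, C, H, W
    img = torch.randn(1, 6, 3, 256, 704)

    sess = ort.InferenceSession("camera_backbone.onnx", providers=["CPUExecutionProvider"])
    input_name = sess.get_inputs()[0].name          # read the input name from the graph
    ort_outs = sess.run(None, {input_name: to_numpy(img)})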

Could you share more regarding this based on your experience?

I have skipped PTQ and only exported from torch to ONNX with opset version 13.
Environment:
torch 1.10.0 with cuda11.1
onnx 1.12.0

My observation is that some numpy functions are optimized and not very accurate for float32, but I have also seen a lot of people use the allclose test for ONNX models and have it pass, so I am quite confused.

hopef (Collaborator) commented Sep 19, 2024

The difference is expected if you are running in fp16 (trtexec --onnx=model.onnx --fp16), because fp16 has lower representation precision than fp32.
If you are running in fp32 (trtexec --onnx=model.onnx), you might have encountered a bug.
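
As a rough illustration (independent of BEVFusion), simply round-tripping fp32 activations through fp16 already introduces absolute errors on the order of the gap you reported:

    import numpy as np

    x = np.random.default_rng(0).standard_normal(1_000_000).astype(np.float32)
    x_fp16 = x.astype(np.float16).astype(np.float32)   # round trip through half precision

    print(np.abs(x - x_fp16).max())   # on the order of 1e-3 for values of magnitude ~1
    # Relative differences can also look huge for values very close to zero,
    # which is why a large "max relative difference" by itself is not alarming.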

Jayden9912 (Author) commented

Hi HopeJW. Thanks for your reply.

Could you clarify the bug you mentioned for fp32?
