torch.fx.proxy.TraceError: class MMArchitectureQuant
#621
Comments
This might be tangentially related to what I encountered in the mmpose TopdownEstimator in issue #3012. You might need to refactor the model so that there are no self-referencing methods within it, and instead point to wrapped outer methods. I haven't checked if that's the case for mmseg, but it might point you in the right direction. |
Hi, I have the same problem with the class EncoderDecoder from the segmentors of MMSegmentation (line 208). Did you manage to refactor your model and how? |
Yes, I haven't posted an issue yet, but you should mimic the structure in mmpretrain.models.heads.cls_head.ClsHead where there is an additional |
Thank you. I have changed the following argument of the MMRazor CustomTracer to fit with the EncoderDecoder class:
Both the auxiliary head (FCNHead) and the decode head (PSPHead) use the same predict and loss functions. Moreover, I have moved all of the code of the EncoderDecoder predict method out of the class (except for the self.inference() call), by creating functions with a @torch.fx.wrap decorator.
The problem now is that calling the EncoderDecoder loss function calls the EncoderDecoder _decode_head_forward_train and _auxiliary_head_forward_train functions, which try to update a dictionary of losses. I can't make the same changes you have made in mmpose TopdownEstimator for the loss function, as the latter two functions update the dictionary. Do I have to pass the EncoderDecoder loss function entirely to Here is the full log of the issue:
|
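For readers following along, the refactoring discussed here can be sketched on a toy model. This is a minimal sketch, not mmseg code: `ToySegmentor`, `toy_head_loss`, and `merge_loss_dicts` are hypothetical names, and only `torch.fx.wrap` and `torch.fx.symbolic_trace` are real PyTorch APIs. The dict handling lives in module-level functions decorated with `@torch.fx.wrap`, so plain `torch.fx` records each call as a single node instead of stepping into the dict code.

```python
import torch
import torch.fx
from torch import nn


@torch.fx.wrap
def toy_head_loss(feat, prefix):
    # Hypothetical stand-in for a head's loss: returns a dict of named losses.
    return {f'{prefix}.loss': feat.mean()}


@torch.fx.wrap
def merge_loss_dicts(loss_decode, loss_aux):
    # The dict updates run at execution time only; the tracer never steps in here.
    losses = dict()
    losses.update(loss_decode)
    losses.update(loss_aux)
    return losses


class ToySegmentor(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 4, kernel_size=3, padding=1)

    def forward(self, inputs):
        x = self.backbone(inputs)  # traced as a call_module node
        loss_decode = toy_head_loss(x, 'decode')
        loss_aux = toy_head_loss(x, 'aux')
        return merge_loss_dicts(loss_decode, loss_aux)


graph_module = torch.fx.symbolic_trace(ToySegmentor())
wrapped_calls = [node.target.__name__ for node in graph_module.graph.nodes
                 if node.op == 'call_function']
losses = graph_module(torch.randn(1, 3, 8, 8))
print(wrapped_calls, sorted(losses))
```

Note that `torch.fx.wrap` has to be applied at module level, which is why the helpers are defined outside the class.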
Passing the entire loss function to In this case something like this should work:
def _get_loss(self, x: Tensor, data_samples: SampleList) -> dict:
    """Calculate losses from a batch of inputs and data samples.

    Args:
        x (Tensor): forward call result.
        data_samples (list[:obj:`SegDataSample`]): The seg data samples.
            It usually includes information such as `metainfo` and
            `gt_sem_seg`.

    Returns:
        dict[str, Tensor]: a dictionary of loss components
    """
    losses = dict()
    loss_decode = self._decode_head_forward_train(x, data_samples)
    losses.update(loss_decode)
    if self.with_auxiliary_head:
        loss_aux = self._auxiliary_head_forward_train(x, data_samples)
        losses.update(loss_aux)
    return losses

def loss(self, inputs: Tensor, data_samples: SampleList) -> dict:
    """Calculate losses from a batch of inputs and data samples.

    Args:
        inputs (Tensor): Input images.
        data_samples (list[:obj:`SegDataSample`]): The seg data samples.
            It usually includes information such as `metainfo` and
            `gt_sem_seg`.

    Returns:
        dict[str, Tensor]: a dictionary of loss components
    """
    x = self.extract_feat(inputs)
    losses = self._get_loss(x, data_samples)
    return losses
with a config that skips |
Thank you for your answer. Unfortunately, this doesn't work (see traceback below). It seems to be a malfunction in the trace function when dealing with the 'loss' mode.
When the trace function of CustomTracer is called, it calls the create_arg method of torch fx for the forward method of EncoderDecoder and several of its modules. However, one of these modules is the EncoderDecoder itself (not a submodule), which it should not be. It enters create_arg and crashes in this condition because the EncoderDecoder module has no name. I think the problem comes from the fact that the _get_loss function is still in the EncoderDecoder class: this makes the EncoderDecoder model appear in the arguments of the create_arg method. I had the same issue and traceback with the tracing of the 'predict' mode and I made some changes (see in this comment). I take the |
The above Traceback makes me think that you didn't add EDIT: I see, so EncoderDecoder is not a submodule, sorry. If so, you'll need to refactor the loss function into not using EDIT 2: Or, alternatively, factor the dict handling out of the class and decorate it with EDIT 3: You might also need to refactor and skip the refactored code from the decoder head and auxiliary head losses when they also handle dictionaries. |
Thank you! Indeed, it works by refactoring the dict handling the batch preparation in respectively the What is the difference between the use of the |
Do make sure that the
|
How do you check if methods have nodes in common? |
Anything that has a forward calculation must not be skipped. One way to check is to add a printout of the JIT graph within mmrazor's CustomTracer. |
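One way to see what ends up in a traced graph, sketched here with plain `torch.fx` rather than mmrazor's CustomTracer (whose internals are not shown in this thread), is to trace a module and list its nodes. `TinyNet` is a hypothetical toy model. Layers that carry forward computation show up as `call_module` or `call_function` nodes, so a layer that is missing from such a listing has presumably been hidden behind a skipped or wrapped call.

```python
import torch
import torch.fx
from torch import nn


class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 2, kernel_size=1)

    def forward(self, x):
        return torch.relu(self.conv(x))


# Trace without building a GraphModule, then print one line per node.
graph = torch.fx.Tracer().trace(TinyNet())
for node in graph.nodes:
    print(node.op, node.name, node.target)

traced_ops = [node.op for node in graph.nodes]
module_targets = [node.target for node in graph.nodes
                  if node.op == 'call_module']
```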
Hi, I encountered the same error and modified the predict and loss functions as outlined in this comment and this comment. I also added BaseDecodeHead.predict_by_feat and BaseDecodeHead.loss_by_feat to the skipped functions. Could you provide more context on FX tracing? I want to ensure I'm not missing any critical steps from the solution you mentioned above. Also, I assume the _prepare_batch() function includes the if-else block from the original script—please confirm if this is correct. Thank you in advance. |
Hi, tracing with Torch FX requires making the tracer skip every untraceable part of the code. Untraceable parts are all
The last point is useful for example if you want to skip only the postprocessing part after the forward of your detection head, although they are originally in the same class method. So, the untraceable parts are still used but not traced thanks to the decorator. |
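As an illustration of that last point, here is a minimal sketch with a hypothetical `ToyHead` (not an mmdet or mmseg class). The convolution stays in the traced graph, while the post-processing, which branches on the tensor shape, is moved to a module-level function wrapped with `torch.fx.wrap` so that it is recorded as one opaque call.

```python
import torch
import torch.fx
from torch import nn


@torch.fx.wrap
def toy_postprocess(seg_logits, threshold):
    # Branching on a traced value like this cannot be traced symbolically,
    # so the whole function is kept out of the graph by the decorator.
    if seg_logits.shape[1] == 1:
        return (seg_logits.sigmoid() > threshold).long()
    return seg_logits.argmax(dim=1, keepdim=True)


class ToyHead(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.conv_seg = nn.Conv2d(4, num_classes, kernel_size=1)

    def forward(self, feats):
        seg_logits = self.conv_seg(feats)        # traced as call_module
        return toy_postprocess(seg_logits, 0.3)  # one call_function node


graph_module = torch.fx.symbolic_trace(ToyHead())
pred = graph_module(torch.randn(2, 4, 8, 8))
node_ops = [node.op for node in graph_module.graph.nodes]
print(node_ops, tuple(pred.shape))
```

The post-processing still runs when the traced module is executed; it is only invisible to the tracer.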
Based on your previous response and this comment, I have refactored the dict handling for both
Can you please confirm if this is correct? Also, regarding the postprocess_result method of the class BaseSegmentor, did you just move the entire function out of the class and wrap it with the |
I guess you can delete some methods from your skipped_methods=[
    'mmseg.models.decode_heads.decode_head.BaseDecodeHead.predict_by_feat',
    'mmseg.models.decode_heads.decode_head.BaseDecodeHead.loss_by_feat',
    # 'mmseg.models.segmentors.encoder_decoder.EncoderDecoder._get_predictions', --> no such method in native EncoderDecoder
    # 'mmseg.models.segmentors.encoder_decoder.EncoderDecoder._get_loss', --> no such method in native EncoderDecoder
    # 'mmseg.models.segmentors.encoder_decoder.PSPHead.loss_by_feat', --> already skipped with second element
    'mmseg.models.segmentors.encoder_decoder.FCNHead.loss'
]
)
Then, yes I moved the entire
@torch.fx.wrap
def postprocess_result(seg_logits: Tensor,
                       decode_head_threshold: float,
                       align_corners: bool,
                       data_samples: OptSampleList = None) -> SampleList:
    # code of the base postprocess_result method
Also, the handling of the loss dictionaries will be problematic, so you will have to apply the same trick. You can see this repository of MMDetection for MMRazor for more examples: https://github.com/HIT-cwh/mmdetection/tree/for_mmrazor |
As we discussed and following the mmdet logic for mmrazor, I have refactored the dict handling (
However, I'm still encountering the torch.fx.proxy.TraceError when predict is called:
Do I miss something in batch preparation ( |
After following the suggestions from the previous responses, I added skipped_methods, which resolved the issue in the backbone. However, I feel that this current error cannot be solved. This is my config
And this is Traceback
My understanding is that when I add 'mmdet.models.dense_heads.base_dense_head.BaseDenseHead.predict_by_feat' to skipped_methods, this issue should be resolved, but in fact, it hasn't. So, I would like to ask those who have solved this problem for some advice. |
You can't wrap the entire loss part as it is not going to trace the forward of the head ( The predict and postprocessing parts seem ok, so the only thing you have to change in your code is this loss part.
With this done, it should be working. |
Hi @psychedelicosisyphus, I have never faced this issue before. But as I said above, you need to skip tracing the parts with So check carefully what you are skipping by passing methods to the |
So, based on your last response, I replaced the
However, I'm still encountering a TraceError in the add_prefix method, which I also refactored using
|
Describe the bug
torch.fx.proxy.TraceError: class MMArchitectureQuant in mmrazor/models/algorithms/quantization/mm_architecture.py: Proxy object cannot be iterated. This can be attempted when the Proxy is used in a loop or as a *args or **kwargs function argument. See the torch.fx docs on pytorch.org for a more detailed explanation of what types of control flow can be traced, and check out the Proxy docstring for help troubleshooting Proxy iteration errors
I am currently trying to quantize the segmentation model, and the configuration file is as follows. I then got the error above. Can you help me check how to solve it? Thank you.
The base configuration file is a segmentation model I modified based on DDRNet, with only 3 categories, and all other configurations are consistent
_base_ = [
    'mmseg::ddrnet/ddrnet_23-slim_in1k-pre_2xb6-120k-1024x1024_label3.py',
    '../../deploy_cfgs/mmseg/set_tensorrt-int8-explicit-1024x1024_label3.py'
]

_base_.val_dataloader.batch_size = 32

test_cfg = dict(
    type='mmrazor.PTQLoop',
    calibrate_dataloader=_base_.val_dataloader,
    calibrate_steps=32,
)

float_checkpoint = 'https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth'  # noqa: E501

global_qconfig = dict(
    w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'),
    a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'),
    w_fake_quant=dict(type='mmrazor.FakeQuantize'),
    a_fake_quant=dict(type='mmrazor.FakeQuantize'),
    w_qscheme=dict(
        qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True),
    a_qscheme=dict(qdtype='quint8', bit=8, is_symmetry=True),
)

crop_size = (1024, 1024)

model = dict(
    _delete_=True,
    type='mmrazor.MMArchitectureQuant',
    data_preprocessor=dict(
        type='mmseg.SegDataPreProcessor',
        size=crop_size,
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True,
        pad_val=0,
        seg_pad_val=255),
    architecture=_base_.model,
    deploy_cfg=_base_.deploy_cfg,
    float_checkpoint=float_checkpoint,
    quantizer=dict(
        type='mmrazor.TensorRTQuantizer',
        global_qconfig=global_qconfig,
        tracer=dict(
            type='mmrazor.CustomTracer',
            skipped_methods=[
                'mmseg.models.decode_heads.ddr_head.DDRHead.loss_by_feat',
            ])))

model_wrapper_cfg = dict(
    type='mmrazor.MMArchitectureQuantDDP',
    broadcast_buffers=False,
    find_unused_parameters=True)

custom_hooks = []
May I ask where my configuration file is written incorrectly? Thank you!
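The error message itself can be reproduced outside mmrazor. A minimal sketch, assuming only plain `torch.fx` (the `IteratesOverInput` module is hypothetical and unrelated to `MMArchitectureQuant`): Python-level iteration over a traced value in a loop raises this `TraceError`, which suggests looking for loops over dicts or lists of results inside the methods being traced.

```python
import torch
import torch.fx
from torch import nn


class IteratesOverInput(nn.Module):
    def forward(self, x):
        total = 0
        for item in x:  # x is a Proxy during tracing, so this loop fails
            total = total + item
        return total


message = None
try:
    torch.fx.symbolic_trace(IteratesOverInput())
except torch.fx.proxy.TraceError as err:
    message = str(err)

print(message)
```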