- We use distributed training with 2 GPUs by default. For settings that differ from this default (e.g., transformer backbones), we will give the details in the benchmark.
- (TODO) For consistency across different hardware, we report the GPU memory as the maximum value of `torch.cuda.max_memory_allocated()` over all 4 GPUs, with `torch.backends.cudnn.benchmark=False`. Note that this value is usually less than what `nvidia-smi` shows (see the memory-reporting sketch after this list).
- (TODO) We report the inference time as the total time of network forwarding and post-processing, excluding the data loading time. Results are obtained with the script `tools/benchmark.py`, which computes the average time over 200 images with `torch.backends.cudnn.benchmark=False` (see the timing sketch after this list).
- (TODO) For input sizes of 8x+1 (e.g. 769), `align_corners=True` is adopted as a traditional practice; otherwise, for input sizes of 8x (e.g. 512, 1024), `align_corners=False` is adopted. I think there is a potential discrepancy here: Adabins, for instance, uses `align_corners=True` in its official implementation even though its input size is not 8x+1. The influence on results has not been verified; more experiments are TBD (see the interpolation sketch after this list).
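The snippet below is a minimal sketch of how the reported memory number can be obtained; it is not the toolbox's actual logging code, and the helper name `report_max_memory_allocated` is hypothetical.

```python
import torch

# Disable cuDNN autotuning so the measurement matches the reporting convention above.
torch.backends.cudnn.benchmark = False

def report_max_memory_allocated():
    """Peak allocated GPU memory (MB), taken as the maximum over all visible GPUs."""
    peaks = []
    for device_id in range(torch.cuda.device_count()):
        # max_memory_allocated() tracks peak tensor allocations on one device; it is
        # usually smaller than nvidia-smi's number, which also counts the CUDA context
        # and the caching allocator's reserved (but unused) blocks.
        peaks.append(torch.cuda.max_memory_allocated(device_id))
    return max(peaks, default=0) / (1024 ** 2)
```

In an actual distributed run each process typically owns a single GPU, so the per-rank peaks would need to be gathered across ranks before taking the maximum.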
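Below is a simplified sketch of the timing protocol; the real logic lives in `tools/benchmark.py`, and the function name `average_inference_time`, the warm-up count, and the plain `model(imgs)` call are illustrative assumptions rather than the toolbox API.

```python
import time
import torch

torch.backends.cudnn.benchmark = False

@torch.no_grad()
def average_inference_time(model, data_loader, num_images=200, num_warmup=5):
    """Mean per-image time of forwarding + post-processing, excluding data loading."""
    model.eval()
    total_time, counted = 0.0, 0
    for i, imgs in enumerate(data_loader):
        imgs = imgs.cuda(non_blocking=True)  # host-to-device copy happens before timing
        torch.cuda.synchronize()
        start = time.perf_counter()          # timer starts after the batch is ready
        model(imgs)                          # network forward (+ any post-processing)
        torch.cuda.synchronize()             # wait for the GPU before stopping the clock
        if i >= num_warmup:                  # skip warm-up iterations
            total_time += time.perf_counter() - start
            counted += 1
        if counted >= num_images:
            break
    return total_time / max(counted, 1)
```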
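The following sketch only illustrates the `align_corners` convention described above and is not the toolbox's internal code; `pick_align_corners` is a hypothetical helper and the crop sizes are just examples.

```python
import torch
import torch.nn.functional as F

def pick_align_corners(crop_size):
    """align_corners=True for 8x+1 sizes (e.g. 769), False for 8x sizes (e.g. 512, 1024)."""
    return crop_size % 8 == 1

# Upsample a low-resolution prediction back to a 769x769 (8x+1) crop.
pred = torch.randn(1, 1, 97, 97)
out = F.interpolate(pred, size=(769, 769), mode='bilinear',
                    align_corners=pick_align_corners(769))  # -> align_corners=True
```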
Please refer to BTS for details.
Please refer to Adabins for details.
This is a simple implementation: only the model structure is aligned with the original paper. More experiments on training settings and loss functions still need to be done.
Please refer to DPT for details.
Please refer to SimIPU for details.
Please refer to DepthFormer for details.