LeungTsang/VIT-for-Pose-Estimation

Pose-VIT: Use a transformer to regress the poses between multiple frames in one shot.

Transformers process video efficiently through joint space-time attention. (Is Space-Time Attention All You Need for Video Understanding? https://arxiv.org/pdf/2102.05095)
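To illustrate what joint space-time attention means, here is a minimal NumPy sketch, not the code from this repository. It assumes the video has already been split into patch embeddings of shape (T, N, D) for T frames with N patches each; the function name `joint_space_time_attention` and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical and chosen only for the example. The key idea is that patches from all frames are flattened into one sequence of T*N tokens, so every patch can attend to every other patch in both space and time.

```python
import numpy as np


def joint_space_time_attention(x, w_q, w_k, w_v):
    """Single-head joint space-time self-attention (illustrative sketch).

    x: array of shape (T, N, D), patch embeddings for T frames, N patches each.
    w_q, w_k, w_v: (D, D) projection matrices (hypothetical parameters).
    Returns the attended tokens with shape (T, N, D) and the attention
    weights with shape (T*N, T*N).
    """
    t, n, d = x.shape
    # Flatten space and time into a single token sequence of length T*N.
    tokens = x.reshape(t * n, d)

    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v

    # Scaled dot-product attention over all T*N tokens at once.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    out = weights @ v
    return out.reshape(t, n, -1), weights


rng = np.random.default_rng(0)
T, N, D = 3, 4, 8  # 3 frames, 4 patches per frame, 8-dim embeddings
x = rng.normal(size=(T, N, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) for _ in range(3))

out, weights = joint_space_time_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

The attention matrix has (T*N)^2 entries, so the cost grows quadratically with the number of frames; this is the trade-off of the joint scheme compared with attending over space and time separately.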

Network Architecture

RMSE with respect to the ground-truth (GT) trajectory on KITTI. Performance improves as more frames are involved.
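For reference, a trajectory RMSE can be computed as in the sketch below. This is an assumption about the metric, not the repository's evaluation code: it takes two equally long lists of 3D positions and does not perform any trajectory alignment, and the function name `trajectory_rmse` is hypothetical.

```python
import math


def trajectory_rmse(estimated, ground_truth):
    """Root-mean-square position error between two trajectories.

    Both arguments are sequences of (x, y, z) positions of equal length.
    No alignment is applied (an assumption of this sketch).
    """
    if len(estimated) != len(ground_truth):
        raise ValueError("trajectories must have the same length")
    if not estimated:
        raise ValueError("trajectories must not be empty")

    # Squared Euclidean distance between corresponding positions.
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est_pos, gt_pos))
        for est_pos, gt_pos in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 4.0)]

rmse = trajectory_rmse(est, gt)
print(rmse)
```

In this toy example only the last position is off, by a distance of 5, so the result is the square root of 25/3.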
