Fully Motion-Aware Network for Video Object Detection
Architecture
Summary
Similar to FGFA, but in addition to pixel-level feature calibration and aggregation, MANet proposes a motion pattern reasoning module that dynamically combines pixel-level and instance-level calibration through learnable soft weights, according to the motion (optical flow estimated by FlowNet). Instance-level calibration is achieved by regressing the relative movements $(\Delta x , \Delta y , \Delta w , \Delta h)$ from the optical flow estimate at the proposal positions of the reference frame. The final feature maps for the detection network (R-FCN) are the aggregation of the calibrated feature maps from nearby frames (13 frames in total). Pixel-level calibration yields larger improvements for non-rigid movements, while instance-level calibration works better for rigid movements and occlusion cases.
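To make the three steps in the summary concrete, here is a minimal pure-Python sketch, not the authors' implementation. It rests on several assumptions not stated in the notes above: the relative movements follow the R-CNN style box parameterization (centre offsets scaled by box size, log-space size changes), the soft weight is a single scalar `alpha` in [0, 1], and aggregation is a plain average over frames. The function names `shift_box`, `combine`, and `aggregate` are hypothetical.

```python
import math


def shift_box(box, delta):
    """Move a reference-frame proposal to a nearby frame.

    box   = (cx, cy, w, h): centre and size of the proposal.
    delta = (dx, dy, dw, dh): regressed relative movement.
    Assumption: R-CNN style parameterization, i.e. centre offsets are
    scaled by the box size and size changes live in log space.
    """
    cx, cy, w, h = box
    dx, dy, dw, dh = delta
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def combine(pixel_feat, instance_feat, alpha):
    """Blend pixel-level and instance-level calibrated features.

    Assumption: alpha is one scalar soft weight in [0, 1]; alpha = 1
    trusts instance-level calibration only, alpha = 0 pixel-level only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * i + (1.0 - alpha) * p
            for p, i in zip(pixel_feat, instance_feat)]


def aggregate(frames):
    """Average calibrated feature vectors across nearby frames.

    Assumption: uniform weights, used here only as a simplification.
    """
    n = len(frames)
    return [sum(values) / n for values in zip(*frames)]
```

For example, `shift_box((10, 10, 4, 2), (0.5, -0.5, 0.0, 0.0))` moves the box centre by half a box width to the right and half a box height up while keeping its size. In the real network the features are spatial maps rather than flat lists and the weights are predicted per proposal, so this sketch only illustrates the arithmetic.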