From 2d3b4f7dd4e8d53618c7e68e87b03f87b2a59ee7 Mon Sep 17 00:00:00 2001
From: HsinYingLee
Date: Wed, 30 Oct 2024 11:32:16 -0700
Subject: [PATCH 1/2] Trigger rebuild

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 9b40f2a..6ad552f 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,3 @@
# DELTA
- Dense Efficient Long-range 3D Tracking for Any video
+ Dense Efficient Long-range 3D Tracking for Any video

From d53ddeb7b0f72664433c98c52f562e56f335a81a Mon Sep 17 00:00:00 2001
From: HsinYingLee
Date: Wed, 30 Oct 2024 11:42:12 -0700
Subject: [PATCH 2/2] update sfm url

---
 index.html          |  58 +++++++++++++++++++++-----------------------
 resources/.DS_Store | Bin 10244 -> 0 bytes
 2 files changed, 28 insertions(+), 30 deletions(-)
 delete mode 100644 resources/.DS_Store

diff --git a/index.html b/index.html
index c984725..5942166 100644
--- a/index.html
+++ b/index.html
@@ -10,7 +10,7 @@
-
+
@@ -108,7 +108,7 @@
height: 30px;
}
-
+
@@ -117,7 +117,7 @@

DELTA: Dense Efficient Long-range
3D Tracking for Any video
-

+

Tuan Duc Ngo1,2 @@ -142,9 +142,9 @@


- 1 Snap Inc + 1 Snap Inc 2 UMass Amherst - 3 TU Crete + 3 TU Crete 4 MIT-IBM Watson AI Lab

@@ -165,15 +165,15 @@

- +
- +
- +
@@ -181,7 +181,7 @@

- +
@@ -189,17 +189,17 @@

- + -

+

- DELTA captures dense, 3D, long-range trajectories from casual in-the-wild videos in a feed-forward manner. + DELTA captures dense, 3D, long-range trajectories from casual in-the-wild videos in a feed-forward manner.

- + @@ -210,23 +210,23 @@

Intro Video (1min)

--> - +

Abstract

- Tracking dense 3D motion from monocular videos remains challenging, particularly - when aiming for pixel-level precision over long sequences. We introduce DELTA, - a novel method that efficiently tracks every pixel in 3D space, enabling accurate + Tracking dense 3D motion from monocular videos remains challenging, particularly + when aiming for pixel-level precision over long sequences. We introduce DELTA, + a novel method that efficiently tracks every pixel in 3D space, enabling accurate motion estimation across entire videos. Our approach leverages a joint global-local attention mechanism for reduced-resolution tracking, followed by a - transformer-based upsampler to achieve high-resolution predictions. Unlike existing + transformer-based upsampler to achieve high-resolution predictions. Unlike existing methods, which are limited by computational inefficiency or sparse tracking, DELTA delivers dense 3D tracking at scale, running over 8x faster than previous methods while achieving state-of-the-art accuracy. Furthermore, we explore the impact of depth representation on tracking performance and identify - log-depth as the optimal choice. Extensive experiments demonstrate the superiority + log-depth as the optimal choice. Extensive experiments demonstrate the superiority of DELTA on multiple benchmarks, achieving new state-of-the-art results in both 2D and 3D dense tracking tasks. Our method provides a robust solution for applications requiring fine-grained, long-term motion tracking in 3D space. @@ -239,7 +239,7 @@

Motivation


Existing motion prediction methods struggle with short-term, sparse predictions and often fail to deliver accurate 3D motion estimations while optimization-based approaches require substantial time to process a single video. - We are the first method capable of efficiently tracking every pixel in 3D space over hundreds of frames from monocular videos, and achieves + We are the first method capable of efficiently tracking every pixel in 3D space over hundreds of frames from monocular videos, and achieves state-of-the-art accuracy on 3D tracking benchmarks.

@@ -302,7 +302,7 @@

More results

- +
@@ -414,16 +414,14 @@

Non-rigid Structure from motion

- We first densely track pixels across multiple keyframes in the video to obtain pairwise correspondences. - Using these correspondences, we jointly estimate per-keyframe depth maps and camera poses through the Global Alignment in + We first densely track pixels across multiple keyframes in the video to obtain pairwise correspondences. + Using these correspondences, we jointly estimate per-keyframe depth maps and camera poses through the Global Alignment in DUSt3R and MonST3R. @@ -440,7 +438,7 @@

Consistent video editing in 3D space
- +
@@ -449,7 +447,7 @@

Consistent video editing in 3D space
- +
@@ -474,8 +472,8 @@

More quantitative results can be found in our paper

- Acknowledgements: We borrow this template from MonST3R and 4Real. - The tracking visualization is inspired by CoTracker. The camera pose visualization tool is borrowed from MonST3R. + Acknowledgements: We borrow this template from MonST3R and 4Real. + The tracking visualization is inspired by CoTracker. The camera pose visualization tool is borrowed from MonST3R. We sincerely thank the authors for publishing the source code.

diff --git a/resources/.DS_Store b/resources/.DS_Store
deleted file mode 100644
index f33bef12ad35c3206c7e5b4b59443ad925de2d4c..0000000000000000000000000000000000000000