We extract affordances from large-scale human video datasets such as Ego4D and Epic Kitchens, using off-the-shelf hand-object interaction detectors to find the contact region and the post-contact wrist trajectory.
Illustration of the annotation pipeline. (Left) Find the frame with hand-object contact. (Middle) Track the wrist to obtain the post-contact trajectory. (Right) Map both back to the first human-free frame for reference.
We first find the contact point using a hand-object interaction detector, and we obtain the post-contact trajectory by tracking the wrist after the contact frame (a minimal sketch of this step follows below). Once these frames are detected, a major issue remains: the human is still in the scene, which creates a distribution shift, since the robot never sees humans in its own observations.
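As a concrete illustration, here is a minimal sketch of this annotation step. The `detect_hands_and_objects` interface is a hypothetical stand-in for an off-the-shelf hand-object interaction detector; the real pipeline's detector and tracker may differ.

```python
# Sketch of the annotation step. `detect_hands_and_objects` is a
# hypothetical stand-in for an off-the-shelf hand-object interaction
# detector; its return fields below are assumptions for illustration.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class HOIDetection:
    in_contact: bool                    # is a hand touching an object?
    contact_point: Tuple[float, float]  # (x, y) pixel of the contact region
    wrist: Tuple[float, float]          # (x, y) pixel of the wrist

def extract_affordance(frames: List, detect: Callable, horizon: int = 20
                       ) -> Optional[Tuple[Tuple[float, float], List]]:
    """Return (contact point, post-contact wrist trajectory) for one clip."""
    # 1. Scan forward for the first frame with hand-object contact.
    contact_idx = None
    for t, frame in enumerate(frames):
        det: HOIDetection = detect(frame)
        if det.in_contact:
            contact_idx, contact_point = t, det.contact_point
            break
    if contact_idx is None:
        return None  # no interaction found in this clip

    # 2. Track the wrist for `horizon` frames after contact.
    trajectory = [detect(f).wrist for f in
                  frames[contact_idx:contact_idx + horizon]]
    return contact_point, trajectory
```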
Our solution is to simply map the affordances back to the first frame without the human.
We use the available camera information to project both the contact points and the post-contact trajectory into the human-agnostic frame, which is then used as input to our model.
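When explicit camera information is unavailable, one simple way to realize this remapping is a feature-based homography between the contact frame and the human-free first frame. A sketch under that (roughly planar-scene) assumption, using OpenCV:

```python
# Sketch: remap 2D affordances from the contact frame into the first,
# human-free frame via a feature-based homography (OpenCV). This assumes
# the background is roughly planar and that RANSAC discards feature
# matches on the moving human.
import cv2
import numpy as np

def map_to_human_agnostic_frame(points, contact_frame, first_frame):
    """Remap (N, 2) pixel points from contact_frame into first_frame."""
    g1 = cv2.cvtColor(contact_frame, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)

    # Detect and match background features between the two frames.
    orb = cv2.ORB_create(nfeatures=2000)
    k1, d1 = orb.detectAndCompute(g1, None)
    k2, d2 = orb.detectAndCompute(g2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]

    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC rejects outlier matches (e.g. points on the human).
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    pts = np.float32(points).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)
```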
Our model takes a human-agnostic frame as input. The contact head outputs a contact heatmap (left), and the trajectory transformer predicts wrist waypoints (orange). This output can be used directly at inference time, given sparse 3D information (such as depth) and robot kinematics.
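To make the two-headed design concrete, here is a schematic sketch in PyTorch. The backbone, layer sizes, and number of waypoints are illustrative assumptions, not the exact architecture:

```python
# Schematic sketch of a two-headed affordance model: a shared backbone,
# a convolutional contact head producing a heatmap, and a small
# transformer decoding wrist waypoints. Sizes are illustrative only.
import torch
import torch.nn as nn

class AffordanceModel(nn.Module):
    def __init__(self, d_model=256, n_waypoints=5):
        super().__init__()
        # Shared visual backbone over the human-agnostic frame.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Contact head: upsample features to a per-pixel contact heatmap.
        self.contact_head = nn.Sequential(
            nn.ConvTranspose2d(d_model, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),
        )
        # Trajectory head: learned queries attend over image tokens.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.traj_transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.waypoint_queries = nn.Parameter(torch.randn(n_waypoints, d_model))
        self.to_xy = nn.Linear(d_model, 2)

    def forward(self, frame):                        # frame: (B, 3, H, W)
        feats = self.backbone(frame)                 # (B, C, H/4, W/4)
        heatmap = self.contact_head(feats)           # (B, 1, H, W)
        tokens = feats.flatten(2).transpose(1, 2)    # (B, HW/16, C)
        q = self.waypoint_queries.expand(frame.shape[0], -1, -1)
        out = self.traj_transformer(torch.cat([q, tokens], dim=1))
        waypoints = self.to_xy(out[:, : q.shape[1]])  # (B, n_waypoints, 2)
        return heatmap, waypoints
```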
We benchmark VRB on 10+ tasks, 2 robot morphologies, and 4 learning paradigms.
Robot learning paradigms. (Top-left) Affordance-model-driven data collection for offline imitation. (Top-right) Reward-free exploration. (Bottom-left) Goal-conditioned policy learning with our affordance model. (Bottom-right) Using the affordance model outputs to reparameterize actions (sketched below).
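As an example of the last paradigm, here is a hypothetical sketch of action reparameterization: the predicted 2D contact point and waypoints are lifted to 3D with a depth image and passed to a low-level controller. The intrinsics handling and the `robot.move_to` / `robot.close_gripper` API are assumptions for illustration:

```python
# Hypothetical sketch of action reparameterization: grasp at the contact
# heatmap's peak, then follow the predicted wrist waypoints. The robot
# controller API and camera intrinsics are illustrative assumptions.
import numpy as np

def pixel_to_3d(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with its depth into camera coordinates."""
    z = depth[int(v), int(u)]
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

def execute_affordance(robot, heatmap, waypoints, depth, intrinsics):
    """heatmap: (H, W) array; waypoints: list of (u, v) pixel coords."""
    fx, fy, cx, cy = intrinsics
    # Grasp at the heatmap's peak contact point.
    v, u = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    robot.move_to(pixel_to_3d(u, v, depth, fx, fy, cx, cy))
    robot.close_gripper()
    # Follow the predicted post-contact trajectory.
    for (u, v) in waypoints:
        robot.move_to(pixel_to_3d(u, v, depth, fx, fy, cx, cy))
```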
We have also tested VRB in simulation, on the Franka Kitchen benchmark from D4RL, where our method outperforms the baselines on three distinct tasks.
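For reference, the Franka Kitchen environments ship with D4RL and can be loaded as below; the random-action rollout is only a placeholder for a learned policy.

```python
# Loading D4RL's Franka Kitchen environments. The rollout loop with
# random actions is a placeholder for a learned policy.
import gym
import d4rl  # registers the kitchen-* environments with gym

env = gym.make("kitchen-mixed-v0")
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()  # stand-in for a learned policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode return:", total_reward)
```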
VRB also handles rare objects effectively, outperforming the Hotspots baseline at grasping various held-out items, demonstrating its adaptability across tasks and environments.