"Creating noise from data is easy; creating data from noise is generative modeling."
Yang Song in "Score-Based Generative Modeling through Stochastic Differential Equations" Song et al., 2020
This repository offers a brief summary of essential papers and blogs on diffusion models, alongside a categorized collection of robotics diffusion papers and useful code repositories for starting your own diffusion robotics project.
-
2.1 Imitation Learning and Policy Learning
2.2 Video Diffusion in Robotics
2.3 Online RL
2.4 Offline RL
2.5 Inverse RL
2.6 World Models
While many tutorials on diffusion models exist, below is an overview of some of the best introductory blog posts and videos:
-
What are Diffusion Models?: an introductory video that explains the general idea of diffusion models and some of the high-level math behind them
-
Diffusion Models | Paper Explanation | Math Explained: another great video tutorial explaining the math and notation of diffusion models in detail with visual aids
-
Generative Modeling by Estimating Gradients of the Data Distribution: blog post from one of the most influential authors in this area, which introduces diffusion models from the score-based perspective
-
What are Diffusion Models: an in-depth blog post about the theory of diffusion models, with a general summary of how diffusion models have improved over time
-
Understanding Diffusion Models: an in-depth tutorial paper that explains diffusion models from both perspectives with detailed derivations
If you don't like reading blog posts and prefer the original papers, below you can find a list with the most important diffusion theory papers:
-
Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." International Conference on Machine Learning. PMLR, 2015.
-
Ho, Jonathan, et al. "Denoising diffusion probabilistic models." Advances in Neural Information Processing Systems 33 (2020): 6840-6851.
-
Song, Yang, et al. "Score-Based Generative Modeling through Stochastic Differential Equations." International Conference on Learning Representations. 2020.
-
Ho, Jonathan, and Tim Salimans. "Classifier-Free Diffusion Guidance." NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications. 2021.
-
Karras, Tero, et al. "Elucidating the Design Space of Diffusion-Based Generative Models." Advances in Neural Information Processing Systems 35 (2022)
A general list of published diffusion papers can be found here: What's the score?
Since modern diffusion models have been around for only about three years, the literature on diffusion models in robotics is still small but growing rapidly. Below you can find most robotics diffusion papers that have been published at conferences or uploaded to arXiv so far:
-
Ke et al. 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations
-
Wang, Bingzheng, et al. "DiffAIL: Diffusion Adversarial Imitation Learning." arXiv preprint arXiv:2312.06348 (2023).
-
Scheikl, Paul Maria, et al. "Movement Primitive Diffusion: Learning Gentle Robotic Manipulation of Deformable Objects." arXiv preprint arXiv:2312.10008 (2023).
-
Octo Model Team et al. Octo: An Open-Source Generalist Robot Policy
-
Black, Kevin, et al. "Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models." arXiv preprint arXiv:2310.10639 (2023).
-
Reuss, Moritz, and Rudolf Lioutikov. "Multimodal Diffusion Transformer for Learning from Play." 2nd Workshop on Language and Robot Learning: Language as Grounding. 2023.
-
Sridhar, Ajay, et al. "NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration." arXiv preprint arXiv:2310.07896 (2023).
-
Zhou, Xian, et al. "Unifying Diffusion Models with Action Detection Transformers for Multi-task Robotic Manipulation." Conference on Robot Learning. PMLR, 2023.
-
Ze, Yanjie, et al. "Multi-task real robot learning with generalizable neural feature fields." 7th Annual Conference on Robot Learning. 2023.
-
Mishra, Utkarsh Aashu, et al. "Generative Skill Chaining: Long-Horizon Skill Planning with Diffusion Models." Conference on Robot Learning. PMLR, 2023.
-
Chen, Lili, et al. "PlayFusion: Skill Acquisition via Diffusion from Language-Annotated Play." Conference on Robot Learning. PMLR, 2023.
-
Ha, Huy, Pete Florence, and Shuran Song. "Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition." Conference on Robot Learning. PMLR, 2023.
-
Xu, Mengda, et al. "XSkill: Cross Embodiment Skill Discovery." Conference on Robot Learning. PMLR, 2023.
-
Li, Xiang, et al. "Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning." arXiv preprint arXiv:2307.01849 (2023).
-
Ng, Eley, Ziang Liu, and Monroe Kennedy III. "Diffusion Co-Policy for Synergistic Human-Robot Collaborative Tasks." arXiv preprint arXiv:2305.12171 (2023).
-
Chi, Cheng, et al. "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion." Proceedings of Robotics: Science and Systems (RSS) 2023.
-
Reuss, Moritz, et al. "Goal-Conditioned Imitation Learning using Score-based Diffusion Policies." Proceedings of Robotics: Science and Systems (RSS) 2023.
-
Yoneda, Takuma, et al. "To the Noise and Back: Diffusion for Shared Autonomy." Proceedings of Robotics: Science and Systems (RSS) 2023.
-
Jiang, Chiyu, et al. "MotionDiffuser: Controllable Multi-Agent Motion Prediction Using Diffusion." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
-
Kapelyukh, Ivan, et al. "DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics." IEEE Robotics and Automation Letters (RA-L) 2023.
-
Pearce, Tim, et al. "Imitating human behaviour with diffusion models." International Conference on Learning Representations. 2023.
-
Yu, Tianhe, et al. "Scaling robot learning with semantically imagined experience." arXiv preprint arXiv:2302.11550 (2023).
The ability of Diffusion models to generate realistic videos over a long horizon has enabled new applications in the context of robotics.
-
Liang, Zhixuan, et al. "SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution." arXiv preprint arXiv:2312.11598 (2023).
-
Huang, Tao, et al. "Diffusion Reward: Learning Rewards via Conditional Video Diffusion." arXiv preprint arXiv:2312.14134 (2023).
-
Du, Yilun, et al. "Video Language Planning." arXiv preprint arXiv:2310.10625 (2023).
-
Yang, Mengjiao, et al. "Learning Interactive Real-World Simulators." arXiv preprint arXiv:2310.06114 (2023).
-
Ko, Po-Chen, et al. "Learning to Act from Actionless Videos through Dense Correspondences." arXiv preprint arXiv:2310.08576 (2023).
-
Ajay, Anurag, et al. "Compositional Foundation Models for Hierarchical Planning." Advances in Neural Information Processing Systems 36 (2023).
-
Du, Yilun, et al. "Learning Universal Policies via Text-Guided Video Generation." Advances in Neural Information Processing Systems 36 (2023).
The standard policy gradient objective requires the gradient of the log-likelihood, which is only implicitly defined by the underlying Ordinary Differential Equation (ODE) of the diffusion model.
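For reference, the issue can be written out as follows. This is a sketch in standard notation that is not taken from the papers below: $\pi_\theta$ denotes the policy, $Q^{\pi_\theta}$ its action-value function, and $\tilde{f}_\theta$ the drift of the probability flow ODE of a diffusion policy over actions $a_t$. The usual policy gradient is

$$
\nabla_\theta J(\theta) = \mathbb{E}_{s,\, a \sim \pi_\theta}\left[\nabla_\theta \log \pi_\theta(a \mid s)\; Q^{\pi_\theta}(s, a)\right],
$$

while for a diffusion policy the log-likelihood is only available by integrating along the ODE trajectory (instantaneous change of variables, as used for likelihood computation in Song et al., 2020):

$$
\log \pi_\theta(a_0 \mid s) = \log p_T(a_T) + \int_0^T \nabla_{a} \cdot \tilde{f}_\theta(a_t, t \mid s)\, \mathrm{d}t .
$$

Differentiating this expression with respect to $\theta$ requires backpropagating through the whole ODE solve, which is one reason why online RL methods with diffusion policies typically look for alternative objectives.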
-
Yang, Long, et al. "Policy Representation via Diffusion Probability Model for Reinforcement Learning." arXiv preprint arXiv:2305.13122 (2023).
-
Mazoure, Bogdan, et al. "Value function estimation using conditional diffusion models for control." arXiv preprint arXiv:2306.07290 (2023).
-
Kim, Sungyoon, et al. "Stitching Sub-Trajectories with Conditional Diffusion Model for Goal-Conditioned Offline RL." arXiv preprint arXiv:2402.07226 (2024).
-
Psenka, Michael, et al. "Learning a Diffusion Model Policy from Rewards via Q-Score Matching." arXiv preprint arXiv:2312.11752 (2023).
-
Chen, Chang, et al. "Simple Hierarchical Planning with Diffusion." arXiv preprint arXiv:2401.02644 (2024).
-
Brehmer, Johann, et al. "EDGI: Equivariant diffusion for planning with embodied agents." Advances in Neural Information Processing Systems 36 (2024).
-
Venkatraman, Siddarth, et al. "Reasoning with latent diffusion in offline reinforcement learning." arXiv preprint arXiv:2309.06599 (2023).
-
Chen, Huayu, et al. "Score Regularized Policy Optimization through Diffusion Behavior." arXiv preprint arXiv:2310.07297 (2023).
-
Ding, Zihan, and Chi Jin. "Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning." arXiv preprint arXiv:2309.16984 (2023).
-
Wang, Zidan, et al. "Cold Diffusion on the Replay Buffer: Learning to Plan from Known Good States." arXiv preprint arXiv:2310.13914 (2023).
-
Lee, Kyowoon, Seongun Kim, and Jaesik Choi. "Refining Diffusion Planner for Reliable Behavior Synthesis by Automatic Detection of Infeasible Plans." Advances in Neural Information Processing Systems 36 (2023).
-
Liu, Jianwei, Maria Stamatopoulou, and Dimitrios Kanoulas. "DiPPeR: Diffusion-based 2D Path Planner applied on Legged Robots." arXiv preprint arXiv:2310.07842 (2023).
-
Zhou, Siyuan, et al. "Adaptive Online Replanning with Diffusion Models." Advances in Neural Information Processing Systems 36 (2023).
-
Jain, Vineet, and Siamak Ravanbakhsh. "Learning to Reach Goals via Diffusion." arXiv preprint arXiv:2310.02505 (2023).
-
Geng, Jinkun, et al. "Diffusion Policies as Multi-Agent Reinforcement Learning Strategies." International Conference on Artificial Neural Networks. Cham: Springer Nature Switzerland, 2023.
-
Suh, H.J., et al. "Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching." Conference on Robot Learning. PMLR, 2023.
-
Yuan, Hui, et al. "Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement." arXiv preprint arXiv:2307.07055 (2023).
-
Hu, Jifeng, et al. "Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning." arXiv preprint arXiv:2306.04875 (2023).
-
Hegde, Shashank, et al. "Generating Behaviorally Diverse Policies with Latent Diffusion Models." arXiv preprint arXiv:2305.18738 (2023).
-
Xiao, Wei, et al. "SafeDiffuser: Safe Planning with Diffusion Probabilistic Models." arXiv preprint arXiv:2306.00148 (2023).
-
Li, Wenhao, et al. "Hierarchical Diffusion for Offline Decision Making." International Conference on Machine Learning. PMLR, 2023.
-
Liang, Zhixuan, et al. "AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners." International Conference on Machine Learning. PMLR, 2023.
-
Lu, Cheng, et al. "Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning." International Conference on Machine Learning. PMLR, 2023.
-
Zhu, Zhengbang, et al. "MADiff: Offline Multi-agent Learning with Diffusion Models." arXiv preprint arXiv:2305.17330 (2023).
-
Kang, Bingyi, et al. "Efficient Diffusion Policies for Offline Reinforcement Learning." arXiv preprint arXiv:2305.20081 (2023).
-
Ni, Fei, et al. "MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL." International Conference on Machine Learning. PMLR, 2023.
-
He, Haoran, et al. "Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning." arXiv preprint arXiv:2305.18459 (2023).
-
Ajay, Anurag, et al. "Is Conditional Generative Modeling all you need for Decision-Making?." International Conference on Learning Representations. 2023.
-
Hansen-Estruch, Philippe, et al. "IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies." arXiv preprint arXiv:2304.10573 (2023).
-
Zhang, Edwin, et al. "LAD: Language Augmented Diffusion for Reinforcement Learning." arXiv preprint arXiv:2210.15629 (2022).
-
Brehmer, Johann, et al. EDGI: Equivariant Diffusion for Planning with Embodied Agents Workshop on Reincarnating Reinforcement Learning at ICLR 2023.
-
Janner, Michael, et al. "Planning with Diffusion for Flexible Behavior Synthesis." International Conference on Learning Representations. 2022.
-
Wang, Zhendong, et al. "Diffusion policies as an expressive policy class for offline reinforcement learning." International Conference on Learning Representations. 2023.
-
Brehmer, Johann, et al. "EDGI: Equivariant Diffusion for Planning with Embodied Agents." arXiv preprint arXiv:2303.12410 (2023).
-
Chen, Huayu, et al. "Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling." International Conference on Learning Representations. 2023.
- Nuti, Felipe, Tim Franzmeyer, and João F. Henriques. "Extracting Reward Functions from Diffusion Models." Advances in Neural Information Processing Systems 36 (2023).
-
Ding, Zihan, et al. "Diffusion World Model." arXiv preprint arXiv:2402.03570 (2024).
-
Rigter, Marc, Jun Yamada, and Ingmar Posner. "World models via policy-guided trajectory diffusion." arXiv preprint arXiv:2312.08533 (2023).
-
Zhang, Lunjun, et al. "Learning unsupervised world models for autonomous driving via discrete diffusion." arXiv preprint arXiv:2311.01017 (2023).
-
Yang, Cheng-Fu, et al. "Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty." arXiv preprint arXiv:2312.01097 (2023).
-
Liu, Jiaqi, et al. "DDM-Lag: A Diffusion-based Decision-making Model for Autonomous Vehicles with Lagrangian Safety Enhancement." arXiv preprint arXiv:2401.03629 (2024).
-
Chang, Junwoo, et al. "Denoising Heat-inspired Diffusion with Insulators for Collision Free Motion Planning." NeurIPS 2023 Workshop on Diffusion Models
-
Ryu, Hyunwoo, et al. "Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation." arXiv preprint arXiv:2309.02685 (2023).
-
Yang, Zhutian, et al. "Compositional Diffusion-Based Continuous Constraint Solvers." 7th Annual Conference on Robot Learning. 2023.
-
Carvalho, Joao, et al. "Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models." IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2023.
-
Saha, Kallol, et al. "EDMP: Ensemble-of-costs-guided Diffusion for Motion Planning." arXiv preprint arXiv:2309.11414 (2023).
-
Power, Thomas, et al. "Sampling Constrained Trajectories Using Composable Diffusion Models." IROS 2023 Workshop on Differentiable Probabilistic Robotics: Emerging Perspectives on Robot Learning. 2023.
-
Zhong, Ziyuan, et al. "Language-Guided Traffic Simulation via Scene-Level Diffusion." Conference on Robot Learning. PMLR, 2023.
-
Fang, Xiaolin, et al. "DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability." arXiv preprint arXiv:2306.13196 (2023).
-
Liu, Weiyu, et al. "StructDiffusion: Object-centric diffusion for semantic rearrangement of novel objects." Proceedings of Robotics: Science and Systems (RSS) 2023.
-
Mishra, Utkarsh A., and Yongxin Chen. "ReorientDiff: Diffusion Model based Reorientation for Object Manipulation." RSS 2023 Workshop on Learning for Task and Motion Planning
-
Urain, Julen, et al. "SE(3)-DiffusionFields: Learning cost functions for joint grasp and motion optimization through diffusion." IEEE International Conference on Robotics and Automation (ICRA) 2023.
-
Carvalho, J. et al. Conditioned Score-Based Models for Learning Collision-Free Trajectory Generation, NeurIPS 2022 Workshop on Score-Based Methods
-
Yoneda, Takuma, et al. "6-DoF Stability Field via Diffusion Models." arXiv preprint arXiv:2310.17649 (2023).
-
Simeonov, Anthony, et al. "Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement." arXiv preprint arXiv:2307.04751 (2023).
-
Higuera, Carolina, Byron Boots, and Mustafa Mukadam. "Learning to Read Braille: Bridging the Tactile Reality Gap with Diffusion Models." arXiv preprint arXiv:2304.01182 (2023).
Excited to see more diffusion papers in this area in the future!
Using generative models to design robots is a very interesting idea, since it allows generating new robot designs and testing them in simulation before building them in the real world.
- Wang, Tsun-Hsuan, et al. "DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models." Thirty-seventh Conference on Neural Information Processing Systems. 2023.
Numerous implementations of diffusion models are available on GitHub. Below you can find a curated list of clean code bases for the most important diffusion models, both in general and for robotics:
-
Diffusers: the main diffusion library from Hugging Face, with numerous pre-trained diffusion models ready to use
-
k-diffusion: while it's not the official code base of the EDM diffusion models from Karras et al., 2022, it has very clean code and numerous samplers. Parts of the code have been used in various other projects, such as Consistency Models from OpenAI and diffusers from Hugging Face.
-
denoising-diffusion-pytorch: a clean DDPM implementation in PyTorch, useful for getting a good understanding of all the components
-
Diffuser: variants of this code are used in numerous trajectory diffusion offline RL papers listed above
-
diffusion_policy: a clean implementation of Diffusion Policy from Chi et al., 2023 for imitation learning, with 9 different simulations to test the models on
-
octo-models: the first open-source foundation behavior diffusion agent, pretrained on 800k trajectories from different embodiments. The JAX code allows you to download the weights and finetune your own Octo model on a local dataset.
-
3d_diffuser_actor: clean code to get started with 3D-based diffusion policies on the popular RLBench and CALVIN benchmarks.
-
flow-diffusion: if you want to train your own video diffusion model, this is the right repository to start with. It offers clean code and pre-trained weights for a real-world dataset and two simulations.
-
dpm-solver: One of the most widely used ODE samplers for Diffusion models from Lu et al. 2022 with implementations for all different diffusion models including wrappers for discrete DDPM variants
Diffusion models are a type of generative model inspired by non-equilibrium thermodynamics, introduced by Sohl-Dickstein et al. (2015). The model learns to invert a diffusion process that gradually adds noise to a data sample. This process is a Markov chain of diffusion steps, each of which adds random Gaussian noise to the sample. Although the paper was presented in 2015, it took several years for diffusion models to attract widespread attention in the research community. Most research on diffusion models focuses on vision-based applications, so the theory papers mentioned below mostly address image synthesis or closely related tasks.
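The forward (noising) Markov chain described above can be sketched in a few lines of plain Python. This is only a minimal illustration for a scalar sample. It assumes the DDPM-style transition $x_t \sim \mathcal{N}(\sqrt{1-\beta_t}\,x_{t-1}, \beta_t)$ and the linear noise schedule from $10^{-4}$ to $0.02$ over 1000 steps used by Ho et al. (2020); the function names are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def forward_diffusion(x0, betas, rng):
    """Run the Markov chain x_0 -> x_T for a scalar sample.

    Each transition draws x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t).
    Returns the whole trajectory [x_0, x_1, ..., x_T].
    """
    trajectory = [x0]
    x = x0
    for beta in betas:
        noise = rng.gauss(0.0, 1.0)
        x = math.sqrt(1.0 - beta) * x + math.sqrt(beta) * noise
        trajectory.append(x)
    return trajectory


rng = random.Random(0)
betas = linear_beta_schedule(1000)

# Diffuse many copies of the same data point x_0 = 3.0 and inspect x_T.
final_samples = [forward_diffusion(3.0, betas, rng)[-1] for _ in range(1000)]
mean = sum(final_samples) / len(final_samples)
var = sum((s - mean) ** 2 for s in final_samples) / len(final_samples)
print(f"mean={mean:.3f}, var={var:.3f}")  # should be roughly 0 and 1
```

After enough steps the information about the starting point is almost completely destroyed and the samples are approximately standard Gaussian, which is what makes pure noise a valid starting point for the learned reverse process.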
There are two perspectives from which to view diffusion models. The first is based on the initial idea of Sohl-Dickstein et al. (2015), while the other is based on a different direction of research known as score-based generative models. Song & Ermon (2019) proposed the noise-conditioned score network (NCSN), a predecessor of the score-based diffusion model. The main idea was to learn the score function of the unknown data distribution using a neural network. This approach had been around before, but their paper and the subsequent work Song & Ermon (2020) enabled scaling score-based models to high-dimensional data distributions and made them competitive on image-generation tasks. The key idea in their work was to perturb the data distribution with various levels of Gaussian noise and learn a noise-conditional score model to predict the score of the perturbed data distributions.
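To make the noise-conditional score idea concrete, here is a small sketch of annealed Langevin sampling in the spirit of Song & Ermon (2019). It is an illustration under simplifying assumptions: the "data set" consists of two scalar points, so the score of every noise-perturbed distribution is available in closed form and stands in for a learned score network. All function names and hyperparameters (noise levels, step size `eps`, steps per level) are hypothetical choices for this toy example.

```python
import math
import random


def mixture_score(x, centers, sigma):
    """Score d/dx log p_sigma(x) of equally weighted data points blurred by N(0, sigma^2)."""
    log_weights = [-((x - c) ** 2) / (2.0 * sigma**2) for c in centers]
    max_log = max(log_weights)  # subtract the max for numerical stability
    weights = [math.exp(lw - max_log) for lw in log_weights]
    total = sum(weights)
    # Posterior-weighted average of the per-point scores (c - x) / sigma^2.
    return sum(w * (c - x) for w, c in zip(weights, centers)) / (total * sigma**2)


def geometric_sigmas(sigma_max, sigma_min, num_levels):
    """Noise levels decreasing geometrically from sigma_max to sigma_min."""
    ratio = (sigma_min / sigma_max) ** (1.0 / (num_levels - 1))
    return [sigma_max * ratio**i for i in range(num_levels)]


def annealed_langevin(score_fn, sigmas, rng, eps=5e-4, steps_per_level=50):
    """Run Langevin dynamics at each noise level, from the largest to the smallest."""
    x = rng.gauss(0.0, sigmas[0])  # start from broad noise
    for sigma in sigmas:
        alpha = eps * (sigma / sigmas[-1]) ** 2  # step size shrinks with the noise level
        for _ in range(steps_per_level):
            z = rng.gauss(0.0, 1.0)
            x = x + 0.5 * alpha * score_fn(x, sigma) + math.sqrt(alpha) * z
    return x


rng = random.Random(0)
centers = [-2.0, 2.0]  # toy data set
sigmas = geometric_sigmas(5.0, 0.05, 10)
samples = [
    annealed_langevin(lambda x, s: mixture_score(x, centers, s), sigmas, rng)
    for _ in range(300)
]
near_mode = sum(min(abs(x - c) for c in centers) < 0.5 for x in samples)
print(f"{near_mode}/{len(samples)} samples ended within 0.5 of a data point")
```

In a real model the closed-form `mixture_score` would be replaced by a neural network trained with (denoising) score matching; the sampling loop itself stays the same.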
In 2020, Ho et al. (2020) introduced denoising diffusion probabilistic models (DDPM), which laid the foundation for the success of diffusion models. At that time, diffusion models were still not competitive with state-of-the-art generative models such as GANs. This changed rapidly the following year, when Nichol & Dhariwal (2021) improved upon the previous paper and demonstrated that diffusion models are competitive with GANs on image synthesis tasks. Nevertheless, it is important to note that diffusion models are not a solution to every problem: they still struggle with certain image traits, such as generating realistic faces or the right number of fingers.
Another important idea for diffusion models in image generation has been the introduction of latent diffusion models by Rombach & Blattmann et al. (2022). By training the diffusion model in a latent space rather than directly in image space, they improved sampling and training speed and made it possible for everyone to run their own diffusion model on a local PC with a single GPU. Recent AI-generated art is mostly based on the Stability AI implementation of latent diffusion models, which is open source: GitHub repo. Check out some cool diffusion art on the stable-diffusion-reddit.
Conditional Diffusion models
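The initial diffusion models are usually trained on marginal distributions $p(x)$. To generate samples conditioned on additional information $z$, there are three common approaches: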
- Classifier Guided Diffusion by Dhariwal & Nichol (2021)
- Classifier-Free Guidance (CFG) by Ho & Salimans, (2021)
- directly training a conditional diffusion model $p(x|z)$
CFG is used in many applications, since it allows training a conditional and an unconditional diffusion model at the same time. During inference, the predictions of both models are combined, and a guidance weight controls how strongly the condition steers the generation process.
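The two ingredients of CFG can be sketched as follows. This is a minimal illustration, not the reference implementation: the function names are hypothetical, predictions are plain lists of floats, and the combination uses the common convention `uncond + w * (cond - uncond)`. Ho & Salimans (2021) write the same rule as $(1+w)\,\epsilon_{cond} - w\,\epsilon_{uncond}$, which corresponds to shifting the weight by one.

```python
import random


def maybe_drop_condition(condition, drop_prob, rng):
    """Training-time trick: replace the condition by None with probability drop_prob.

    A single network then sees both conditional and unconditional examples
    (None plays the role of the "null" condition).
    """
    return None if rng.random() < drop_prob else condition


def classifier_free_guidance(pred_uncond, pred_cond, guidance_weight):
    """Inference-time combination of unconditional and conditional predictions.

    guidance_weight = 0 -> purely unconditional
    guidance_weight = 1 -> purely conditional
    guidance_weight > 1 -> extrapolates beyond the conditional prediction
    """
    return [u + guidance_weight * (c - u) for u, c in zip(pred_uncond, pred_cond)]


# Toy noise predictions for a 2-dimensional sample.
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]
for w in (0.0, 1.0, 2.0):
    print(w, classifier_free_guidance(eps_uncond, eps_cond, w))

rng = random.Random(0)
train_conditions = [maybe_drop_condition("pick up the cup", 0.1, rng) for _ in range(5)]
```

Larger guidance weights typically trade sample diversity for stronger adherence to the condition, so the weight is usually tuned per task.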
Diffusion models perspectives
As previously mentioned, diffusion models can be viewed from two different perspectives:
- the denoising diffusion probabilistic perspective based on Ho et al., (2020)
- the score-based model perspective based on Song & Ermon, (2019)
There has been a lot of effort to combine these two views into one general framework. The most general formulation so far is based on stochastic differential equations (SDEs), first presented in Song et al. (2021) and further developed into a unified framework in Karras et al. (2022).
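For orientation, the SDE view of Song et al. can be summarized in three equations. The notation here is chosen for this summary: $f$ is the drift, $g$ the diffusion coefficient, $w$ and $\bar{w}$ are forward and reverse-time Wiener processes, and $p_t$ is the marginal distribution of $x$ at time $t$. The forward process that turns data into noise is

$$
\mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w ,
$$

the corresponding reverse-time process that turns noise back into data is

$$
\mathrm{d}x = \left[f(x, t) - g(t)^2\, \nabla_x \log p_t(x)\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{w} ,
$$

and the deterministic probability flow ODE with the same marginals is

$$
\mathrm{d}x = \left[f(x, t) - \tfrac{1}{2}\, g(t)^2\, \nabla_x \log p_t(x)\right]\mathrm{d}t .
$$

The only unknown quantity is the score $\nabla_x \log p_t(x)$, which is what the network approximates in both perspectives: a DDPM-style noise prediction and a score prediction differ only by a time-dependent scaling.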
While diffusion models have mainly been applied in the area of generative modeling, recent work has shown promising applications of diffusion models in robotics. For instance, diffusion models have been used for behavior cloning and offline reinforcement learning, and have also been used to generate more diverse training data for robotics tasks.
Diffusion models offer several useful properties in the context of robotics, including:
- Expressiveness: can learn arbitrarily complicated data-distributions
- Training stability: they are easy to train, especially in contrast to GANs or EBMs
- Multimodality: they are able to learn complicated multimodal distributions
- Compositionality: diffusion models can be combined in a flexible way to jointly generate new samples
Overall, diffusion models have the potential to be a valuable tool for robotics.