- (2023-05) LIMA: Less Is More for Alignment paper
- (2023-05) RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs paper
- (2023-05) Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision paper
- (2023-05) Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback paper
- (2023-04) Fundamental Limitations of Alignment in Large Language Models paper