Welcome to the official GitHub for SynthLabs.ai 👋
Our latest work introduces Generative Reward Models (GenRM) and Chain-of-Thought GenRM (CoT-GenRM), a framework for preference learning that unifies RLHF and RLAIF approaches. We demonstrate that combining an iterative preference-learning algorithm (STaR-DPO) with CoT-GenRM trains models that match Bradley-Terry reward models (currently the best-in-class method) on in-domain data while substantially outperforming them on out-of-domain data (up to a 45% improvement), all while providing rationales for each predicted preference. By framing reward modeling as next-token prediction, GenRM unifies language models and reward models under a single objective and reduces the required infrastructure overhead (a minimal sketch of this framing follows the list below). The development of CoT-GenRM and STaR-DPO opens up new possibilities for AI alignment:
- More Robust AI Systems: Create AI systems that better generalize to new situations and maintain alignment with human values.
- Efficient Scaling: Allow for more rapid iteration and refinement of AI behavior.
- Potential for Personalization: Address the challenge of aligning AI with diverse and potentially conflicting human views.
- Improved Reasoning Capabilities: Pave the way for AI systems that can continually improve their own reasoning and decision-making processes.
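As a rough illustration of the next-token framing, the sketch below scores a preference pair by comparing the probabilities a causal language model assigns to two verdict tokens. The prompt template, verdict tokens, and model checkpoint are illustrative assumptions, not the exact setup from the paper.

```python
# Hedged sketch: a generative reward model as next-token prediction.
# The prompt format, verdict tokens ("A"/"B"), and model checkpoint are
# illustrative assumptions, not the configuration used in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def genrm_preference(prompt: str, response_a: str, response_b: str) -> float:
    """Return the probability that response A is preferred over response B."""
    judge_prompt = (
        f"Question: {prompt}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n"
        "Which response is better? Answer A or B.\nAnswer:"
    )
    inputs = tokenizer(judge_prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]  # last-position logits
    # The "reward" is read off the next-token distribution over the verdict tokens.
    id_a = tokenizer.encode(" A", add_special_tokens=False)[0]
    id_b = tokenizer.encode(" B", add_special_tokens=False)[0]
    probs = torch.softmax(next_token_logits[[id_a, id_b]], dim=-1)
    return probs[0].item()
```

In the CoT-GenRM variant, the model would first generate a chain-of-thought rationale before emitting the verdict token; the sketch above omits that step for brevity.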
Contributions from Dakota Mahan*, Duy Van Phung*, Rafael Rafailov*, Chase Blagden, Nathan Lile, Louis Castricato, Jan-Philipp Fränken, Chelsea Finn, and Alon Albalak*.
Learn more:
This work introduces PERSONA, a framework for evaluating how well language models align with a diverse set of user values, using 1,586 synthetic personas, 3,868 prompts, and 317,200 preference pairs. We focus on pluralistic alignment because we want language models that reflect a diverse set of values rather than only the majority opinion, and we don't prescribe a one-size-fits-all approach. PERSONA is synthetically constructed from U.S. census data, allowing us to generate a large, diverse dataset while ensuring privacy and reproducibility. The dataset and evaluation framework can be used for a variety of purposes, including: (1) a test bed, (2) a development environment, (3) a reproducible evaluation for pluralistic alignment approaches, (4) the personalization of language models, and (5) preference elicitation.
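For intuition, here is a minimal, hypothetical sketch of persona-conditioned preference elicitation in the spirit of PERSONA: census-style attributes are role-played via a system prompt, and the persona is asked to choose between two candidate responses. The attribute names, prompt wording, and helper functions are illustrative assumptions, not the released schema or API.

```python
# Hedged sketch of persona-conditioned preference elicitation.
# Field names and prompt wording are illustrative, not PERSONA's actual schema.
from dataclasses import dataclass

@dataclass
class Persona:
    age: int
    occupation: str
    region: str
    values_summary: str  # free-text description of the persona's values

def persona_system_prompt(p: Persona) -> str:
    """Turn census-style attributes into a role-play instruction."""
    return (
        f"You are a {p.age}-year-old {p.occupation} from {p.region}. "
        f"Your values: {p.values_summary}. "
        "Judge the two responses below strictly from this perspective."
    )

def elicit_preference(p: Persona, prompt: str, resp_a: str, resp_b: str) -> str:
    """Build the judging prompt; any chat model could return 'A' or 'B'."""
    return (
        persona_system_prompt(p) + "\n\n"
        f"Prompt: {prompt}\nResponse A: {resp_a}\nResponse B: {resp_b}\n"
        "Which response does this person prefer? Answer A or B."
    )

# Example usage: pair the judging prompt with a chat model of your choice.
persona = Persona(34, "teacher", "Midwest", "community-minded, privacy-conscious")
print(elicit_preference(persona, "Should schools ban phones?", "Yes ...", "No ..."))
```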
Contributions from Louis Castricato*, Nathan Lile*, Rafael Rafailov, Jan-Philipp Fränken, and Chelsea Finn.
Learn more:
This work advances controllable language modeling by addressing the "Pink Elephant Problem": instructing language models to avoid certain topics ("Pink Elephants") and steer toward preferred ones ("Grey Elephants"). Key highlights:
- Controllable Generation: Dynamically adjust language models at inference time to meet diverse needs across multiple contexts.
- Direct Principle Feedback (DPF): A novel simplification of Constitutional AI that applies principles directly through critiques and revisions, with no need to rank responses (a minimal sketch follows this list).
- Significant Performance Improvements: After fine-tuning on our synthetic Pink Elephants dataset with DPF, our 13B LLaMA 2 model outperformed existing models and matched GPT-4 on our curated test set for the Pink Elephant Problem.
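As a rough sketch of the DPF recipe (not the exact pipeline or prompts from the paper): a principle drives a critique-and-revision pass over an initial response, and the pre- and post-revision outputs are then used directly as the rejected/chosen pair for preference optimization such as DPO. The `generate` helper below stands in for an arbitrary chat-model call and is an assumption.

```python
# Hedged sketch of Direct Principle Feedback (DPF).
# `generate` stands in for any chat-model call; prompts are illustrative.
from typing import Callable, Dict

def dpf_preference_pair(
    generate: Callable[[str], str], principle: str, prompt: str
) -> Dict[str, str]:
    """Build one DPO-style preference pair via critique and revision."""
    original = generate(prompt)
    critique = generate(
        f"Principle: {principle}\nResponse: {original}\n"
        "Critique the response for violations of the principle."
    )
    revised = generate(
        f"Principle: {principle}\nResponse: {original}\nCritique: {critique}\n"
        "Rewrite the response so it fully follows the principle."
    )
    # DPF skips ranking: the revision is 'chosen', the original is 'rejected'.
    return {"prompt": prompt, "chosen": revised, "rejected": original}
```

Unlike the ranking stage of Constitutional AI-style RLAIF, no reward model or response ranking is needed; the preference signal comes directly from the revision, and the resulting pairs can be fed straight into a DPO trainer.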
Contributions from Louis Castricato, Nathan Lile, Suraj Anand, Hailey Schoelkopf, Siddharth Verma, and Stella Biderman.
Learn more:
- Interviewing Louis Castricato on RLHF, Synth Labs, and the Future of Alignment
- New Microsoft-Backed Startup Wants to Make AI Work As Intended
We're always looking for talented individuals to join our team. If you're passionate about AI and want to work on cutting-edge research, check out our career opportunities.
Join us in shaping an aligned and impactful AI future! 🤝