Alignment
Tools for understanding how transformer predictions are built layer-by-layer
Implementations of selected inverse reinforcement learning algorithms.
Robust recipes to align language models with human and AI preferences
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
RLHF implementation details of OpenAI's 2019 codebase
Reference implementation for DPO (Direct Preference Optimization); a minimal sketch of the DPO loss appears after this list
Train transformer language models with reinforcement learning.
Keeping language models honest by directly eliciting knowledge encoded in their activations.
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Representation Engineering: A Top-Down Approach to AI Transparency
Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".
Aligning pretrained language models with instruction data generated by themselves.
A guidance language for controlling large language models.
Distilabel is a framework for synthetic data generation and AI feedback, aimed at engineers who need fast, reliable, and scalable pipelines based on verified research papers.
High-quality single-file implementations of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG); PPO's clipped objective is sketched after this list
An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
RewardBench: the first evaluation tool for reward models.
Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
Scalable toolkit for efficient model alignment
A recipe for online RLHF and online iterative DPO.
Recipes for training reward models for RLHF (see the reward-model loss sketch after this list).
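
Several entries above (the HALOs library, the DPO reference implementation, and the online iterative DPO recipe) center on Direct Preference Optimization. Below is a minimal sketch of the DPO objective, assuming PyTorch and summed per-completion token log-probabilities; the function and argument names are illustrative and not taken from any of the listed repositories:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO pairwise loss: increase the policy's implicit reward margin
    (relative to a frozen reference model) for chosen over rejected responses.
    Illustrative sketch only, not the reference repo's code."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # -log sigmoid(margin), averaged over the batch of preference pairs
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```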
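Many of the RLHF toolkits listed (TRL, Safe RLHF, NeMo-Aligner, CleanRL) build on PPO. The sketch below shows only the clipped surrogate policy term at PPO's core, in PyTorch; the value loss, entropy bonus, and KL penalty against the reference model are omitted, and all names are illustrative:

```python
import torch

def ppo_clip_loss(new_logprobs, old_logprobs, advantages, clip_coef=0.2):
    """PPO clipped surrogate objective (policy term only); illustrative sketch."""
    ratio = torch.exp(new_logprobs - old_logprobs)  # pi_new / pi_old per token
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_coef, 1.0 + clip_coef) * advantages
    # take the pessimistic (minimum) surrogate and negate it for gradient descent
    return -torch.min(unclipped, clipped).mean()
```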
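Reward-model recipes like those in the last entry typically fit a pairwise Bradley-Terry preference model over scalar response scores. A hedged sketch of that loss, again with illustrative names:

```python
import torch.nn.functional as F

def reward_model_loss(chosen_scores, rejected_scores):
    """Bradley-Terry pairwise loss: the chosen response should receive
    a higher scalar reward than the rejected one. Illustrative sketch."""
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()
```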