From 85855009709457e58f4272d0ebc7f53616ffd5f8 Mon Sep 17 00:00:00 2001
From: Rishabh Bhardwaj <32847115+Bhardwaj-Rishabh@users.noreply.github.com>
Date: Sat, 28 Oct 2023 11:11:50 +0800
Subject: [PATCH] Update README.md

Added Red-teaming paper on "Language model unalignment"
---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index ac614fd..79890c9 100644
--- a/README.md
+++ b/README.md
@@ -958,6 +958,7 @@ Please click [here](Experiments/README.md) to view more detailed information.
 85. **"On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex"**. *Terry Yue Zhuo et al.* EACL 2023. [[Paper](https://arxiv.org/abs/2301.12868)]
 86. **"A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets"**. Laskar et al.* ACL'23. [[Paper]](https://arxiv.org/abs/2305.18486)
 87. **"Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment"**. *Rishabh Bhardwaj et al*. arXiv 2023. [[Paper](https://arxiv.org/abs/2308.09662)]
+88. **"Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases"**. *Rishabh Bhardwaj et al*. arXiv 2023. [[Paper](https://arxiv.org/pdf/2310.14303.pdf)]
 ### The Team