Skip to content

Commit

Permalink
add more recent readings..
Browse files Browse the repository at this point in the history
  • Loading branch information
qiyanjun committed Feb 14, 2024
1 parent 7b69415 commit a93001a
Show file tree
Hide file tree
Showing 6 changed files with 34 additions and 10 deletions.
2 changes: 1 addition & 1 deletion _contents/S0-L11.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ In this session, our readings cover:

### ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation / EMNLP2023


+ Despite remarkable advances that large language models have achieved in chatbots nowadays, maintaining a non-toxic user-AI interactive environment has become increasingly critical nowadays. However, previous efforts in toxicity detection have been mostly based on benchmarks derived from social media contents, leaving the unique challenges inherent to real-world user-AI interactions insufficiently explored. In this work, we introduce ToxicChat, a novel benchmark constructed based on real user queries from an open-source chatbot. This benchmark contains the rich, nuanced phenomena that can be tricky for current toxicity detection models to identify, revealing a significant domain difference when compared to social media contents. Our systematic evaluation of models trained on existing toxicity datasets has shown their shortcomings when applied to this unique domain of ToxicChat. Our work illuminates the potentially overlooked challenges of toxicity detection in real-world user-AI conversations. In the future, ToxicChat can be a valuable resource to drive further advancements toward building a safe and healthy environment for user-AI interactions.

### A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
+ https://arxiv.org/abs/2305.11391
Expand Down
10 changes: 9 additions & 1 deletion _contents/S0-L12.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
layout: post
title: LLM multimodal harm responses
title: LLM multimodal / multilingual harm responses
lecture:
lectureVersion: next
extraContent:
Expand Down Expand Up @@ -28,6 +28,14 @@ In this session, our readings cover:
## More Readings:


### Low-Resource Languages Jailbreak GPT-4
+ AI safety training and red-teaming of large language models (LLMs) are measures to mitigate the generation of unsafe content. Our work exposes the inherent cross-lingual vulnerability of these safety mechanisms, resulting from the linguistic inequality of safety training data, by successfully circumventing GPT-4's safeguard through translating unsafe English inputs into low-resource languages. On the AdvBenchmark, GPT-4 engages with the unsafe translated inputs and provides actionable items that can get the users towards their harmful goals 79% of the time, which is on par with or even surpassing state-of-the-art jailbreaking attacks. Other high-/mid-resource languages have significantly lower attack success rate, which suggests that the cross-lingual vulnerability mainly applies to low-resource languages. Previously, limited training on low-resource languages primarily affects speakers of those languages, causing technological disparities. However, our work highlights a crucial shift: this deficiency now poses a risk to all LLMs users. Publicly available translation APIs enable anyone to exploit LLMs' safety vulnerabilities. Therefore, our work calls for a more holistic red-teaming efforts to develop robust multilingual safeguards with wide language coverage.

### Visual Instruction Tuning
+ Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee
+ Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. In this paper, we present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding.Our early experiments show that LLaVA demonstrates impressive multimodel chat abilities, sometimes exhibiting the behaviors of multimodal GPT-4 on unseen images/instructions, and yields a 85.1% relative score compared with GPT-4 on a synthetic multimodal instruction-following dataset. When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%. We make GPT-4 generated visual instruction tuning data, our model and code base publicly available.


### GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse
+ https://arxiv.org/abs/2401.01523

Expand Down
13 changes: 9 additions & 4 deletions _contents/S0-L13.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,24 +14,29 @@ In this session, our readings cover:

## Required Readings:

### Practices for Governing Agentic AI Systems
+ https://cdn.openai.com/papers/practices-for-governing-agentic-ai-systems.pdf
+ Agentic AI systems—AI systems that can pursue complex goals with limited direct supervision— are likely to be broadly useful if we can integrate them responsibly into our society. While such systems have substantial potential to help people more efficiently and effectively achieve their own goals, they also create risks of harm. In this white paper, we suggest a definition of agentic AI systems and the parties in the agentic AI system life-cycle, and highlight the importance of agreeing on a set of baseline responsibilities and safety best practices for each of these parties. As our primary contribution, we offer an initial set of practices for keeping agents’ operations safe and accountable, which we hope can serve as building blocks in the development of agreed baseline best practices. We enumerate the questions and uncertainties around operationalizing each of these practices that must be addressed before such practices can be codified. We then highlight categories of indirect impacts from the wide-scale adoption of agentic AI systems, which are likely to necessitate additional governance frameworks.


## More Readings:



### Managing Existential Risk from AI without Undercutting Innovation
+ https://www.csis.org/analysis/managing-existential-risk-ai-without-undercutting-innovation

### OpenAI on LLM generated bio-x-risk
+ Building an early warning system for LLM-aided biological threat creation
+ https://openai.com/research/building-an-early-warning-system-for-llm-aided-biological-threat-creation

## More Readings:

### A misleading open letter about sci-fi AI dangers ignores the real risks
https://www.aisnakeoil.com/p/a-misleading-open-letter-about-sci

### Evaluating social and ethical risks from generative AI
+ https://deepmind.google/discover/blog/evaluating-social-and-ethical-risks-from-generative-ai/

### Emergent autonomous scientific research capabilities of large language models
+ https://arxiv.org/abs/2304.05332
+ Transformer-based large language models are rapidly advancing in the field of machine learning research, with applications spanning natural language, biology, chemistry, and computer programming. Extreme scaling and reinforcement learning from human feedback have significantly improved the quality of generated text, enabling these models to perform various tasks and reason about their choices. In this paper, we present an Intelligent Agent system that combines multiple large language models for autonomous design, planning, and execution of scientific experiments. We showcase the Agent's scientific research capabilities with three distinct examples, with the most complex being the successful performance of catalyzed cross-coupling reactions. Finally, we discuss the safety implications of such systems and propose measures to prevent their misuse.


### On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
Expand Down
6 changes: 2 additions & 4 deletions _contents/S0-L16.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,10 +37,8 @@ In this session, our readings cover:
+ The advent of artificial intelligence (AI) has significantly impacted the traditional judicial industry. Moreover, recently, with the development of AI-generated content (AIGC), AI and law have found applications in various domains, including image recognition, automatic text generation, and interactive chat. With the rapid emergence and growing popularity of large models, it is evident that AI will drive transformation in the traditional judicial industry. However, the application of legal large language models (LLMs) is still in its nascent stage. Several challenges need to be addressed. In this paper, we aim to provide a comprehensive survey of legal LLMs. We not only conduct an extensive survey of LLMs, but also expose their applications in the judicial system. We first provide an overview of AI technologies in the legal field and showcase the recent research in LLMs. Then, we discuss the practical implementation presented by legal LLMs, such as providing legal advice to users and assisting judges during trials. In addition, we explore the limitations of legal LLMs, including data, algorithms, and judicial practice. Finally, we summarize practical recommendations and propose future development directions to address these challenges.


### Visual Instruction Tuning
+ Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee
+ Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. In this paper, we present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding.Our early experiments show that LLaVA demonstrates impressive multimodel chat abilities, sometimes exhibiting the behaviors of multimodal GPT-4 on unseen images/instructions, and yields a 85.1% relative score compared with GPT-4 on a synthetic multimodal instruction-following dataset. When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%. We make GPT-4 generated visual instruction tuning data, our model and code base publicly available.

### Large Language Models for Software Engineering: A Systematic Literature Review
+ Large Language Models (LLMs) have significantly impacted numerous domains, including Software Engineering (SE). Many recent publications have explored LLMs applied to various SE tasks. Nevertheless, a comprehensive understanding of the application, effects, and possible limitations of LLMs on SE is still in its early stages. To bridge this gap, we conducted a systematic literature review on LLM4SE, with a particular focus on understanding how LLMs can be exploited to optimize processes and outcomes. We collect and analyze 229 research papers from 2017 to 2023 to answer four key research questions (RQs). In RQ1, we categorize different LLMs that have been employed in SE tasks, characterizing their distinctive features and uses. In RQ2, we analyze the methods used in data collection, preprocessing, and application highlighting the role of well-curated datasets for successful LLM for SE implementation. RQ3 investigates the strategies employed to optimize and evaluate the performance of LLMs in SE. Finally, RQ4 examines the specific SE tasks where LLMs have shown success to date, illustrating their practical contributions to the field. From the answers to these RQs, we discuss the current state-of-the-art and trends, identifying gaps in existing research, and flagging promising areas for future study.


### Segment Anything
Expand Down
9 changes: 9 additions & 0 deletions _contents/S0-L18.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,15 @@ tags:

## More Readings:

#### Rethinking interpretability in the era of large language models
+ Chandan Singh, Jeevana Priya Inala, Michel Galley, Rich Caruana, Jianfeng Gao
+ 2024/1/30
+ Interpretable machine learning has exploded as an area of interest over the last decade, sparked by the rise of increasingly large datasets and deep neural networks. Simultaneously, large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks, offering a chance to rethink opportunities in interpretable machine learning. Notably, the capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human. However, these new capabilities raise new challenges, such as hallucinated explanations and immense computational costs. In this position paper, we start by reviewing existing methods to evaluate the emerging field of LLM interpretation (both interpreting LLMs and using LLMs for explanation). We contend that, despite their limitations, LLMs hold the opportunity to redefine interpretability with a more ambitious scope across many applications, including in auditing LLMs themselves. We highlight two emerging research priorities for LLM interpretation: using LLMs to directly analyze new datasets and to generate interactive explanations.

#### Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
+ https://transformer-circuits.pub/2023/monosemantic-features/index.html

#### Language models can explain neurons in language models
+ https://openai.com/research/language-models-can-explain-neurons-in-language-models


4 changes: 4 additions & 0 deletions _contents/S0-L20.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,3 +19,7 @@ In this session, our readings cover:


## More Readings:

### Emergent autonomous scientific research capabilities of large language models
+ https://arxiv.org/abs/2304.05332
+ Transformer-based large language models are rapidly advancing in the field of machine learning research, with applications spanning natural language, biology, chemistry, and computer programming. Extreme scaling and reinforcement learning from human feedback have significantly improved the quality of generated text, enabling these models to perform various tasks and reason about their choices. In this paper, we present an Intelligent Agent system that combines multiple large language models for autonomous design, planning, and execution of scientific experiments. We showcase the Agent's scientific research capabilities with three distinct examples, with the most complex being the successful performance of catalyzed cross-coupling reactions. Finally, we discuss the safety implications of such systems and propose measures to prevent their misuse.

0 comments on commit a93001a

Please sign in to comment.