# Datasets & Benchmark

## 📑Papers

| Date | Institute | Publication | Paper | Keywords |
| --- | --- | --- | --- | --- |
| 20.09 | University of Washington | EMNLP2020(findings) | RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models | Toxicity |
| 21.09 | University of Oxford | ACL2022 | TruthfulQA: Measuring How Models Mimic Human Falsehoods | Truthfulness |
| 22.03 | MIT | ACL2022 | ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection | Toxicity |
| 23.07 | Zhejiang University; School of Engineering, Westlake University | arxiv | Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models | Text Safety&Benchmark&Jailbreaking |
| 23.07 | Stevens Institute of Technology | NAACL2024(findings) | HateModerate: Testing Hate Speech Detectors against Content Moderation Policies | Hate Speech Detection&Content Moderation&Machine Learning |
| 23.08 | Meta Reality Labs | NAACL2024 | Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? | Large Language Models&Knowledge Graphs&Question Answering |
| 23.08 | Bocconi University | NAACL2024 | XSTEST: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models | Large Language Models&Safety Behaviours&Test Suite |
| 23.09 | LibrAI, MBZUAI, The University of Melbourne | arxiv | Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs | Safety Evaluation&Safeguards |
| 23.10 | University of Edinburgh, Huawei Technologies Co., Ltd. | NAACL2024 | Assessing the Reliability of Large Language Model Knowledge | Large Language Models&Factual Knowledge&Knowledge Probing |
| 23.10 | University of Pennsylvania | NAACL2024(findings) | Automatic Hallucination Assessment for Aligned Large Language Models via Transferable Adversarial Attacks | Hallucination Assessment&Adversarial Attacks&Large Language Models |
| 23.11 | Fudan University | arxiv | JADE: A Linguistic-based Safety Evaluation Platform for LLM | Safety Benchmarks |
| 23.11 | UNC-Chapel Hill | arxiv | Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges | Hallucination&Benchmark&Multimodal |
| 23.11 | IBM Research AI | EMNLP2023(GEM workshop) | Unveiling Safety Vulnerabilities of Large Language Models | Adversarial Examples&Clustering&Automatically Identifying |
| 23.11 | The Hong Kong University of Science and Technology | arxiv | P-Bench: A Multi-level Privacy Evaluation Benchmark for Language Models | Differential Privacy&Privacy Evaluation |
| 23.11 | UC Berkeley | arxiv | Can LLMs Follow Simple Rules? | Evaluation&Attack Strategies |
| 23.11 | University of Central Florida | arxiv | THOS: A Benchmark Dataset for Targeted Hate and Offensive Speech | Hate Speech&Offensive Speech&Dataset |
| 23.11 | Beijing Jiaotong University; DAMO Academy, Alibaba Group; Peng Cheng Lab | arxiv | AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | Multi-modal Large Language Models&Hallucination&Benchmark |
| 23.11 | Patronus AI, University of Oxford, Bocconi University | arxiv | SIMPLESAFETYTESTS: a Test Suite for Identifying Critical Safety Risks in Large Language Models | Safety Risks&Test Suite&Evaluation |
| 23.11 | University of Southern California, University of Pennsylvania, University of California Davis | arxiv | Deceiving Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination? | Hallucinations&Semantic Associations&Benchmark |
| 23.11 | Seoul National University, Chung-Ang University, NAVER AI Lab, NAVER Cloud, University of Richmond | arxiv | LifeTox: Unveiling Implicit Toxicity in Life Advice | LifeTox Dataset&Toxicity Detection&Social Media Analysis |
| 23.11 | School of Information, Renmin University of China | arxiv | UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation | Hallucination&Evaluation Benchmark |
| 23.11 | UC Santa Cruz, UNC-Chapel Hill | arxiv | How Many Are in This Image? A Safety Evaluation Benchmark for Vision LLMs | Vision Large Language Models&Safety Evaluation&Adversarial Robustness |
| 23.11 | Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences; Baidu Inc. | arxiv | FFT: Towards Harmlessness Evaluation and Analysis for LLMs with Factuality, Fairness, Toxicity | Harmlessness Evaluation |
| 23.11 | Fudan University & Shanghai Artificial Intelligence Laboratory | NAACL2024 | Fake Alignment: Are LLMs Really Aligned Well? | Large Language Models&Safety Evaluation&Fake Alignment |
| 23.11 | Kahlert School of Computing | NAACL2024 | Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness | NLP Robustness&Out-of-Domain Evaluation&Adversarial Evaluation |
| 23.11 | Shanghai Jiao Tong University | NAACL2024(findings) | CLEAN–EVAL: Clean Evaluation on Contaminated Large Language Models | Clean Evaluation&Data Contamination&Large Language Models |
| 23.12 | Meta | arxiv | Purple Llama CYBERSECEVAL: A Secure Coding Benchmark for Language Models | Safety&Cybersecurity&Code Security Benchmark |
| 23.12 | University of Illinois Chicago, Bosch Research North America & Bosch Center for Artificial Intelligence (BCAI), UNC Chapel-Hill | arxiv | DELUCIONQA: Detecting Hallucinations in Domain-specific Question Answering | Hallucination Detection&Domain-specific QA&Retrieval-augmented LLMs |
| 23.12 | University of Science and Technology of China, Hong Kong University of Science and Technology, Microsoft | arxiv | Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models | Indirect Prompt Injection Attacks&BIPIA Benchmark&Defense |
| 24.01 | NewsBreak, University of Illinois Urbana-Champaign | arxiv | RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models | Retrieval-Augmented Generation&Hallucination Detection&Dataset |
| 24.01 | University of Notre Dame, Lehigh University, Illinois Institute of Technology, Institut Polytechnique de Paris, William & Mary, Texas A&M University, Samsung Research America, Stanford University | ICML 2024 | TrustLLM: Trustworthiness in Large Language Models | Trustworthiness&Benchmark Evaluation |
| 24.01 | University College London | arxiv | Hallucination Benchmark in Medical Visual Question Answering | Medical Visual Question Answering&Hallucination Benchmark |
| 24.01 | Carnegie Mellon University | arxiv | TOFU: A Task of Fictitious Unlearning for LLMs | Data Privacy&Ethical Concerns&Unlearning |
| 24.01 | IRLab, CITIC Research Centre, Universidade da Coruña | arxiv | MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection | Hate Speech Detection&Social Media |
| 24.01 | Northwestern University, New York University, University of Liverpool, Rutgers University | arxiv | AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models | Jailbreak Attack&Evaluation Frameworks&Ground Truth Dataset |
| 24.01 | Shanghai Jiao Tong University | arxiv | R-Judge: Benchmarking Safety Risk Awareness for LLM Agents | LLM Agents&Safety Risk Awareness&Benchmark |
| 24.02 | University of Illinois Urbana-Champaign, Center for AI Safety, Carnegie Mellon University, UC Berkeley, Microsoft | arxiv | HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal | Automated Red Teaming&Robust Refusal |
| 24.02 | Shanghai Artificial Intelligence Laboratory, Harbin Institute of Technology, Beijing Institute of Technology, Chinese University of Hong Kong | arxiv | SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models | Safety Benchmark&Safety Evaluation&Hierarchical Taxonomy |
| 24.02 | Middle East Technical University | arxiv | HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs | Hallucination&Benchmarking Dataset |
| 24.02 | Indian Institute of Technology Kharagpur | arxiv | How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries | Instruction-centric Responses&Ethical Vulnerabilities |
| 24.03 | East China Normal University | arxiv | DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models | Dialogue-level Hallucination&Benchmarking&Human-machine Interaction |
| 24.03 | Tianjin University, Zhengzhou University, China Academy of Information and Communications Technology | arxiv | OpenEval: Benchmarking Chinese LLMs across Capability, Alignment, and Safety | Chinese LLMs&Benchmarking&Safety |
| 24.04 | University of Pennsylvania, ETH Zurich, EPFL, Sony AI | arxiv | JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models | Jailbreaking Attacks&Robustness Benchmark |
| 24.04 | Vector Institute for Artificial Intelligence, University of Limerick | arxiv | Developing Safe and Responsible Large Language Models - A Comprehensive Framework | Responsible AI&AI Safety&Generative AI |
| 24.04 | LMU Munich, University of Oxford, Siemens AG, Munich Center for Machine Learning (MCML), Wuhan University | arxiv | Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? | Jailbreak Attacks&GPT-4V&Evaluation Benchmark&Robustness |
| 24.04 | Bocconi University, University of Oxford | arxiv | SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety | LLM Safety&Open Datasets&Systematic Review |
| 24.04 | University of Alberta & The University of Tokyo | arxiv | Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward | LLM Safety&Online Safety Analysis&Benchmark |
| 24.04 | Technion – Israel Institute of Technology, Google Research | arxiv | Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs | Hallucinations&Benchmarks |
| 24.05 | Carnegie Mellon University | arxiv | PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models | Multilingual Evaluation&Datasets |
| 24.05 | Paul G. Allen School of Computer Science & Engineering | arxiv | MASSIVE Multilingual Abstract Meaning Representation: A Dataset and Baselines for Hallucination Detection | Hallucination Detection&Multilingual AMR&Dataset |
| 24.05 | University of California, Riverside | arxiv | Cross-Task Defense: Instruction-Tuning LLMs for Content Safety | Instruction-Tuning&LLM Safety&Content Safety |
| 24.06 | University of Waterloo | arxiv | TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability | Truthfulness&Reliability |
| 24.06 | Rutgers University | arxiv | MoralBench: Moral Evaluation of LLMs | Moral Evaluation&MoralBench |
| 24.06 | Tsinghua University | arxiv | Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study | Trustworthiness&MLLMs&Benchmark |
| 24.06 | Beijing Academy of Artificial Intelligence | arxiv | HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation | Hallucination Evaluation&Dialogue-Level&HalluDial |
| 24.06 | Sichuan University | arxiv | LEGEND: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets | Safety Margin&Preference Datasets&Representation Engineering |
| 24.06 | The Hong Kong University of Science and Technology (Guangzhou) | arxiv | Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Jailbreak Attacks&Benchmarking |
| 24.06 | AI Innovation Center, China Unicom | arxiv | CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models | Chinese Hierarchical Safety Benchmark&Large Language Models&Automatic Evaluation |
| 24.06 | Google | arxiv | Supporting Human Raters with the Detection of Harmful Content using Large Language Models | Harmful Content Detection&Hate Speech |
| 24.06 | South China University of Technology, Pazhou Laboratory, University of Maryland, Baltimore County | arxiv | GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models | Gender Bias Mitigation&Alignment Dataset&Bias Categories |
| 24.06 | Center for AI Safety and Governance, Institute for AI, Peking University | arxiv | SAFESORA: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset | Safety Alignment&Text2Video Generation |
| 24.06 | Fudan University | arxiv | Cross-Modality Safety Alignment | Multimodal Safety&Large Vision-Language Models&SIUO Benchmark |
| 24.06 | KAIST | arxiv | CSRT: Evaluation and Analysis of LLMs using Code-Switching Red-Teaming Dataset | Code-Switching&Red-Teaming&Multilingualism |
| 24.06 | University College London | arxiv | JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models | Gender Bias&Hiring Bias&Benchmarking |
| 24.06 | Peking University | arxiv | PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models | Safety Alignment&Preference Dataset |
| 24.06 | University of California, Los Angeles | arxiv | MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries? | Multimodal Language Models&Oversensitivity&Safety Mechanisms |
| 24.06 | Allen Institute for AI | arxiv | WILDGUARD: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs | Safety Moderation&Jailbreak Attacks&Moderation Tools |
| 24.06 | University of Washington | arxiv | WILDTEAMING at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models | Jailbreaking&Safety Training&Adversarial Attacks |
| 24.07 | Beijing Jiaotong University | arxiv | KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions | Factuality Hallucination&Knowledge Graph&False Premise Questions |
| 24.07 | Chinese Academy of Sciences | arxiv | T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models | Text-to-Video Generation&Safety Evaluation&Generative Models |
| 24.07 | Patronus AI | arxiv | Lynx: An Open Source Hallucination Evaluation Model | Hallucination Detection&RAG&Evaluation Model |
| 24.07 | Virginia Tech | arxiv | AIR-BENCH 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies | AI Safety&Regulations&Policies&Risk Categories |
| 24.07 | Columbia University | ECCV 2024 | HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning | Hallucination&Vision-Language Models&Datasets |
| 24.07 | Center for AI Safety | arxiv | Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? | AI Safety&Benchmarks |
| 24.08 | Walled AI Labs | arxiv | WALLEDEVAL: A Comprehensive Safety Evaluation Toolkit for Large Language Models | AI Safety&Prompt Injection |
| 24.08 | ShanghaiTech University | arxiv | MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models | Jailbreak Attacks&Vision-Language Models&Security |
| 24.08 | Stanford University | arxiv | Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models | Cybersecurity&Capture the Flag |
| 24.08 | Zhejiang University | arxiv | Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks | Jailbreak Attacks&LLM Reliability&Evaluation Framework |
| 24.08 | Enkrypt AI | arxiv | SAGE-RT: Synthetic Alignment Data Generation for Safety Evaluation and Red Teaming | Synthetic Data Generation&Safety Evaluation&Red Teaming |
| 24.08 | Tianjin University | ACL2024(findings) | CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models | Moral Evaluation&Moral Dilemma |
| 24.08 | University of Surrey | IJCAI 2024 | CodeMirage: Hallucinations in Code Generated by Large Language Models | Code Hallucinations&CodeMirage Dataset |
| 24.08 | Chalmers University of Technology | arxiv | LLMSecCode: Evaluating Large Language Models for Secure Coding | Secure Coding&Evaluation Framework |
| 24.09 | The Chinese University of Hong Kong | arxiv | Testing and Evaluation of Large Language Models: Correctness, Non-Toxicity, and Fairness | Correctness&Non-Toxicity&Fairness |
| 24.09 | KAIST | arxiv | Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering | Image Hallucination&Text-to-Image Generation&Question-Answering |
| 24.09 | Zhejiang University | arxiv | GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks | Prompt Injection&LLM Safety&Benchmarking |
| 24.10 | Zhejiang University | arxiv | Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents | LLM-based Agents&Security Benchmarks&Adversarial Attacks |
| 24.10 | Zhejiang University, Duke University | arxiv | SCISAFEEVAL: A Comprehensive Benchmark for Safety Alignment of Large Language Models in Scientific Tasks | Safety Alignment&Scientific Tasks |
| 24.10 | The Chinese University of Hong Kong, Tencent AI Lab | arxiv | Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step | Chain-of-Jailbreak&Image Generation Models&Safety |
| 24.10 | University of California, Santa Cruz, University of California, Berkeley | arxiv | Multimodal Situational Safety: A Benchmark for Large Language Models | Multimodal Situational Safety&MLLMs&Safety Benchmark |
| 24.10 | IBM Research | arxiv | ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents | Web Agents&Safety&Trustworthiness |
| 24.10 | Renmin University of China, Anthropic, University of Oxford, University of Edinburgh, Mila, Tangentic | arxiv | POISONBENCH: Assessing Large Language Model Vulnerability to Data Poisoning | Data Poisoning&LLM Vulnerability&Preference Learning |
| 24.10 | Gray Swan AI, UK AI Safety Institute | arxiv | AGENTHARM: A Benchmark for Measuring Harmfulness of LLM Agents | Jailbreaking&LLM Agents&Harmful Agent Tasks |
| 24.10 | Purdue University | arxiv | COLLU-BENCH: A Benchmark for Predicting Language Model Hallucinations in Code | Code Hallucinations&Code Generation&Automated Program Repair |
| 24.10 | The Hong Kong University of Science and Technology (Guangzhou), University of Birmingham, Baidu Inc. | arxiv | JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework | Jailbreak Judge&Multi-Agent Framework |
| 24.10 | University of Notre Dame, IBM Research | arxiv | BenchmarkCards: Large Language Model and Risk Reporting | BenchmarkCards&Bias&Fairness |
| 24.10 | Vectara, Inc., Iowa State University, University of Southern California, Entropy Technologies, University of Waterloo, Funix.io, University of Wisconsin, Madison | arxiv | FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs | Hallucination Detection&Human-Annotated Benchmark&Faithfulness |
| 24.10 | Southern University of Science and Technology | arxiv | ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models | ChineseSafe&Content Safety&LLM Evaluation |
| 24.10 | Beihang University | arxiv | SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models | Multimodal Large Language Models&Safety Evaluation Framework&Risk Assessment |
| 24.10 | University of Washington-Madison | arxiv | CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs | Safety Assessment&LLM Evaluation&Instruction Attacks |
| 24.10 | University of Pennsylvania | arxiv | Benchmarking LLM Guardrails in Handling Multilingual Toxicity | Multilingual Toxicity Detection&Guardrails&Jailbreaking Attacks |
| 24.10 | University of Wisconsin-Madison | arxiv | InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models | Prompt Injection Defense&Over-defense Detection&Guardrail Models |
| 24.10 | National Engineering Research Center for Software Engineering, Peking University | NeurIPS 2024 | SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types | LLM Safety&Prompt Engineering&Jailbreak Attacks |
| 24.11 | Fudan University | arxiv | LongSafetyBench: Long-Context LLMs Struggle with Safety Issues | Long-Context Models&Safety Evaluation&Benchmarking |
| 24.11 | Anthropic | arxiv | Rapid Response: Mitigating LLM Jailbreaks with a Few Examples | Jailbreak Defense&Rapid Response |
| 24.11 | Texas A&M University | arxiv | Responsible AI in Construction Safety: Systematic Evaluation of Large Language Models and Prompt Engineering | Construction Safety&Prompt Engineering&LLM Evaluation |
| 24.11 | IBM Research Europe | NeurIPS 2024 SafeGenAI Workshop | HarmLevelBench: Evaluating Harm-Level Compliance and the Impact of Quantization on Model Alignment | Jailbreaking Techniques&LLM Vulnerability&Quantization Impact |
| 24.11 | Peking University | arxiv | ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain | LLM Safety&Chemistry Domain&Benchmarking |
| 24.11 | New York University, JPMorgan Chase, Cornell Tech, Northeastern University | arxiv | Assessment of LLM Responses to End-user Security Questions | LLM Evaluation&End-user Security&Information Integrity |
| 24.11 | National Library of Medicine, NIH; University of Maryland; University of Virginia; Universidad de Chile | arxiv | Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine | Medical AI&LLM Safety&MedGuard Benchmark |
| 24.11 | European Commission Joint Research Centre | EMNLP 2024 | GuardBench: A Large-Scale Benchmark for Guardrail Models | Guardrail Models&Benchmark&Evaluation |
| 24.12 | Vizuara AI Labs | arxiv | CBEVAL: A Framework for Evaluating and Interpreting Cognitive Biases in LLMs | Cognitive Biases&LLM Evaluation&Reasoning Limitations |
| 24.12 | Beijing Institute of Technology, Beihang University | arxiv | REFF: Reinforcing Format Faithfulness in Language Models across Varied Tasks | Format Faithfulness&Benchmark |
| 24.12 | UCLA, Salesforce AI Research | NeurIPS 2024 | SAFEWORLD: Geo-Diverse Safety Alignment | Geo-Diverse Alignment&Safety Evaluation&Legal Compliance |
| 24.12 | Shanghai Jiao Tong University | arxiv | SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents | Safety-Aware Task Planning&Embodied LLM Agents&Hazard Mitigation |
| 24.12 | Tsinghua University | arxiv | AGENT-SAFETYBENCH: Evaluating the Safety of LLM Agents | Agent Safety&Risk Awareness&Interactive Evaluation |
| 24.12 | TU Darmstadt | arxiv | LLMs Lost in Translation: M-ALERT Uncovers Cross-Linguistic Safety Gaps | Cross-Linguistic Safety&Multilingual Benchmark&LLM Alignment |
| 24.12 | Alibaba, China Academy of Information and Communications Technology | arxiv | Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models | Safety Benchmark&Factuality Evaluation |
| 24.12 | University of Warwick, Cranfield University | arxiv | MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models | Medical Hallucinations&Benchmark&RLHF |
| 24.12 | The Hong Kong Polytechnic University | arxiv | SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | Cybersecurity Benchmark&Large Language Models&Dataset Evaluation |
| 25.01 | KTH Royal Institute of Technology | arxiv | CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Cybersecurity Benchmark&Jailbreaking&Prompt Dataset |
| 25.01 | Shahjalal University of Science and Technology | arxiv | From Scarcity to Capability: Empowering Fake News Detection in Low-Resource Languages with LLMs | Fake News Detection&Bangla&Low-Resource Languages |
| 25.01 | NVIDIA | arxiv | AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails | AI Safety&Content Moderation Dataset&LLM Risk Taxonomy |
| 25.01 | Georgia Institute of Technology | arxiv | On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic Phenomena | Cultural Bias in LLMs&Cross-Linguistic Analysis&Arabic-English Benchmarks |
| 25.01 | Bocconi University | arxiv | MSTS: A Multimodal Safety Test Suite for Vision-Language Models | Multimodal Safety&Vision-Language Models |
| 25.01 | Fudan University | arxiv | You Can't Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense | Jailbreak Defense&LLM Performance&USEBench |
| 25.01 | McGill University | arxiv | OnionEval: A Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models | Fact-conflicting Hallucination&Small-Large Language Models (SLLMs)&Benchmark |
| 25.01 | HKUST | arxiv | Tune In, Act Up: Exploring the Impact of Audio Modality-Specific Edits on Large Audio Language Models in Jailbreak | Audio Language Models&Jailbreak Vulnerabilities&Audio Modality Edits |

## 📚Resource