- [2024/07] Self-Evaluation as a Defense Against Adversarial Attacks on LLMs
- [2024/07] SOS! Soft Prompt Attack Against Open-Source Large Language Models
- [2024/06] Investigating and Defending Shortcut Learning in Personalized Diffusion Models
- [2024/06] Adversarial Attacks on Multimodal Agents
- [2024/05] Large Language Model Sentinel: Advancing Adversarial Robustness by LLM Agent
- [2024/05] Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models
- [2024/05] SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
- [2024/04] Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
- [2024/03] Cross-Lingual Transfer Robustness to Lower-Resource Languages on Adversarial Datasets
- [2024/03] Improving the Robustness of Large Language Models via Consistency Alignment
- [2024/03] Evaluating Robustness of Generative Search Engine on Adversarial Factual Questions
- [2024/03] SSCAE -- Semantic, Syntactic, and Context-aware natural language Adversarial Examples generator
- [2024/03] Transferable Multimodal Attack on Vision-Language Pre-training Models
- [2024/03] AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions
- [2024/03] The Impact of Quantization on the Robustness of Transformer-based Text Classifiers
- [2024/02] Fast Adversarial Attacks on Language Models In One GPU Minute
- [2024/02] Stealthy Attack on Large Language Model based Recommendation
- [2024/02] BSPA: Exploring Black-box Stealthy Prompt Attacks against Image Generators
- [2024/02] Stop Reasoning! When Multimodal LLMs with Chain-of-Thought Reasoning Meets Adversarial Images
- [2024/02] The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative
- [2024/02] Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment
- [2024/02] Groot: Adversarial Testing for Generative Text-to-Image Models with Tree-based Semantic Transformation
- [2024/02] Exploring the Adversarial Capabilities of Large Language Models
- [2024/02] Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models
- [2024/02] Adversarial Text Purification: A Large Language Model Approach for Defense
- [2024/02] Cheating Suffix: Targeted Attack to Text-To-Image Diffusion Models with Multi-Modal Priors
- [2024/01] Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks
- [2024/01] Exploring Adversarial Attacks against Latent Diffusion Model from the Perspective of Adversarial Transferability
- [2024/01] Adversarial Examples are Misaligned in Diffusion Model Manifolds
- [2024/01] INSTRUCTTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models
- [2023/12] On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
- [2023/12] Causality Analysis for Evaluating the Security of Large Language Models
- [2023/12] Hijacking Context in Large Multi-modal Models
- [2023/11] MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning
- [2023/11] Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention
- [2023/11] Unveiling Safety Vulnerabilities of Large Language Models
- [2023/11] Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?
- [2023/11] DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification
- [2023/11] How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
- [2023/10] Misusing Tools in Large Language Models With Visual Adversarial Examples
- [2023/09] Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images
- [2023/09] An Image Is Worth 1000 Lies: Transferability of Adversarial Images across Prompts on Vision-Language Models
- [2023/09] An LLM can Fool Itself: A Prompt-Based Adversarial Attack
- [2023/09] Language Model Detectors Are Easily Optimized Against
- [2023/09] Leveraging Optimization for Adaptive Attacks on Image Watermarks
- [2023/09] Training Socially Aligned Language Models on Simulated Social Interactions
- [2023/09] How Robust is Google's Bard to Adversarial Image Attacks?
- [2023/09] Image Hijacks: Adversarial Images Can Control Generative Models at Runtime
- [2023/08] Ceci n'est pas une pomme: Adversarial Illusions in Multi-Modal Embeddings
- [2023/08] On the Adversarial Robustness of Multi-Modal Foundation Models
- [2023/08] Robustness Over Time: Understanding Adversarial Examples' Effectiveness on Longitudinal Versions of Large Language Models
- [2023/07] Certified Robustness for Large Language Models with Self-Denoising
- [2023/06] Adversarial Examples in the Age of ChatGPT
- [2023/06] Are Aligned Neural Networks Adversarially Aligned?
- [2023/06] PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
- [2023/06] Stable Diffusion is Unstable
- [2023/06] Unlearnable Examples for Diffusion Models: Protect Data from Unauthorized Exploitation
- [2023/06] Visual Adversarial Examples Jailbreak Large Language Models
- [2023/05] Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability
- [2023/05] Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility
- [2023/05] On Evaluating Adversarial Robustness of Large Vision-Language Models
- [2023/03] Anti-DreamBooth: Protecting Users from Personalized Text-to-Image Synthesis
- [2023/02] Large Language Models Can Be Easily Distracted by Irrelevant Context
- [2023/02] On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
- [2023/02] Adversarial Example Does Good: Preventing Painting Imitation from Diffusion Models via Adversarial Examples
- [2023/02] Raising the Cost of Malicious AI-Powered Image Editing
- [2023/01] On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex
- [2022/12] Understanding Zero-Shot Adversarial Robustness for Large-Scale Models
- [2021/11] Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models