- [2024/07] Self-Evaluation as a Defense Against Adversarial Attacks on LLMs
- [2024/07] SOS! Soft Prompt Attack Against Open-Source Large Language Models
- [2024/06] Investigating and Defending Shortcut Learning in Personalized Diffusion Models
- [2024/06] Adversarial Attacks on Multimodal Agents
- [2024/05] Large Language Model Sentinel: Advancing Adversarial Robustness by LLM Agent
- [2024/05] Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models
- [2024/05] SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
- [2024/04] Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
- [2024/03] Cross-Lingual Transfer Robustness to Lower-Resource Languages on Adversarial Datasets
- [2024/03] Improving the Robustness of Large Language Models via Consistency Alignment
- [2024/03] Evaluating Robustness of Generative Search Engine on Adversarial Factual Questions
- [2024/03] SSCAE -- Semantic, Syntactic, and Context-aware natural language Adversarial Examples generator
- [2024/03] Transferable Multimodal Attack on Vision-Language Pre-training Models
- [2024/03] AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions
- [2024/03] The Impact of Quantization on the Robustness of Transformer-based Text Classifiers
- [2024/02] Fast Adversarial Attacks on Language Models In One GPU Minute
- [2024/02] Stealthy Attack on Large Language Model based Recommendation
- [2024/02] BSPA: Exploring Black-box Stealthy Prompt Attacks against Image Generators
- [2024/02] Stop Reasoning! When Multimodal LLMs with Chain-of-Thought Reasoning Meets Adversarial Images
- [2024/02] The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative
- [2024/02] Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment
- [2024/02] Groot: Adversarial Testing for Generative Text-to-Image Models with Tree-based Semantic Transformation
- [2024/02] Exploring the Adversarial Capabilities of Large Language Models
- [2024/02] Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models
- [2024/02] Adversarial Text Purification: A Large Language Model Approach for Defense
- [2024/02] Cheating Suffix: Targeted Attack to Text-To-Image Diffusion Models with Multi-Modal Priors
- [2024/01] Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks
- [2024/01] Exploring Adversarial Attacks against Latent Diffusion Model from the Perspective of Adversarial Transferability
- [2024/01] Adversarial Examples are Misaligned in Diffusion Model Manifolds
- [2024/01] INSTRUCTTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models
- [2023/12] On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
- [2023/12] Causality Analysis for Evaluating the Security of Large Language Models
- [2023/12] Hijacking Context in Large Multi-modal Models
- [2023/11] MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning
- [2023/11] Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention
- [2023/11] Unveiling Safety Vulnerabilities of Large Language Models
- [2023/11] Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?
- [2023/11] DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification
- [2023/11] How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
- [2023/10] Misusing Tools in Large Language Models With Visual Adversarial Examples
- [2023/09] Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images
- [2023/09] An Image Is Worth 1000 Lies: Transferability of Adversarial Images across Prompts on Vision-Language Models
- [2023/09] An LLM can Fool Itself: A Prompt-Based Adversarial Attack
- [2023/09] Language Model Detectors Are Easily Optimized Against
- [2023/09] Leveraging Optimization for Adaptive Attacks on Image Watermarks
- [2023/09] Training Socially Aligned Language Models on Simulated Social Interactions
- [2023/09] How Robust is Google's Bard to Adversarial Image Attacks?
- [2023/09] Image Hijacks: Adversarial Images Can Control Generative Models at Runtime
- [2023/08] Ceci n'est pas une pomme: Adversarial Illusions in Multi-Modal Embeddings
- [2023/08] On the Adversarial Robustness of Multi-Modal Foundation Models
- [2023/08] Robustness Over Time: Understanding Adversarial Examples' Effectiveness on Longitudinal Versions of Large Language Models
- [2023/07] Certified Robustness for Large Language Models with Self-Denoising
- [2023/06] Adversarial Examples in the Age of ChatGPT
- [2023/06] Are Aligned Neural Networks Adversarially Aligned?
- [2023/06] PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
- [2023/06] Stable Diffusion is Unstable
- [2023/06] Unlearnable Examples for Diffusion Models: Protect Data from Unauthorized Exploitation
- [2023/06] Visual Adversarial Examples Jailbreak Large Language Models
- [2023/05] Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability
- [2023/05] Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility
- [2023/05] On Evaluating Adversarial Robustness of Large Vision-Language Models
- [2023/03] Anti-DreamBooth: Protecting Users from Personalized Text-to-Image Synthesis
- [2023/02] Large Language Models Can Be Easily Distracted by Irrelevant Context
- [2023/02] On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
- [2023/02] Adversarial Example Does Good: Preventing Painting Imitation from Diffusion Models via Adversarial Examples
- [2023/02] Raising the Cost of Malicious AI-Powered Image Editing
- [2023/01] On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex
- [2022/12] Understanding Zero-Shot Adversarial Robustness for Large-Scale Models
- [2021/11] Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models