Skip to content

jailbreakme.xyz is an open-source decentralized app (dApp) where users are challenged to try and jailbreak pre-existing LLMs in order to find weaknesses and be rewarded. πŸ†

Notifications You must be signed in to change notification settings

jailbreakme-xyz/jailbreak

Repository files navigation

What is JailbreakMe? πŸš€

jailbreakme.xyz is an open-source decentralized app (dApp) where organizations test their AI models and agents while users earn rewards for finding weaknesses and jailbreaking them πŸ†

banner


What is an AI Prompt Injection? πŸ’‰

Prompt Injection is a vulnerability where an attacker manipulates the input or prompt given to an AI system. This can occur:

  • By directly controlling the input.
  • By using data from other external sources.

Our Vision

We aim to create a decentralized platform where companies can:

  • Test their AI models and agents in a distributed environment.
  • Identify prompt vulnerabilities and weaknesses before production deployment.

🏁 How It Works

1. Choose a Tournament

Choose Tournament

  • Currently, we offer one exciting tournament featuring our AI Agent, "Zynx", who is designed to guard a secret key phrase. 🀫
  • Your challenge: Trick Zynx into revealing the secret key phrase to win a reward. πŸ₯³
  • More tournaments coming soon!

2. Break the LLM Restrictions πŸ€–

Break Restrictions

  • Send your prompts to the AI model and attempt to solve the challenge.
  • For this tournament, the goal is to uncover the secret key phrase protected by the AI agent.

3. Win the Prize Pool πŸ†

Win Prize Pool

  • Once the challenge is solved (e.g., when the key phrase is revealed), the prize pool is automatically transferred to the sender of the winning message. πŸŽ‰

How is the Winner Picked? πŸ€”

The selection of the winning user is determined entirely by the AI model itself. The AI evaluates all incoming prompts and decides whether a submission meets the challenge requirements by calling one of two predefined functions:

  1. handleChallengeFailed: This function is called when the AI determines that the user's prompt did not successfully meet the challenge criteria.
  2. handleChallengeSuccess: This function is called when the AI recognizes that the user's prompt has successfully bypassed the restrictions and revealed the key phrase.

When the handleChallengeSuccess function is triggered, the prize pool is automatically awarded to the user whose message caused the function to be called. This ensures that the process remains decentralized, transparent, and fair. πŸŽ‰


πŸ“œ Settings & Rules

Each tournament has unique rules, including:

  • Custom Prize Pools
  • Message Pricing
  • Expiry Settings

Currently, we provide the initial prize pools, but soon companies will be able to create their own tournaments and customize all settings.


πŸ”— Useful Links


Feedback & Support

Feel free to reach out at [email protected] for feedback or support.

About

jailbreakme.xyz is an open-source decentralized app (dApp) where users are challenged to try and jailbreak pre-existing LLMs in order to find weaknesses and be rewarded. πŸ†

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published