Jailbreak detection

Jailbreak detection is based on the open source jailbreak-classifier model. It lets you apply a detection mechanism that catches attempts to bypass the model’s built-in boundaries and system prompts in order to generate unwanted or unexpected interactions. The higher you set the sensitivity parameter, the more likely this guardrail is to activate and flag a violation.
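To make the effect of sensitivity concrete, here is a minimal sketch of one way such a setting could behave. It is not the Superwise implementation: the function `guard_triggers`, the 0 to 1 score range, and the rule "trigger when the score is at least 1 minus the sensitivity" are all assumptions chosen for illustration, and the sketch makes no claim about how this maps to the SDK's `threshold` argument.

```python
def guard_triggers(jailbreak_score: float, sensitivity: float) -> bool:
    """Return True when the guardrail would fire under the assumed rule.

    Both arguments are hypothetical: `jailbreak_score` stands in for a
    classifier output in [0, 1], and `sensitivity` for the configured value.
    """
    if not 0.0 <= jailbreak_score <= 1.0:
        raise ValueError("jailbreak_score must be in [0, 1]")
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be in [0, 1]")
    # Assumed rule: a higher sensitivity lowers the score needed to trigger,
    # so the guardrail fires more often.
    return jailbreak_score >= 1.0 - sensitivity


if __name__ == "__main__":
    score = 0.5  # hypothetical classifier output for a single prompt
    for sensitivity in (0.3, 0.7):
        print(sensitivity, guard_triggers(score, sensitivity))
```

Under this assumed rule, the same prompt score of 0.5 passes at a sensitivity of 0.3 and is flagged at 0.7, which matches the behavior described above: raising the sensitivity makes activation more likely.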

Using the SDK

from superwise_api.models.agent.agent import AgentDetectJailbreakGuard

# Configure a jailbreak detection guard with the values shown in this guide.
jailbreak_guard = AgentDetectJailbreakGuard(
    name="Rule name",
    tags={"input"},
    threshold=0.7
)