Jailbreak detection
Jailbreak detection is based on the open-source jailbreak-classifier model. It lets users apply a detection mechanism against attempts to bypass the model’s built-in boundaries and system prompts in order to generate unwanted or unexpected interactions. The higher the sensitivity parameter is set, the more likely this guardrail is to activate and detect violations.
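
To make the sensitivity behavior concrete, here is a minimal toy sketch. It is not the Superwise implementation. It assumes the classifier returns a jailbreak score between 0 and 1, and that a higher sensitivity lowers the cutoff the score must reach. The function names, the example scores, and the `1 - sensitivity` mapping are all hypothetical, chosen only to illustrate the "higher sensitivity, more detections" relationship.

```python
def should_flag(jailbreak_score: float, sensitivity: float) -> bool:
    """Return True when the guardrail would activate in this toy model.

    Assumption (not from the product docs): raising the sensitivity
    lowers the cutoff that the classifier score has to reach.
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    cutoff = 1.0 - sensitivity
    return jailbreak_score >= cutoff


# Hypothetical classifier scores for three prompts.
SCORES = {"benign": 0.25, "borderline": 0.5, "attack": 0.75}


def flagged_at(sensitivity: float) -> list[str]:
    """List the example prompts that would be flagged at a given sensitivity."""
    return [name for name, score in SCORES.items() if should_flag(score, sensitivity)]


if __name__ == "__main__":
    for sensitivity in (0.25, 0.5, 0.75):
        print(sensitivity, flagged_at(sensitivity))
```

In this sketch, raising the sensitivity from 0.25 to 0.75 widens the flagged set from the clear attack alone to all three prompts, which is the trade-off to keep in mind when tuning: more detections, but also more benign requests caught.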

Using the SDK
from superwise_api.models.agent.agent import AgentDetectJailbreakGuard

# Define a jailbreak detection guard.
jailbreak_guard = AgentDetectJailbreakGuard(
    name="Rule name",
    tags={"input"},  # tagged "input" in this example
    threshold=0.7,
)