
Guardrails

Guardrails are mechanisms designed to help LLMs generate safe, accurate, and appropriate content. They include rules, ethical guidelines, content filters, and moderation that prevent harmful or biased outputs. These safeguards balance the power of LLMs with responsible AI use. Read more about the guardrails concept here.

Guardrails for SUPERWISE® agents

Adding guardrails to your SUPERWISE® agent is simple! Just navigate to your agent studio, go to the guardrails tab, and select the guard you wish to configure for your agent.

Superwise guardrails panel with rule options for AI agents.

Guards:

Guardrails for external agents

SUPERWISE® guardrails can be used in external agents as well as in SUPERWISE® apps. This functionality is available through both the SDK and the API. Provide the text you want to check together with the guardrail configuration, and the check will identify any guardrail violations in your input.

SDK example

Configure your guardrails

from superwise_api.models.application.application import OpenAIModel
from superwise_api.models.guardrails.guardrails import (
    AllowedTopicsGuard,
    RestrictedTopicsGuard,
    ToxicityGuard,
)

# OpenAI model passed to the two topic guards
OPENAI_API_TOKEN = "INSERT TOKEN HERE"
openai_model = OpenAIModel(api_token=OPENAI_API_TOKEN, version="gpt-4-turbo")

guards = [
    ToxicityGuard(threshold=0.9),
    RestrictedTopicsGuard(topics=["topic1", "topic2"], model=openai_model),
    AllowedTopicsGuard(topics=["topic3", "topic4"], model=openai_model),
]

Validate guards on provided input query

# `sw` is assumed to be an already-initialized SUPERWISE® SDK client
input_query = "Insert input query text here"
res = sw.guardrails.validate(guards=guards, input_query=input_query)
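
One way an external agent might act on the validation result is to refuse to answer when any guard fails. The following is a minimal sketch under stated assumptions, not the SDK's actual behavior: `GuardResult`, `check_input`, `answer_or_refuse`, `fake_validate`, and `echo_agent` are all hypothetical names introduced for illustration, and the real response returned by `sw.guardrails.validate` may have a different shape. A stub validator stands in for the SDK call so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class GuardResult:
    """Hypothetical shape of one guard's verdict (NOT the SDK's response model)."""
    guard_name: str
    valid: bool
    message: str = ""


def check_input(
    validate: Callable[[str], Sequence[GuardResult]],
    input_query: str,
) -> List[str]:
    """Run the validator and return one message per failed guard."""
    results = validate(input_query)
    return [f"{r.guard_name}: {r.message}" for r in results if not r.valid]


def answer_or_refuse(
    validate: Callable[[str], Sequence[GuardResult]],
    agent: Callable[[str], str],
    input_query: str,
    refusal: str = "Sorry, I can't help with that request.",
) -> str:
    """Call the agent only when no guard reports a violation."""
    if check_input(validate, input_query):
        return refusal
    return agent(input_query)


def fake_validate(text: str) -> List[GuardResult]:
    """Stub standing in for the real guardrail call (assumption for the demo)."""
    blocked = "topic1" in text.lower()
    message = "mentions topic1" if blocked else ""
    return [GuardResult("restricted_topics", valid=not blocked, message=message)]


def echo_agent(text: str) -> str:
    """Stub external agent."""
    return f"agent reply to: {text}"


print(answer_or_refuse(fake_validate, echo_agent, "hello"))
print(answer_or_refuse(fake_validate, echo_agent, "Tell me about topic1"))
```

With the real SDK, the `validate` argument could wrap `sw.guardrails.validate(guards=guards, input_query=...)`, after adapting its actual response to whatever pass/fail structure your agent needs.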

A resource for the experts

Check it out in this notebook! All you need is a valid OpenAI key.

Track and monitor your guardrail violations

Whenever a guardrail violation occurs, it is recorded in your out-of-the-box (OOB) conversation dataset. To provide observability and monitoring of your agent chat, the SUPERWISE® team creates this OOB dataset, which tracks your conversations and extracts meta-features for every question and answer. Guardrail violations are now available in the same dataset. You can reach your conversation dataset through a direct link on the explore page of your agent studio screen:

Superwise interface showing chat playground and guardrail monitoring options.
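
Once conversation records are exported from the dataset, violations can be summarized per guard. The sketch below is illustrative only: the field name `guardrail_violations`, the record layout, and the helper `count_violations` are assumptions made for this example, not the documented schema of the OOB conversation dataset.

```python
from collections import Counter
from typing import Dict, Iterable


def count_violations(
    rows: Iterable[Dict],
    field: str = "guardrail_violations",  # hypothetical column name
) -> Counter:
    """Count how many times each guard was violated across conversation rows."""
    counts: Counter = Counter()
    for row in rows:
        # A missing or empty field is treated as "no violations".
        for guard_name in row.get(field) or []:
            counts[guard_name] += 1
    return counts


# Made-up sample records standing in for an exported conversation dataset.
sample_rows = [
    {"question": "q1", "guardrail_violations": ["toxicity"]},
    {"question": "q2", "guardrail_violations": []},
    {"question": "q3", "guardrail_violations": ["toxicity", "restricted_topics"]},
    {"question": "q4"},
]

summary = count_violations(sample_rows)
for guard_name, total in summary.most_common():
    print(guard_name, total)
```

A per-guard count like this is a simple starting point for spotting which guard fires most often before setting up more detailed monitoring.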