Guardrails

SUPERWISE™ Sentinel operates natively, parsing every inbound prompt and outbound completion in real time. If a transaction violates your organization's compliance policy, the Sentinel intervenes immediately at the edge, before sensitive data or malicious inputs reach upstream providers.

1. Secret & Credentials Detection

Exposing internal infrastructure keys, cloud credentials, or production database passwords in prompts sent to public LLMs is a critical compliance risk. The Sentinel scans incoming payloads for string patterns that match known cloud provider key formats, database connection strings, and private token formats.

Remediation Action: Replace with {{REDACTED}}. The detected credential string is replaced with a redaction placeholder, so the request can proceed to the upstream model without exposing sensitive data.
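
The scan-and-replace behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the Sentinel's implementation: the `SECRET_PATTERNS` list, the `redact_secrets` function, and the two regular expressions are assumptions chosen for the example, and a production rule set would cover many more credential formats.

```python
import re

PLACEHOLDER = "{{REDACTED}}"

# Illustrative patterns only (assumptions, not the Sentinel's actual rules).
SECRET_PATTERNS = [
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # "user:password" credentials embedded in a connection URL,
    # e.g. postgres://user:password@host/db
    re.compile(r"(?<=://)[^\s:/@]+:[^\s@]+(?=@)"),
]


def redact_secrets(prompt: str) -> str:
    """Replace every substring matching a known secret pattern."""
    for pattern in SECRET_PATTERNS:
        prompt = pattern.sub(PLACEHOLDER, prompt)
    return prompt


if __name__ == "__main__":
    print(redact_secrets("Test with key AKIAIOSFODNN7EXAMPLE please"))
    # prints: Test with key {{REDACTED}} please
```

Only the matched credential is replaced, so the rest of the prompt reaches the model unchanged.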

Example Interception

Inbound Prompt from Application:

"Hey, can you help me write a quick bash script to upload files to our S3 bucket? Here are my local env variables to test with: AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE"

What the Upstream LLM Actually Sees:

"Hey, can you help me write a quick bash script to upload files to our S3 bucket? Here are my local env variables to test with: AWS_ACCESS_KEY_ID={{REDACTED}}"

2. PII Masking & Anonymization

To comply with strict data protection and privacy regulations (such as GDPR, HIPAA, and CCPA), sensitive Personally Identifiable Information (PII) must be shielded from third-party LLM environments. The Sentinel dynamically identifies and redacts PII values before the prompt is forwarded to the upstream model.

This guardrail covers the following categories: general PII (such as names), email addresses, Social Security numbers (SSNs), and credit card numbers.

Remediation Action: Replace with {{REDACTED}}. Detected PII values are replaced with redaction placeholders before the prompt is forwarded to the LLM, so the request can proceed while personal data stays protected.
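
As a rough sketch of how the structured categories could be masked, the Python below uses one regular expression per category. The names `EMAIL_RE`, `SSN_RE`, `CARD_RE`, and `mask_pii` are hypothetical, and the patterns are deliberately simple assumptions. Free-form PII such as personal names cannot be found this way and would typically need a named-entity recognition model, which this sketch does not attempt.

```python
import re

PLACEHOLDER = "{{REDACTED}}"

# Simplified, illustrative patterns (assumptions, not the Sentinel's rules).
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
# US Social Security numbers written as 123-45-6789.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def mask_pii(text: str) -> str:
    """Replace emails, SSNs, and card-like numbers with the placeholder."""
    for pattern in (EMAIL_RE, SSN_RE, CARD_RE):
        text = pattern.sub(PLACEHOLDER, text)
    return text


if __name__ == "__main__":
    sample = "Reach jane.doe@example.com about card 4532-1234-5678-9010."
    print(mask_pii(sample))
    # prints: Reach {{REDACTED}} about card {{REDACTED}}.
```

A stricter detector would likely validate card candidates (for example with a checksum) to reduce false positives on other long digit sequences; that step is omitted here for brevity.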

Example Interception

Inbound Prompt from Application:

"Please draft a follow-up email to our customer John Smith (john.smith@example.com) regarding his recent invoice sent to credit card number 4532-1234-5678-9010."

What the Upstream LLM Actually Sees:

"Please draft a follow-up email to our customer {{REDACTED}} ({{REDACTED}}) regarding his recent invoice sent to credit card number {{REDACTED}}."

3. Rudeness & Toxicity Filtering

Maintaining brand safety and professional alignment requires strict content guidelines. The Toxicity filter works bi-directionally, analyzing both incoming user queries (inbound) and model outputs (outbound) for profanity, hostile language, threats, and harassment.

Remediation Action: Replace with {{REDACTED}}. Offending language is replaced with redaction placeholders, so the request can proceed to the model with the problematic content neutralized.
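
The bidirectional flow can be illustrated with a small Python sketch that applies the same filter to the prompt on the way in and to the completion on the way out. Everything here is an assumption made for illustration: `BLOCKLIST`, `filter_toxicity`, and `guarded_call` are hypothetical names, and a fixed phrase list stands in for whatever detection method the Sentinel actually uses (real toxicity detection is usually classifier-based).

```python
import re
from typing import Callable

PLACEHOLDER = "{{REDACTED}}"

# Hypothetical phrase list standing in for a real toxicity detector.
BLOCKLIST = ["useless", "get your act together", "you incompetent fools"]

# Longest phrases first so multi-word entries win over shorter overlaps.
_TOXIC_RE = re.compile(
    r"\b(?:"
    + "|".join(re.escape(p) for p in sorted(BLOCKLIST, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def filter_toxicity(text: str) -> str:
    """Replace blocklisted phrases with the redaction placeholder."""
    return _TOXIC_RE.sub(PLACEHOLDER, text)


def guarded_call(prompt: str, call_model: Callable[[str], str]) -> str:
    """Filter the inbound prompt, call the model, filter the outbound text."""
    completion = call_model(filter_toxicity(prompt))
    return filter_toxicity(completion)


if __name__ == "__main__":
    # A stub model that echoes its input, used only to show the flow.
    print(guarded_call("You are a useless software.", lambda p: p))
    # prints: You are a {{REDACTED}} software.
```

Passing the model as a callable keeps the sketch independent of any particular provider SDK.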

Example Interception

Inbound Hostile Prompt from User:

"You are a useless software. Get your act together and tell me why my refund is taking so long, you incompetent fools."

What the Upstream LLM Actually Sees:

"You are a {{REDACTED}} software. {{REDACTED}} and tell me why my refund is taking so long, {{REDACTED}}."