
OpenAI Guardrails Format

This library accepts a configuration format that follows OpenAI’s Guardrails naming conventions, so existing configurations can be loaded with little or no rewriting.

If you already use OpenAI’s guardrail names, or are migrating from their format, this gives you:

  • Same guardrail names (Contains PII, Moderation, etc.)
  • Same configuration structure
  • Easy migration path

{
  "version": 1,
  "input": {
    "version": 1,
    "guardrails": [
      {
        "name": "Contains PII",
        "config": {
          "entities": ["EMAIL_ADDRESS", "PHONE_NUMBER", "US_SSN"],
          "block": true
        }
      },
      {
        "name": "Moderation",
        "config": {
          "categories": ["hate", "violence", "harassment"]
        }
      }
    ]
  },
  "output": {
    "version": 1,
    "guardrails": [
      {
        "name": "Contains PII",
        "config": {
          "entities": ["EMAIL_ADDRESS", "CREDIT_CARD"],
          "block": true
        }
      }
    ]
  }
}
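
To make the layout of the structure above concrete, here is a minimal sketch that parses a trimmed copy of the configuration and lists the guardrail names per stage. It uses only the standard library; `guardrail_names_by_stage` is a hypothetical helper written for this illustration, not part of pydantic-ai-guardrails.

```python
import json

# Trimmed copy of the configuration shown above.
CONFIG_TEXT = """
{
  "version": 1,
  "input": {
    "version": 1,
    "guardrails": [
      {"name": "Contains PII", "config": {"entities": ["EMAIL_ADDRESS"], "block": true}},
      {"name": "Moderation", "config": {"categories": ["hate", "violence"]}}
    ]
  },
  "output": {
    "version": 1,
    "guardrails": [
      {"name": "Contains PII", "config": {"entities": ["CREDIT_CARD"], "block": true}}
    ]
  }
}
"""


def guardrail_names_by_stage(config: dict) -> dict[str, list[str]]:
    """Return the guardrail names configured for each stage (hypothetical helper)."""
    return {
        stage: [entry["name"] for entry in config.get(stage, {}).get("guardrails", [])]
        for stage in ("input", "output")
    }


summary = guardrail_names_by_stage(json.loads(CONFIG_TEXT))
print(summary)
```

A missing `input` or `output` section is treated here as an empty list; whether the library itself tolerates a missing section is not stated in this page.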

Input guardrails:

| OpenAI Name | Maps To | Description |
| --- | --- | --- |
| Contains PII | pii_detector | PII detection |
| Moderation | toxicity | Content moderation |
| Prompt Injection Detection | prompt_injection | Injection detection |
| Jailbreak | prompt_injection | Jailbreak attempts |
| Length Limit | length_limit | Input length |

Output guardrails:

| OpenAI Name | Maps To | Description |
| --- | --- | --- |
| Contains PII | pii_detector | PII in output |
| Hallucination Detection | llm_judge | Factual accuracy |
| NSFW Text | toxicity | Adult content |
| Secret Detection | secret_redaction | Secrets in output |

from pydantic_ai import Agent
from pydantic_ai_guardrails import create_guarded_agent_from_config

# Automatically detects OpenAI format
guarded_agent = create_guarded_agent_from_config(
    Agent('openai:gpt-4o'),
    'openai_guardrails.json',
)

OpenAI Format:

{
  "name": "Contains PII",
  "config": {
    "entities": ["EMAIL_ADDRESS", "PHONE_NUMBER", "US_SSN", "CREDIT_CARD"],
    "block": true
  }
}

Maps to:

pii_detector(
    detect_types=['email', 'phone', 'ssn', 'credit_card'],
)

Entity Mapping:

| OpenAI Entity | Library Entity |
| --- | --- |
| EMAIL_ADDRESS | email |
| PHONE_NUMBER | phone |
| US_SSN | ssn |
| CREDIT_CARD | credit_card |
| IP_ADDRESS | ip_address |

OpenAI Format:

{
  "name": "Moderation",
  "config": {
    "categories": ["hate", "hate/threatening", "harassment", "violence"]
  }
}

Maps to:

toxicity(threshold=0.5)

OpenAI Format:

{
  "name": "Prompt Injection Detection",
  "config": {
    "confidence_threshold": 0.7
  }
}

Maps to:

prompt_injection(threshold=0.7)

OpenAI Format:

{
  "name": "Jailbreak",
  "config": {
    "confidence_threshold": 0.8
  }
}

Maps to:

prompt_injection(threshold=0.8) # Handled by same detector
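
The two mappings above follow the same pattern: both OpenAI names resolve to `prompt_injection`, and `confidence_threshold` becomes `threshold`. The sketch below illustrates that translation with plain dictionaries. `resolve_threshold_guardrail` is a hypothetical helper for illustration and does not reflect how the library implements the mapping internally.

```python
# Names taken from the mappings shown above; the helper itself is hypothetical.
THRESHOLD_GUARDRAILS = {
    "Prompt Injection Detection": "prompt_injection",
    "Jailbreak": "prompt_injection",
}


def resolve_threshold_guardrail(entry: dict) -> tuple[str, dict]:
    """Translate an OpenAI-format entry into (native name, keyword arguments)."""
    native_name = THRESHOLD_GUARDRAILS[entry["name"]]
    config = entry.get("config", {})
    kwargs = {}
    # Pass the threshold through only when the entry sets one; any default
    # is left to the native guardrail rather than guessed here.
    if "confidence_threshold" in config:
        kwargs["threshold"] = config["confidence_threshold"]
    return native_name, kwargs


entries = [
    {"name": "Prompt Injection Detection", "config": {"confidence_threshold": 0.7}},
    {"name": "Jailbreak", "config": {"confidence_threshold": 0.8}},
]
resolved = [resolve_threshold_guardrail(entry) for entry in entries]
print(resolved)
```

Because both names share one detector, a config that lists both ends up with two `prompt_injection` entries that differ only in threshold.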

OpenAI Format:

{
  "name": "Hallucination Detection",
  "config": {}
}

Maps to:

llm_judge(rubric='Response should be factually accurate')

Complete example:

{
  "version": 1,
  "input": {
    "version": 1,
    "guardrails": [
      {
        "name": "Contains PII",
        "config": {
          "entities": [
            "EMAIL_ADDRESS",
            "PHONE_NUMBER",
            "US_SSN",
            "CREDIT_CARD",
            "IP_ADDRESS"
          ],
          "block": true
        }
      },
      {
        "name": "Moderation",
        "config": {
          "categories": [
            "hate",
            "hate/threatening",
            "harassment",
            "harassment/threatening",
            "violence",
            "violence/graphic"
          ]
        }
      },
      {
        "name": "Prompt Injection Detection",
        "config": {
          "confidence_threshold": 0.7
        }
      },
      {
        "name": "Jailbreak",
        "config": {
          "confidence_threshold": 0.8
        }
      }
    ]
  },
  "output": {
    "version": 1,
    "guardrails": [
      {
        "name": "Contains PII",
        "config": {
          "entities": ["EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD"],
          "block": true
        }
      },
      {
        "name": "Hallucination Detection",
        "config": {}
      },
      {
        "name": "NSFW Text",
        "config": {
          "confidence_threshold": 0.7
        }
      }
    ]
  }
}

  1. Export your OpenAI config

    Save your existing OpenAI guardrail configuration to a JSON file.

  2. Verify guardrail mapping

    Check that all your guardrails have mappings (see tables above).

  3. Load with pydantic-ai-guardrails

    from pydantic_ai_guardrails import create_guarded_agent_from_config

    # `agent` is your existing pydantic_ai Agent instance
    guarded_agent = create_guarded_agent_from_config(
        agent, 'openai_guardrails.json'
    )

  4. Test behavior

    Run your existing test cases to verify equivalent behavior.

  5. Optionally migrate to native format

    For more control, convert to the native format over time.
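
Step 2 can be checked mechanically before loading anything. The sketch below compares the guardrail names in a config against the names listed in the mapping tables on this page and reports any that have no mapping. `find_unmapped` and the `MAPPED_NAMES` set are written for this illustration only, and "My Custom Check" is a made-up name used to show a miss.

```python
# Guardrail names that appear in the mapping tables on this page.
MAPPED_NAMES = {
    "Contains PII",
    "Moderation",
    "Prompt Injection Detection",
    "Jailbreak",
    "Length Limit",
    "Hallucination Detection",
    "NSFW Text",
    "Secret Detection",
}


def find_unmapped(config: dict) -> dict[str, list[str]]:
    """Return, per stage, the guardrail names with no listed mapping (hypothetical helper)."""
    return {
        stage: [
            entry["name"]
            for entry in config.get(stage, {}).get("guardrails", [])
            if entry["name"] not in MAPPED_NAMES
        ]
        for stage in ("input", "output")
    }


config = {
    "version": 1,
    "input": {"version": 1, "guardrails": [{"name": "Moderation", "config": {}}]},
    "output": {
        "version": 1,
        "guardrails": [
            {"name": "Contains PII", "config": {"entities": ["EMAIL_ADDRESS"]}},
            {"name": "My Custom Check", "config": {}},  # made-up name, not in the tables
        ],
    },
}
unmapped = find_unmapped(config)
print(unmapped)
```

An empty list for both stages means every guardrail in the file appears in the tables; anything reported needs a manual replacement before migrating.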

To convert an OpenAI-format entry to the native format, rename the guardrail and translate its config keys:

OpenAI:

{
  "name": "Contains PII",
  "config": {
    "entities": ["EMAIL_ADDRESS", "PHONE_NUMBER"],
    "block": true
  }
}

Native:

{
  "name": "pii_detector",
  "config": {
    "detect_types": ["email", "phone"],
    "action": "block"
  }
}
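
The conversion above can be scripted for configs with many entries. This is a minimal sketch for the "Contains PII" guardrail only, built from the entity table and the OpenAI/native pair shown on this page. `pii_entry_to_native` is a hypothetical helper, and since only `"block": true` is documented here, any other value leaves `action` unset rather than guessing a native equivalent.

```python
# Entity names from the entity mapping table on this page.
ENTITY_MAP = {
    "EMAIL_ADDRESS": "email",
    "PHONE_NUMBER": "phone",
    "US_SSN": "ssn",
    "CREDIT_CARD": "credit_card",
    "IP_ADDRESS": "ip_address",
}


def pii_entry_to_native(entry: dict) -> dict:
    """Convert an OpenAI-format "Contains PII" entry to the native shape (hypothetical helper)."""
    if entry["name"] != "Contains PII":
        raise ValueError(f"unsupported guardrail: {entry['name']!r}")
    config = entry.get("config", {})
    # An entity missing from ENTITY_MAP raises KeyError, so gaps surface early.
    native_config = {
        "detect_types": [ENTITY_MAP[entity] for entity in config.get("entities", [])]
    }
    # Assumption: only "block": true -> "action": "block" is documented.
    if config.get("block") is True:
        native_config["action"] = "block"
    return {"name": "pii_detector", "config": native_config}


openai_entry = {
    "name": "Contains PII",
    "config": {"entities": ["EMAIL_ADDRESS", "PHONE_NUMBER"], "block": True},
}
native_entry = pii_entry_to_native(openai_entry)
print(native_entry)
```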

Some OpenAI features don’t have direct mappings:

| OpenAI Feature | Status | Alternative |
| --- | --- | --- |
| Custom regex in PII | Partial | Use blocked_keywords |
| Per-category moderation | Partial | Single toxicity threshold |
| Real-time moderation API | No | Local models only |