Documentation
Integration guides and API reference.
Quick Start
SemanticGuard is an OpenAI-compatible proxy. Point your client at SemanticGuard instead of the provider, add your SG API key, and all requests are cached, logged, and tracked.
```bash
curl https://semanticguard.dev/api/proxy/v1/chat/completions \
  -H "Authorization: Bearer your-openai-api-key" \
  -H "x-sg-api-key: sg-your-key-here" \
  -H "x-sg-project: my-project" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}'
```
AI SDK Integration
Using the Vercel AI SDK? Add a fetch wrapper to any provider. Works with OpenAI, Anthropic, Vertex AI, and any provider that accepts a custom fetch function.
```ts
import { generateText } from "ai";
import { createOpenAI } from "@ai-sdk/openai";
import { withSemanticGuard } from "@semanticguard/ai-sdk";

const openai = createOpenAI({
  apiKey: "your-openai-key",
  fetch: withSemanticGuard({
    gatewayUrl: "https://semanticguard.dev",
    apiKey: "sg-your-key-here",
    projectId: "my-project", // optional
  }),
});

const result = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: "Hello",
});
```
Authentication
Every request needs two keys:
- Your LLM API key, passed to the upstream provider via `Authorization: Bearer` or `x-api-key`.
- Your SemanticGuard key, passed via the `x-sg-api-key` header. Generate one from the API Keys page in the dashboard.
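Concretely, the two keys travel side by side as request headers; a minimal sketch (all values below are placeholders):

```typescript
// Both keys are sent on every proxied request: "Authorization" carries the
// upstream LLM provider key, "x-sg-api-key" carries the SemanticGuard key.
// Values are placeholders.
const headers: Record<string, string> = {
  Authorization: "Bearer sk-your-openai-key", // upstream provider key
  "x-sg-api-key": "sg-your-key-here",         // SemanticGuard key
  "x-sg-project": "my-project",               // optional project tag
  "Content-Type": "application/json",
};
```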
Supported Providers
| Provider | Auth Header | Models |
|---|---|---|
| OpenAI | Authorization: Bearer sk-... | gpt-4o, gpt-4o-mini, gpt-4.1-*, o3, o4-mini |
| Anthropic | x-api-key: sk-ant-... | claude-sonnet-4, claude-opus-4, claude-haiku-4 |
| Google | Authorization: Bearer ... | gemini-2.5-flash, gemini-2.5-pro |
| Azure OpenAI | Authorization: Bearer <azure-key> | gpt-4o, gpt-4o-mini (via x-sg-provider: azure) |
| AWS Bedrock | x-sg-aws-access-key + x-sg-aws-secret-key | amazon.titan-*, meta.llama3-*, cohere.command-r-* |
Azure requires x-sg-provider: azure, x-sg-azure-resource, and x-sg-azure-deployment headers. Bedrock requires x-sg-aws-access-key and x-sg-aws-secret-key. Other providers (Mistral, etc.) work via the passthrough proxy.
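The per-provider header requirements above can be captured in a small helper. The function below is our own sketch, not part of any SemanticGuard SDK; it uses only the header names documented in this section:

```typescript
type Provider = "openai" | "anthropic" | "google" | "azure" | "bedrock";

// Illustrative helper (ours, not an official API): returns the SemanticGuard
// headers a request needs for a given provider, per the table above.
function buildSgHeaders(
  provider: Provider,
  opts: {
    sgKey: string;
    azureResource?: string;
    azureDeployment?: string;
    awsAccessKey?: string;
    awsSecretKey?: string;
  }
): Record<string, string> {
  const h: Record<string, string> = { "x-sg-api-key": opts.sgKey };
  if (provider === "azure") {
    // Azure needs explicit provider routing plus resource/deployment names.
    h["x-sg-provider"] = "azure";
    h["x-sg-azure-resource"] = opts.azureResource ?? "";
    h["x-sg-azure-deployment"] = opts.azureDeployment ?? "";
  } else if (provider === "bedrock") {
    // Bedrock authenticates with AWS credentials passed through SG headers.
    h["x-sg-aws-access-key"] = opts.awsAccessKey ?? "";
    h["x-sg-aws-secret-key"] = opts.awsSecretKey ?? "";
  }
  return h;
}
```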
Response Headers
| Header | Example | Description |
|---|---|---|
| x-sg-cache | hit-exact, hit-semantic, miss | Cache result, including the layer that matched |
| x-sg-latency | 12ms | Total proxy processing time |
| x-sg-provider | openai, anthropic, google, azure, bedrock | Detected upstream provider |
| x-sg-score | 0.97 | Similarity score (semantic hits only) |
| x-sg-confidence | 0.872 | Confidence score (0-1), factoring similarity, entry age, template completeness, and model recency |
| x-sg-prompt-category | factual, code, creative, extraction, instruction, general | Auto-classified prompt category; code and creative prompts use stricter matching thresholds |
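These headers can be read off any response on the client side. The interpreter below is a sketch of ours (the summary shape and function name are not part of any SDK), written against a plain header getter like the one `fetch` exposes:

```typescript
// Our own sketch: summarize a SemanticGuard response from its x-sg-* headers.
// Header names come from the table above; the SgSummary shape is ours.
interface SgSummary {
  cache: string;        // "hit-exact" | "hit-semantic" | "miss"
  provider?: string;
  score?: number;       // only present on semantic hits
  confidence?: number;
}

function readSgHeaders(get: (name: string) => string | null): SgSummary {
  const num = (name: string) => {
    const v = get(name);
    return v === null ? undefined : Number(v);
  };
  return {
    cache: get("x-sg-cache") ?? "miss",
    provider: get("x-sg-provider") ?? undefined,
    score: num("x-sg-score"),
    confidence: num("x-sg-confidence"),
  };
}
```

With a real `fetch` response this would be called as `readSgHeaders((n) => response.headers.get(n))`.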
Cache Pipeline
Requests pass through multiple cache layers in order. The first match wins.
- Exact match - Normalized, lowercased SHA-256 hash lookup in Redis. Fastest layer.
- Conversation match - For multi-turn chats: tries full history hash, then a sliding window (last 4 messages), then system prompt + last message. Enables partial-match hits for long conversations.
- Template match - Extracts entities (emails, names, prices, orgs, places) via regex + NER and looks up the skeleton hash. Catches prompts that differ only in entity values.
- Template substitution - If the skeleton matches a verified response template, replaces entity placeholders with new values. Subject to confidence scoring.
- Semantic match - Vector similarity search on the skeleton text. Threshold adapts by prompt category (stricter for code/creative). Subject to entity-hash guard and confidence scoring.
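The first layer's key derivation can be sketched as follows. The source specifies only a "normalized, lowercased SHA-256" lookup; trimming, whitespace collapsing, and keying by model are our illustrative assumptions:

```typescript
import { createHash } from "node:crypto";

// Sketch of the exact-match cache key. The docs state the key is a
// normalized, lowercased SHA-256 hash; the exact normalization rules
// (trim, collapse whitespace) and the model prefix are our assumptions.
function exactMatchKey(model: string, prompt: string): string {
  const normalized = prompt.trim().toLowerCase().replace(/\s+/g, " ");
  return createHash("sha256")
    .update(`${model}\n${normalized}`)
    .digest("hex");
}
```

Under these rules, prompts that differ only in casing or spacing collapse to the same Redis key.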
On miss, responses are stored across all layers. Entity extraction uses both regex patterns and compromise.js NER (people, organizations, places). Per-tenant custom entities are learned automatically from sampled misses.
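Entity skeletonization can be illustrated with a regex-only sketch. The production pipeline pairs regexes with compromise.js NER for people, organizations, and places; the patterns and placeholder tokens below are purely ours:

```typescript
// Regex-only sketch of skeleton extraction for the template layers.
// Real extraction also runs compromise.js NER; these two patterns and
// the {{...}} placeholder names are illustrative assumptions.
function skeletonize(prompt: string): string {
  return prompt
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "{{EMAIL}}")
    .replace(/\$\d+(\.\d+)?/g, "{{PRICE}}");
}
```

Two prompts that differ only in entity values produce the same skeleton, so the template layers can match them against one stored entry.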
Safety mechanisms
- Entity-hash guard: semantic matches with different entities are rejected
- Confidence gate: rejects stale entries, incomplete substitutions, and cross-generation model mismatches
- Category-adaptive thresholds: code prompts require 0.97 similarity, creative prompts 1.0 (effectively disabled)
- Vector TTL: entries expire and are lazily evicted on lookup or via scheduled garbage collection
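The category-adaptive thresholds can be sketched as a lookup plus a gate. Only the code (0.97) and creative (1.0) values come from the list above; the remaining thresholds are placeholder assumptions:

```typescript
// Sketch of category-adaptive semantic-match gating. Only the code (0.97)
// and creative (1.0) thresholds are documented above; the factual and
// general values are placeholder assumptions of ours.
const THRESHOLDS: Record<string, number> = {
  code: 0.97,
  creative: 1.0, // effectively disables semantic reuse for creative prompts
  factual: 0.9,  // assumption
  general: 0.9,  // assumption
};

function acceptsSemanticMatch(category: string, similarity: number): boolean {
  const threshold = THRESHOLDS[category] ?? THRESHOLDS.general;
  return similarity >= threshold;
}
```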