Pricing
Pay for what you save.
Start with Shadow Mode. It logs every request and shows what you would save before any cached response is served. No credit card.
Choose a tier
Free
$0
10K requests/mo
- Shadow Mode shows potential savings
- Identical-match cache
- Cost analytics dashboard
- Request tracing and logging
Popular
Pro
$49/mo
50K included, then $0.50/1K
- Full multi-layer caching
- Advanced pattern matching
- Advanced analytics + projections
- Up to 500K requests/mo
Enterprise
15%
of documented savings
- $500/mo minimum commitment
- Unlimited requests
- We win when you save
- AWS/GCP marketplace billing
Billed monthly on the dollars cached, with a full audit log on every invoice. If the cache doesn't deliver, you pay only the $500 floor.
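As a rough illustration of the Enterprise arithmetic, the sketch below reads the terms above as "15% of documented savings, but never less than the $500 floor". That reading is an assumption, as are the `CacheHit` record shape and the function names, which are hypothetical and not part of any SemanticGuard API. Savings are estimated as avoided tokens times the published per-million-token price of the model that would have served the request.

```python
from dataclasses import dataclass

ENTERPRISE_RATE = 0.15       # 15% of documented savings
MONTHLY_FLOOR_USD = 500.0    # $500/mo minimum commitment


@dataclass
class CacheHit:
    """One request served from cache (hypothetical record shape)."""
    tokens_avoided: int            # tokens the upstream model would have billed
    price_per_million_usd: float   # published per-million-token price of that model


def documented_savings(hits):
    """Sum the upstream cost avoided across all cache hits, in USD."""
    return sum(h.tokens_avoided / 1_000_000 * h.price_per_million_usd for h in hits)


def enterprise_fee(savings_usd):
    """Assumed rule: 15% of savings, with the $500 floor as a minimum."""
    return max(MONTHLY_FLOOR_USD, ENTERPRISE_RATE * savings_usd)


if __name__ == "__main__":
    hits = [CacheHit(2_000_000, 5.0), CacheHit(1_000_000, 10.0)]
    savings = documented_savings(hits)
    print(f"savings=${savings:.2f} fee=${enterprise_fee(savings):.2f}")
```

Under this reading, the floor applies until documented savings pass roughly $3,333 in a month, since 15% of that amount is $500.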
Talk to sales
How we compare to native provider caching
OpenAI and Anthropic both ship caching primitives. They're free and well-built within their own ecosystems. The trade-offs below are about what each cache covers, not whether one is cheaper than the other.
| | SemanticGuard | OpenAI prompt cache | Anthropic prompt caching |
|---|---|---|---|
| Pricing model | Free tier + $49/mo Pro + 15% of savings Enterprise | Free, applied automatically on exact-prefix cache hits | 90% discount on cached input tokens (5-min TTL) |
| What gets cached | Same wording, different wording, and multi-turn context, caught by multi-layer verification | Exact prompt prefix only | Explicitly-marked cache blocks only |
| Cross-provider | Yes. Cache a GPT-4o response, serve it to a Claude-Opus paraphrase | No (OpenAI only) | No (Anthropic only) |
| Cache correctness audit | 100% on our public benchmark, AI-judged | No public correctness number | No public correctness number |
| Observability | Dashboard + Prometheus + per-request tracing | Token counters in API response | Token counters in API response |
| Per-tenant kill switch + audit log | Yes | N/A (provider-side) | N/A (provider-side) |
Pricing FAQ
- What counts as a request on the free tier?
- Every proxy call to /api/proxy/*, regardless of cache hit or miss. Shadow-mode requests count too. That's why the free tier is sized for 10K/month rather than 1K.
- What happens at 50K requests on Pro?
- Overage kicks in at $0.50 per additional 1K requests. Hard cap at 500K/month; above that we route you to Enterprise. You get a usage alert at 80% of any threshold.
- How is 'documented savings' calculated for Enterprise?
- It is the sum of the upstream costs avoided through cache hits: for each hit, the tokens it saved multiplied by the published per-million-token price of the model that would otherwise have served the request. The 15% fee is billed monthly, and the full audit log and raw numbers are exported with each invoice.
- Can I self-host?
- Yes, with a license key (Pro or above). The Terraform module deploys the gateway into your own Vercel, AWS, or GCP account, so your data stays in your VPC. Contact enterprise@semanticguard.dev.
- Annual discount?
- 10% off Pro paid annually. Enterprise contracts are custom.
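The Pro overage rules in the FAQ above can be sketched as a small calculation. This is a minimal sketch, not the billing implementation: it assumes overage is charged per started block of 1K requests (the page does not say whether partial blocks are prorated), and the function names are hypothetical.

```python
import math

PRO_BASE_USD = 49.0             # $49/mo
PRO_INCLUDED = 50_000           # requests included in the base price
PRO_OVERAGE_PER_1K_USD = 0.50   # per additional 1K requests
PRO_HARD_CAP = 500_000          # above this, the plan routes to Enterprise


def pro_monthly_cost(requests):
    """Monthly Pro cost in USD for a given request count."""
    if requests < 0:
        raise ValueError("requests must be non-negative")
    if requests > PRO_HARD_CAP:
        raise ValueError("above the 500K hard cap; Enterprise applies")
    extra = max(0, requests - PRO_INCLUDED)
    # Assumption: any started block of 1K requests is billed in full.
    blocks = math.ceil(extra / 1000)
    return PRO_BASE_USD + blocks * PRO_OVERAGE_PER_1K_USD


def usage_alerts(requests):
    """Thresholds whose 80% alert mark has been reached."""
    return [t for t in (PRO_INCLUDED, PRO_HARD_CAP) if requests >= 0.8 * t]


if __name__ == "__main__":
    for n in (40_000, 120_000, 500_000):
        print(n, pro_monthly_cost(n), usage_alerts(n))
```

With these assumptions, 120K requests in a month cost $49 plus 70 overage blocks at $0.50, or $84, and a month at the 500K cap costs $274.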
Not sure which tier fits?
Run Shadow Mode for a week. The dashboard shows exactly what you'd save by enabling each tier. No credit card, no commitment.
Start free