Portkey vs SemanticGuard

Routing is one layer. Caching is another.

Portkey is a multi-provider routing and observability gateway. SemanticGuard is an intelligent cache with 100% measured correctness. Here is where each fits, and how to run them together.

| Dimension | Portkey | SemanticGuard | Better fit |
| --- | --- | --- | --- |
| Primary job | Multi-provider routing, fallbacks, observability | Intelligent caching with verified correctness | Both fit |
| Semantic cache (paraphrases hit) | Exact-match cache; semantic is not the core focus | Yes, with correctness verified on every served hit | SemanticGuard |
| Correctness measurement | N/A (not a caching product) | 100% measured on public benchmark, methodology disclosed at /benchmark | SemanticGuard |
| Provider failover and load balancing | Core strength: ordered fallbacks, retries, load balancing across providers | Not offered today (on the roadmap) | Portkey |
| Prompt library and versioning | Yes, first-class | Not offered | Portkey |
| Shadow mode (see savings before enabling) | N/A | Default. Install and watch "would have saved $X" for a week before flipping cache on | SemanticGuard |
| Self-host in your own cloud tenant | Available as a self-hosted deployment | One-click install deploys into your own Vercel account. Prompts and cache stay in your tenant | Both fit |
| Fail-open design | Yes | Yes. If the cache is down, requests pass straight to the provider | Both fit |
| Pricing model at scale | Per-request tiered pricing | $49/mo Pro, or 15% of documented savings on Enterprise ($500/mo minimum). Pays for itself when caching works | SemanticGuard |

Comparison written 2026-07-01 against publicly documented product scope. Send corrections to hello@semanticguard.dev.
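
The SemanticGuard pricing figures can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes "15% of documented savings ($500/mo minimum)" means the Enterprise fee is the larger of $500 and 15% of that month's savings, and the function names are hypothetical, not part of any SemanticGuard API.

```typescript
// Figures taken from the pricing description; the formula is an assumption.
const PRO_MONTHLY_USD = 49;
const ENTERPRISE_SHARE_PERCENT = 15;
const ENTERPRISE_MINIMUM_USD = 500;

// Assumed Enterprise fee: max(minimum, 15% of documented savings).
function enterpriseFee(documentedSavingsUsd: number): number {
  const share = (documentedSavingsUsd * ENTERPRISE_SHARE_PERCENT) / 100;
  return Math.max(ENTERPRISE_MINIMUM_USD, share);
}

// Net benefit: what caching saved minus what the plan cost.
function netBenefit(documentedSavingsUsd: number, feeUsd: number): number {
  return documentedSavingsUsd - feeUsd;
}

console.log(enterpriseFee(2000));  // 500 (the minimum applies)
console.log(enterpriseFee(10000)); // 1500 (the 15% share applies)
console.log(netBenefit(300, PRO_MONTHLY_USD));          // 251
console.log(netBenefit(10000, enterpriseFee(10000)));   // 8500
```

Under this reading, the Pro plan breaks even once monthly savings pass $49, and the Enterprise share only exceeds the minimum once savings pass roughly $3,334 per month.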

Pick Portkey if

  • Your main pain is provider outages. You need automatic failover across OpenAI, Anthropic, Google, and others in one shot.
  • You want prompt versioning, a prompt registry, and per-prompt A/B tests as first-class primitives.
  • You are load-balancing traffic across multiple provider accounts and keys for rate-limit reasons.
  • You want a unified observability view across all your LLM calls.

Pick SemanticGuard if

  • You have repeated queries (support bot, RAG, docs Q&A, agent tool calls) and duplicated cost is your top concern.
  • You need correctness guarantees on cached responses, not just "cache and hope".
  • You want to prove the savings before enabling anything (Shadow Mode).
  • You are on Vercel and want a one-click install that deploys the proxy into your own tenant.
  • You need prompts and cache to physically stay in your own cloud account for compliance or trust reasons.
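
The Shadow Mode idea (tally what a cache would have saved while every request still goes to the provider) can be sketched as a simple replay over logged calls. This is a rough illustration, not SemanticGuard's implementation: it uses exact matching on a normalized prompt as a stand-in for semantic matching, and every name in it is hypothetical.

```typescript
interface LoggedCall {
  prompt: string;
  costCents: number; // what the provider charged for this call
}

interface ShadowReport {
  totalCalls: number;
  wouldBeHits: number;
  wouldHaveSavedCents: number;
}

// Stand-in for semantic matching: trim, lowercase, collapse whitespace.
function normalize(prompt: string): string {
  return prompt.trim().toLowerCase().replace(/\s+/g, " ");
}

// Replays logged calls. Nothing is served from cache; repeats are only counted.
function shadowReport(calls: LoggedCall[]): ShadowReport {
  const seen = new Set<string>();
  let wouldBeHits = 0;
  let wouldHaveSavedCents = 0;
  for (const call of calls) {
    const key = normalize(call.prompt);
    if (seen.has(key)) {
      wouldBeHits += 1;
      wouldHaveSavedCents += call.costCents;
    } else {
      seen.add(key);
    }
  }
  return { totalCalls: calls.length, wouldBeHits, wouldHaveSavedCents };
}

const report = shadowReport([
  { prompt: "How do I reset my password?", costCents: 2 },
  { prompt: "how do I reset  my password?", costCents: 2 },
  { prompt: "What is your refund policy?", costCents: 3 },
]);
console.log(report); // one would-be hit, 2 cents would have been saved
```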

Or stack them

The two products live at different layers of your LLM app. Running them together is the norm, not the exception.

  • Put SemanticGuard in front for the cache layer with verified correctness.
  • Put Portkey in the flow for cross-provider routing, fallbacks, and prompt management.
  • Cache hits are served before the request reaches Portkey or the provider; cache misses still get Portkey's routing and observability.
  • Both integrate at the fetch layer, so they compose without locking you into either one.

Add SemanticGuard to any OpenAI-compatible client

    import { withSemanticGuard } from "@semanticguard/ai-sdk";
    import { createOpenAI } from "@ai-sdk/openai";

    // Route the SDK's HTTP calls through the SemanticGuard gateway.
    const openai = createOpenAI({
      apiKey: process.env.OPENAI_API_KEY!,
      fetch: withSemanticGuard({
        gatewayUrl: "https://semanticguard.dev",
        apiKey: process.env.SG_API_KEY!,
      }),
    });

    // Cached responses return in under 50ms with verified correctness.
    // On a cache miss, the request passes straight through to the provider.
    // If the cache itself is down, requests also pass through (fail-open by design).
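
To picture how fetch-layer products stack, and what fail-open means in practice, here is a minimal sketch. It assumes each layer is a function that wraps a fetch-like function and returns another one. `withCache`, `withFallback`, and `stack` are hypothetical stand-ins, not the real `withSemanticGuard` or Portkey APIs, and an exact-match store replaces semantic matching.

```typescript
type Res = { status: number; body: string };
type FetchLike = (url: string, body: string) => Promise<Res>;
type Layer = (next: FetchLike) => FetchLike;

interface CacheStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

// Cache layer: serve hits, forward misses, and fail open on cache errors.
const withCache = (store: CacheStore): Layer => (next) => async (url, body) => {
  try {
    const hit = await store.get(body);
    if (hit !== undefined) return { status: 200, body: hit };
  } catch {
    // Cache unavailable: fall through to the next layer (fail-open).
  }
  const res = await next(url, body);
  if (res.status === 200) {
    try {
      await store.set(body, res.body);
    } catch {
      // A failed cache write must not fail the request.
    }
  }
  return res;
};

// Routing layer: try each provider URL in order until one does not fail.
const withFallback = (providerUrls: string[]): Layer => (next) => async (_url, body) => {
  let last: Res = { status: 503, body: "no providers configured" };
  for (const providerUrl of providerUrls) {
    try {
      last = await next(providerUrl, body);
      if (last.status < 500) return last;
    } catch {
      last = { status: 502, body: `unreachable: ${providerUrl}` };
    }
  }
  return last;
};

// The first layer listed is outermost, so the cache sits in front of routing.
const stack = (transport: FetchLike, ...layers: Layer[]): FetchLike =>
  layers.reduceRight<FetchLike>((next, layer) => layer(next), transport);
```

With `stack(transport, withCache(store), withFallback([...]))`, a repeated request is answered by the cache layer and never reaches the routing layer or the transport, while a broken cache store simply lets the request continue down the stack.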

Try SemanticGuard on your real traffic

Free tier includes 10K requests/mo with Shadow Mode. See your potential savings before enabling caching. Nothing changes in your app until you flip it on.