Control LLM spend
before it hits your bill

An in-VPC proxy for teams spending $10K+/mo on OpenAI, Anthropic, and Google. It attributes every API call to the right feature, calculates true cost including cached and reasoning tokens, and enforces spend controls before the invoice arrives.

Docker container. One environment variable. No SDK. No code changes.

Talk to a Founder → · See how it works

<2ms P99 latency overhead · Fails open: your traffic is never blocked · Full streaming support · Zero telemetry · No SDK required

Backed by

🇨🇦 Gov. of Canada: CanadaBuys Supplier
☁️ Azure Marketplace: Listed
🎓 YEDI Incubator: York University
💡 i.d.e.a. Fund: Gov. of Ontario

How it works

An in-VPC proxy between your code and LLM providers

SpendProxy intercepts every API call, calculates true cost from the provider response, attributes it to a feature, and stores everything locally. Your data never leaves your infrastructure.

Your App (AI SDK calls; change one URL) → Your VPC: SpendProxy (Docker + SQLite) → Providers (OpenAI · Anthropic · Google)

<2ms

P99 latency overhead

Measured across all providers

Fail-open

If SpendProxy is down

Traffic routes directly to providers

SSE

Full streaming support

All 3 providers, accurate token counting

Zero external dependencies

No cloud, no telemetry, no phone-home

SpendProxy is not

× An observability platform. We don't do traces or evals
× A hosted SaaS gateway. Your data never leaves your VPC
× A multi-provider routing layer. Use LiteLLM for that

SpendProxy is

✓ A cost-control proxy that enforces spend policy before the bill
✓ An accuracy layer that calculates true cost including cached and reasoning tokens
✓ An attribution engine that maps every dollar to a feature, automatically

The problem

Your LLM cost dashboard is lying to you

Every AI cost tool gets the numbers wrong. Cached tokens, reasoning tokens, and provider-specific billing create 2-10x discrepancies.

Tool	Bug	Impact
LiteLLM	Cached tokens charged at full rate	10.9x overcharge
LiteLLM	Bedrock costs drastically undercounted	29x undercount
Langfuse	Cache pricing ignored	2.83x overcharge
Cloudflare AI Gateway	Spend completely missed	$15 vs $1,560

These aren't edge cases. Cached tokens are 50-90% of production traffic. If your tool gets caching wrong, every number on your dashboard is wrong.
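To see why caching dominates the error, here is a minimal arithmetic sketch. It assumes a tool that bills every input token at the full rate, and a hypothetical `cached_rate_multiplier` (the cached price as a fraction of the full input price). Both numbers in the usage lines are illustrative, not any provider's actual pricing, and the sketch covers input tokens only.

```python
def overcharge_ratio(cached_fraction: float, cached_rate_multiplier: float) -> float:
    """Naive input cost (all tokens at full rate) divided by true input cost.

    cached_fraction: share of input tokens served from cache, 0.0 to 1.0.
    cached_rate_multiplier: cached price as a fraction of the full input
    price (hypothetical; check your provider's price list).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    if cached_rate_multiplier < 0.0:
        raise ValueError("cached_rate_multiplier must be non-negative")
    # True cost per input token, in units of the full input price.
    true_cost = (1.0 - cached_fraction) + cached_fraction * cached_rate_multiplier
    if true_cost == 0.0:
        raise ValueError("true cost is zero; the ratio is undefined")
    # The naive cost per token is exactly 1.0 in the same units.
    return 1.0 / true_cost


# 90% cached traffic, cached tokens priced at 10% of the full rate.
print(round(overcharge_ratio(0.9, 0.1), 2))  # 5.26
# No cached traffic: the naive number happens to be right.
print(overcharge_ratio(0.0, 0.1))  # 1.0
```

The more of your traffic is cached, the further a full-rate calculation drifts from the invoice.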

✓ 111 tests verify every calculation

Cached token semantics, reasoning token billing, streaming token counting, and provider-specific edge cases. Every model, every provider, every billing scenario.

Accurate cost tracking

LLM costs you can trust

Every provider handles cached tokens differently. SpendProxy gets them all right.

OpenAI

prompt_tokens includes cached tokens. SpendProxy subtracts them from the full-rate input count and bills them at the cached rate. Most tools don't.

Anthropic

input_tokens excludes cached tokens. They're billed separately at a different rate. This is where LiteLLM charges 10.9x too much.

Google

promptTokenCount includes cached tokens. SpendProxy subtracts them. Reasoning tokens (Gemini 2.5) are separated and billed at the correct rate.
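The three conventions above can be sketched as a normalization step. This is an illustration under stated assumptions, not SpendProxy's actual code: the `Usage` type, the function names, and the `price` dictionary are hypothetical, prices are placeholders per million tokens, and the usage field names follow each provider's public response format as we understand it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    uncached_input: int   # input tokens billed at the full rate
    cached_input: int     # input tokens read from cache
    cache_write: int = 0  # tokens written to cache (Anthropic)
    output: int = 0       # visible output tokens
    reasoning: int = 0    # reasoning tokens reported separately (Gemini)


def from_openai(u: dict) -> Usage:
    # prompt_tokens INCLUDES cached tokens, so subtract them.
    cached = (u.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
    return Usage(u["prompt_tokens"] - cached, cached,
                 output=u.get("completion_tokens", 0))


def from_anthropic(u: dict) -> Usage:
    # input_tokens EXCLUDES cached tokens; cache reads and writes are separate.
    return Usage(u["input_tokens"],
                 u.get("cache_read_input_tokens", 0),
                 cache_write=u.get("cache_creation_input_tokens", 0),
                 output=u.get("output_tokens", 0))


def from_google(u: dict) -> Usage:
    # promptTokenCount INCLUDES cached tokens, so subtract them.
    cached = u.get("cachedContentTokenCount", 0)
    return Usage(u["promptTokenCount"] - cached, cached,
                 output=u.get("candidatesTokenCount", 0),
                 reasoning=u.get("thoughtsTokenCount", 0))


def cost_usd(u: Usage, price: dict) -> float:
    """price holds hypothetical USD rates per million tokens."""
    total = (u.uncached_input * price["input"]
             + u.cached_input * price["cached_input"]
             + u.cache_write * price["cache_write"]
             + u.output * price["output"]
             + u.reasoning * price["reasoning"])
    return total / 1_000_000


# Placeholder rates for illustration only.
PRICE = {"input": 1.0, "cached_input": 0.1, "cache_write": 1.25,
         "output": 4.0, "reasoning": 4.0}

openai_usage = {"prompt_tokens": 1000, "completion_tokens": 100,
                "prompt_tokens_details": {"cached_tokens": 800}}
anthropic_usage = {"input_tokens": 200, "cache_read_input_tokens": 800,
                   "output_tokens": 100}

# Different raw numbers, same normalized usage, same cost.
print(from_openai(openai_usage) == from_anthropic(anthropic_usage))
print(cost_usd(from_openai(openai_usage), PRICE))
```

The point of normalizing first is that one pricing function can then serve every provider, and the include-versus-exclude difference is handled in exactly one place per provider.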

Autopilot optimization

5 engines that reduce your bill automatically

Each engine can run in off, monitor, or autopilot mode. Start observing, then flip the switch when you're ready.

Cache Injection

Detects repeated prompts and enables provider-level caching automatically. Cuts costs on repetitive workloads without code changes.

Response Deduplication

Identifies identical in-flight requests and serves a single response to all callers. Eliminates waste from frontend retries and parallel calls.
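A minimal sketch of in-flight deduplication, assuming non-streaming requests and an asyncio-based proxy. `InFlightDeduper` and the `send` callback are hypothetical names for illustration; how SpendProxy implements this internally is not described here.

```python
import asyncio
import hashlib
import json
from typing import Awaitable, Callable, Dict


class InFlightDeduper:
    """Share one upstream call among callers sending an identical body."""

    def __init__(self) -> None:
        self._pending: Dict[str, asyncio.Task] = {}

    @staticmethod
    def key(body: dict) -> str:
        # Canonical JSON so key order does not change the fingerprint.
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    async def call(self, body: dict,
                   send: Callable[[dict], Awaitable[dict]]) -> dict:
        k = self.key(body)
        task = self._pending.get(k)
        if task is None:
            # First caller: start the upstream request and register it.
            task = asyncio.ensure_future(send(body))
            self._pending[k] = task
            # Forget the request as soon as it completes or fails.
            task.add_done_callback(lambda _t: self._pending.pop(k, None))
        # shield() keeps one caller's cancellation from cancelling the
        # shared upstream request for everyone else.
        return await asyncio.shield(task)
```

Only requests that overlap in time are merged; once the response arrives, the next identical request goes upstream again, so this is deduplication rather than response caching.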

Model Routing

Downgrades to cheaper models when the task doesn't need the expensive one. Simple lookups go to mini models, complex reasoning stays on full models.

Budget Guardrails

Set spend limits per feature, team, or project. Block, warn, or auto-downgrade when limits are hit. No surprise bills.
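A guardrail check could look something like the sketch below. The `Action` and `Budget` types, the scope keys, and the dollar figures are hypothetical and chosen for illustration; the three on-limit behaviors mirror the block, warn, and auto-downgrade options described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"
    DOWNGRADE = "downgrade"
    BLOCK = "block"


@dataclass(frozen=True)
class Budget:
    limit_usd: float   # spend limit for the period
    on_limit: Action   # what to do once the limit is hit


def decide(scope: str, spent_usd: Dict[str, float],
           budgets: Dict[str, Budget]) -> Action:
    """Return the action for a request in the given scope.

    scope is a feature, team, or project key. Scopes without a
    configured budget are always allowed.
    """
    budget = budgets.get(scope)
    if budget is None:
        return Action.ALLOW
    if spent_usd.get(scope, 0.0) >= budget.limit_usd:
        return budget.on_limit
    return Action.ALLOW


budgets = {
    "feature:search": Budget(limit_usd=3000.0, on_limit=Action.DOWNGRADE),
    "team:growth": Budget(limit_usd=500.0, on_limit=Action.BLOCK),
}
spent = {"feature:search": 3100.0, "team:growth": 120.0}

print(decide("feature:search", spent, budgets))  # Action.DOWNGRADE
print(decide("team:growth", spent, budgets))     # Action.ALLOW
```

The decision has to run before the request is forwarded, which is why a proxy can enforce it and a dashboard that reads the invoice afterward cannot.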

Retry Storm Suppression

Detects retry cascades during provider outages and throttles them. Prevents a 2-minute outage from becoming a $5K bill.
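One way to detect a retry cascade is a sliding window over recent upstream failures. The sketch below is an assumption about the general technique, not a description of SpendProxy's detector; `RetryStormGuard`, its thresholds, and the injectable `clock` are hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class RetryStormGuard:
    """Throttle when too many upstream failures land in a short window."""

    def __init__(self, max_failures: int = 5, window_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_failures = max_failures
        self._window_s = window_s
        self._clock = clock
        self._failures: Deque[float] = deque()

    def record_failure(self) -> None:
        # Call this when the provider returns an error or times out.
        self._failures.append(self._clock())

    def should_throttle(self) -> bool:
        # Drop failures that have aged out of the window.
        cutoff = self._clock() - self._window_s
        while self._failures and self._failures[0] < cutoff:
            self._failures.popleft()
        return len(self._failures) >= self._max_failures
```

In practice you would keep one guard per provider or per feature, so an outage at one provider does not throttle healthy traffic to another.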

Automatic attribution

Know which features cost what, without tagging a single request

Every other tool requires manual tags, SDK decorators, or metadata headers. SpendProxy figures it out automatically.

How it works

01 System prompt fingerprinting: same prompt = same feature, always
02 Toolset fingerprinting: same function tools = same agent
03 Workload classification: chat, agent, embeddings, image gen, audio
04 SDK detection: openai-python, langchain, vercel-ai, crewai, and more
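Steps 01 and 02 can be sketched as content hashes. The hash function, the whitespace normalization, the label lengths, and the OpenAI-style tool shape (`{"function": {"name": ...}}`) are all assumptions made for illustration; the label layout follows the dashboard examples on this page.

```python
import hashlib
import json
from typing import List, Optional


def prompt_fingerprint(system_prompt: str, length: int = 8) -> str:
    # Collapse whitespace so trivial formatting changes keep the same ID
    # (an assumption; a stricter scheme would hash the raw bytes).
    normalized = " ".join(system_prompt.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:length]


def toolset_fingerprint(tools: List[dict], length: int = 4) -> str:
    # Sort tool names so declaration order does not matter.
    names = sorted(tool["function"]["name"] for tool in tools)
    return hashlib.sha256(json.dumps(names).encode("utf-8")).hexdigest()[:length]


def feature_label(workload: str, system_prompt: str,
                  tools: Optional[List[dict]] = None) -> str:
    if tools:
        return (f"{workload}:p-{prompt_fingerprint(system_prompt, 4)}"
                f":t-{toolset_fingerprint(tools)}")
    return f"{workload}:p-{prompt_fingerprint(system_prompt)}"


support = feature_label("chat", "You are a helpful support assistant.")
same = feature_label("chat", "You are a helpful   support assistant.\n")
print(support == same)  # True: same prompt, same feature
```

Because the label is derived from the request itself, two services that send the same system prompt land in the same bucket without anyone agreeing on a tag name first.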

What you see in the dashboard

chat:p-3f8a21b4 $12,400/mo

agent:p-c72e:t-8f4a $8,200/mo

embeddings:search $3,100/mo

chat:p-9d2b77f0 $1,840/mo

Zero code changes. Zero tags. Zero SDK.

Security

You're installing a Docker container in your VPC. Here's why that's safe.

We know what we're asking. Here's what SpendProxy does and doesn't do — verifiable, not just promised.

What SpendProxy does

✓ Forwards API requests to providers with your existing keys
✓ Reads token counts from provider responses to calculate cost
✓ Stores metadata (token counts, model, latency) in local SQLite
✓ Makes one outbound call every 12 hours for license validation

What SpendProxy does NOT do

× Store, log, or read your API keys
× Store prompt or response content
× Send any data to us (beyond the license check)
× Open any inbound ports or accept external connections

Non-root

Container runs as unprivileged user

No inbound

Only outbound to AI providers + license API

Auditable

Monitor with tcpdump or network policies

Fail-open

If SpendProxy is down, traffic goes direct

Setup

Deploy in 5 minutes. Save money by day one.

One Docker container. One environment variable. That's the entire integration.

Step 1

Start SpendProxy

Pull the Docker image and run it. One command.

Step 2

Point your AI SDK at it

Change one URL. Your API keys pass through untouched.

Step 3

Watch costs drop

Open the dashboard. Cost data appears in real time.

# 1. Start SpendProxy
docker run -d -p 4100:4100 spendproxy/proxy:latest

# 2. Point your existing code at SpendProxy
export OPENAI_BASE_URL=http://localhost:4100/v1

# That's it. No SDK. No code changes.

Pricing

$2,500 pilot. ROI in the first week.

Two-week hands-on pilot with VPC deployment, cost audit, and all 5 optimization engines. Then $1,500/mo ongoing.

See full pricing · Talk to a Founder

The pilot

What happens when you start

Week 1

Deploy & audit

We deploy SpendProxy in your VPC, point your traffic through it, and run a full cost audit. You see accurate numbers for the first time, broken down by model, feature, and team.

Week 2

Optimize & measure

We activate optimization engines in monitor mode, identify the biggest savings opportunities, and start cutting waste. You get a report: what you were paying, what you should be paying, and the gap.

After the pilot

Your decision

You keep SpendProxy at $1,500/mo, or you walk away with the audit data and recommendations. No lock-in, no annual contracts.

Talk to a Founder →

Or email hi@spendproxy.com. We reply within 4 hours.

Who's behind this

Built by an engineer, not a sales team

SpendProxy is built by Asaf Zamir, founder of CloudExpat. After managing cloud cost optimization engagements across AWS, Azure, and GCP, he noticed every AI observability tool gets billing math wrong in ways that cost real money. SpendProxy exists because no one else bothered to get the math right.

When you book a call, you talk to Asaf directly. Not a sales rep. Not a demo bot.

5 pilot slots remaining this quarter

30 minutes. We'll show you exactly what your AI features cost.

