Documentation

Everything you need to deploy and configure SpendProxy.

Quick start

1. Start SpendProxy

Using Docker:

 docker run -d -p 4100:4100 spendproxy/proxy:latest 

Or using npx:

 npx @cloudexpat/spendproxy 

2. Point your AI SDK at SpendProxy

Change the base URL in your client. Your API keys pass through untouched.

OpenAI
 import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:4100/v1',  // was: https://api.openai.com/v1
}); 
Anthropic
 import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  baseURL: 'http://localhost:4100/anthropic',  // was: https://api.anthropic.com
}); 
Google Gemini
 // Change the URL prefix:
// was:  https://generativelanguage.googleapis.com/v1beta/models/...
// now:  http://localhost:4100/google/v1beta/models/... 
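The rewrite above is a plain prefix swap, so a small helper can apply it anywhere a Gemini URL is built. This is an illustrative sketch assuming the default proxy port from the quick start; rewriteGeminiUrl is not part of SpendProxy:

```typescript
// Illustrative helper: route Gemini API URLs through the local proxy.
// The proxy address assumes the default port from the quick start.
const GEMINI_UPSTREAM = 'https://generativelanguage.googleapis.com';
const PROXY_BASE = 'http://localhost:4100/google';

function rewriteGeminiUrl(url: string): string {
  // Leave non-Gemini URLs untouched.
  if (!url.startsWith(GEMINI_UPSTREAM)) return url;
  return PROXY_BASE + url.slice(GEMINI_UPSTREAM.length);
}
```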

3. Open the dashboard

Navigate to localhost:4100/dashboard. Cost data appears in real time. The dashboard auto-refreshes every 10 seconds.

Configuration

SpendProxy is configured via environment variables.

Variable        Default                    Description
CE_PROXY_PORT   4100                       Port to listen on
CE_DATA_DIR     ~/.spendproxy              Data directory for SQLite database and config
CE_PROXY_DB     ~/.spendproxy/ce-proxy.db  SQLite database path

Example with custom port and data directory:

 docker run -d \
  -p 8080:8080 \
  -e CE_PROXY_PORT=8080 \
  -e CE_DATA_DIR=/data \
  -v /host/data:/data \
  spendproxy/proxy:latest 

Supported providers and models

SpendProxy uses fuzzy matching for model IDs. Versioned identifiers like gpt-4o-2024-08-06 automatically resolve to the base model.
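As an illustration of how a versioned ID might resolve, here is a minimal sketch; the date-suffix rule and the abbreviated model list are assumptions for this example, not SpendProxy's actual matching logic:

```typescript
// Sketch of versioned-model resolution, assuming version suffixes are
// date-stamped (e.g. gpt-4o-2024-08-06). Not SpendProxy's real algorithm.
const KNOWN_MODELS = ['gpt-4.1', 'gpt-4.1-mini', 'gpt-4o', 'gpt-4o-mini', 'o3', 'o3-mini'];

function resolveModel(id: string): string | undefined {
  // Exact match first, then strip a trailing -YYYY-MM-DD suffix and retry.
  if (KNOWN_MODELS.includes(id)) return id;
  const base = id.replace(/-\d{4}-\d{2}-\d{2}$/, '');
  return KNOWN_MODELS.includes(base) ? base : undefined;
}
```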

OpenAI

gpt-4.1, gpt-4.1-mini, gpt-4.1-nano
gpt-4o, gpt-4o-mini
o3, o3-mini, o4-mini
Endpoint: /v1/*

Anthropic

claude-opus-4, claude-sonnet-4
claude-haiku-4
claude-3.5-sonnet, claude-3.5-haiku
Endpoint: /anthropic/*

Google

gemini-2.5-pro
gemini-2.5-flash
gemini-2.0-flash
Endpoint: /google/*

Attribution headers

SpendProxy auto-attributes costs using system prompt fingerprinting, toolset hashing, and SDK detection. You can also use optional headers for explicit control — they take priority over auto-detection.
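To make "fingerprints, not content" concrete, here is a minimal sketch of what a system-prompt fingerprint could look like; SHA-256 and the digest truncation length are assumptions of this sketch, not SpendProxy internals:

```typescript
import { createHash } from 'node:crypto';

// Illustrative fingerprint: hash the system prompt so only a short digest is
// ever kept, never the prompt text itself. The hash choice and 16-character
// truncation are assumptions for this sketch.
function fingerprintPrompt(systemPrompt: string): string {
  return createHash('sha256').update(systemPrompt, 'utf8').digest('hex').slice(0, 16);
}
```

The same system prompt always produces the same digest, which is enough to group requests by feature without retaining any prompt content.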

Request headers (optional)

Header        Purpose
X-CE-Route    Feature or endpoint name
X-CE-Tag      Custom tag (team, environment)
X-CE-Project  Project name
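One way to attach these headers from application code is to build them once and merge them into each request; attributionHeaders below is a hypothetical helper, not part of SpendProxy or any SDK:

```typescript
// Hypothetical helper that builds the optional attribution headers.
// Header names come from the table above; only set keys are emitted,
// so omitted fields fall back to SpendProxy's auto-detection.
type Attribution = { route?: string; tag?: string; project?: string };

function attributionHeaders(meta: Attribution): Record<string, string> {
  const headers: Record<string, string> = {};
  if (meta.route) headers['X-CE-Route'] = meta.route;
  if (meta.tag) headers['X-CE-Tag'] = meta.tag;
  if (meta.project) headers['X-CE-Project'] = meta.project;
  return headers;
}
```

The resulting object can be spread into the headers of any request that goes through the proxy, for example alongside the usual Authorization and Content-Type headers.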

Response headers (always included)

Header              Value
X-CE-Request-Id     Unique request ID
X-CE-Cost           Total cost in USD
X-CE-Input-Tokens   Input token count
X-CE-Output-Tokens  Output token count
X-CE-Cached-Tokens  Cached input tokens
X-CE-Model          Resolved model name
X-CE-Latency-Ms     Proxy overhead latency

For streaming responses, cost data is sent as a final SSE comment line of the form : ce-cost {"cost": 0.0023, ...}
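A client can recover that trailer once the stream completes. The sketch below assumes the comment arrives as a standard SSE comment line (leading colon) and that the payload is JSON; field names beyond cost are not specified here:

```typescript
// Sketch: pull the final cost comment out of a completed SSE stream body.
// Assumes a standard SSE comment line (": ce-cost {...}"); fields other
// than "cost" are whatever SpendProxy includes in the payload.
function parseCostComment(sseBody: string): { cost: number } | undefined {
  for (const line of sseBody.split('\n')) {
    const m = line.match(/^:\s*ce-cost\s+(\{.*\})\s*$/);
    if (m) return JSON.parse(m[1]);
  }
  return undefined;
}
```

Because SSE comment lines are ignored by spec-compliant event parsers, clients that do not look for the trailer are unaffected by it.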

Security model

Data residency

SpendProxy runs entirely in your infrastructure. The SQLite database, request logs, cost data, and attribution data all stay in your VPC. Nothing is transmitted to SpendProxy, CloudExpat, or any third-party service.

API key handling

API keys are passed through to the provider in the original request headers. SpendProxy never logs or stores your API keys; they are forwarded to the provider unmodified.

Network access

SpendProxy makes outbound HTTPS requests only to the AI provider APIs (api.openai.com, api.anthropic.com, generativelanguage.googleapis.com). It does not phone home, send telemetry, check for updates, or communicate with any other external service.

What SpendProxy stores locally

  • Request metadata: model, tokens, cost, latency, timestamps
  • Attribution data: system prompt fingerprints (hashes, not content), toolset hashes, SDK identifiers
  • Optimization state: cache entries, dedup signatures, budget counters
  • Never stored: prompt content, response content, or API keys