Claude cheat sheet.
Model IDs, thinking budgets, caching, tools, batch — the working reference for building on Claude.
This is the sheet I keep open when wiring Claude into a production stack. Model IDs are current for the 4.x family (Opus 4.7, Sonnet 4.6, Haiku 4.5). Everything here compiles against the live Anthropic Messages API.
Model IDs (4.x family)
4 entriesCurrent Claude model IDs for the Anthropic API. Default to Sonnet unless the task truly needs Opus or Haiku speed.
claude-opus-4-7OpusFrontier reasoning — hard refactors, architecture, math-heavy proofs.
claude-sonnet-4-6SonnetBalanced default — production coding, long docs, tool use.
claude-haiku-4-5-20251001HaikuFast + cheap — classifiers, extractors, high-QPS routers.
claude-opus-4-7[1m]1M ctxOpus with the 1M context window — whole-repo reasoning.
Extended thinking
4 entriesGive Claude private thinking budget before it answers. Great for planning, tricky bugs, and architecture calls.
thinking: { type: "enabled", budget_tokens: 8000 }Enable extended thinking on the messages API.
budget_tokens: 32000Deep think — architecture, math, long chain-of-thought.
budget_tokens: 4000Light think — enough to plan a small refactor.
stream: trueStream thinking + response together; thinking blocks arrive first.
Prompt caching
4 entries5-minute TTL. Cache expensive system prompts, tool schemas, and large context blocks; save 90% on repeat calls.
cache_control: { type: "ephemeral" }Mark the end of a cacheable prefix block.
Up to 4 breakpointsYou can cache the system prompt, tools, and up to two message-level prefixes.
1024-token minimumCache reads require ≥1024 cached tokens (2048 for Opus).
cache_creation_input_tokensFirst call: paid at 1.25×. Subsequent hits at 0.1×.
Tool use
4 entriesFunction calling. Claude picks a tool, the client runs it, you feed the result back.
tools: [...]Array of tool definitions with JSON Schema input_schema.
tool_choice: { type: "auto" }Let Claude decide. Also: any, tool, none.
tool_use / tool_resultTwo content-block types that pair per call.
disable_parallel_tool_useForce serial tool calls when order matters.
Computer use
4 entriesThe screen-control primitive. Claude sees a screenshot and emits mouse/keyboard actions.
anthropic-beta: computer-use-2025-*Beta header required on the messages API.
type: "computer_20250124"Latest computer-use tool schema.
screenshot / click / type / keyAction types Claude emits back for you to execute.
Sandbox itSafetyNever point computer use at your real desktop — VM / container only.
Batch API
4 entriesAsync batch endpoint — 50% off list price, up to 24h SLA. Perfect for evals + backfills.
POST /v1/messages/batchesSubmit up to 10,000 requests per batch, 256MB total.
50% discountHalf the per-token cost of the interactive API.
24h SLAMost batches complete in minutes, but plan for a day.
GET .../resultsJSONL results, one line per custom_id.
SDKs
4 entriespip install anthropicPython SDK — sync + async clients, streaming, tools, files.
npm i @anthropic-ai/sdkTypeScript SDK — the same API surface, first-class types.
@anthropic-ai/claude-agent-sdkBuild custom agents on managed infrastructure (Managed Agents).
ANTHROPIC_API_KEYEnv var both SDKs read by default.
Cost ceilings
4 entriesDon't get surprised. Set spend guards at the org, API key, and app layers.
Console → Usage → LimitsSet org-level monthly spend cap.
Per-key limitsBound each API key by RPM / TPM / daily spend.
max_tokens ceilingHard-cap output tokens per call in code.
OpenRouter fallbackFleetRoute around 5xx with a paid model first — see AIOS gotchas.
Living document. If the API changes and this drifts, that's a bug — ping Mat and it gets a same-day patch.