
GPT-5.5 makes routing more important

A practical note on model cost, context, and when a smarter model should replace a cheaper one inside an agent workflow.

  • Tips & Guides
  • advanced
  • Apr 25, 2026
  • 6 min read
  • GPT-5.5
  • Cost
  • Agent Routing

GPT-5.5 is more capable than GPT-5.4, but the release also makes one thing obvious: model routing is now part of the product. OpenAI lists higher standard pricing for GPT-5.5 than for GPT-5.4, and describes a Fast mode in Codex that generates faster at the cost of higher plan usage.

That is normal frontier-model economics. The mistake is treating the newest model as the default for every step.

The routing rule

Use the expensive model where failure is expensive.

| Workflow step | Good default | Why |
| --- | --- | --- |
| Intake classification | small or mid model | The task is repetitive and easy to validate |
| Document extraction | mid model plus schema checks | Accuracy matters, but the output shape constrains the work |
| Ambiguous planning | frontier model | The model has to choose the path |
| Tool-heavy execution | frontier model with budget | Bad tool calls create real cost |
| Final review | frontier or separate reviewer model | Independent checks catch drift |
| Receipt writing | small model or deterministic template | The facts should already be logged |
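The table above can be sketched as a plain lookup. This is a minimal illustration, not a real API: the tier names and step names are placeholders you would swap for your own model identifiers.

```python
# Sketch of the routing rule above, with three illustrative model tiers.
# Step names and tier names are placeholders, not real endpoints.

ROUTES = {
    "intake_classification": "small",
    "document_extraction": "mid",
    "ambiguous_planning": "frontier",
    "tool_heavy_execution": "frontier",
    "final_review": "frontier",
    "receipt_writing": "small",
}

def pick_model(step: str) -> str:
    """Return the default model tier for a workflow step.

    Unknown steps fall back to the frontier tier: when in doubt,
    assume failure is expensive.
    """
    return ROUTES.get(step, "frontier")
```

The fallback direction matters: an unrecognized step escalates rather than degrades, which matches the routing rule.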

What token efficiency really means

OpenAI says GPT-5.5 can deliver better results with fewer tokens than GPT-5.4 for most Codex users. That can be true and still cost more on the wrong workload.

The question is not only "how many tokens did it use?" The better questions are:

  • how many retries disappeared
  • how much human correction disappeared
  • how many tool calls were avoided
  • how often the first artifact passed review
  • how often the model stopped early

If GPT-5.5 turns two failed attempts into one successful run, the higher token price may be cheap. If it merely writes nicer summaries for a task a smaller model already handles, it is waste.
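The arithmetic behind that claim fits in a few lines. The prices and success rates below are invented for illustration; the point is that the unit to compare is cost per accepted artifact, with retries folded in, not cost per call.

```python
# Hypothetical numbers only: compare effective cost per *accepted*
# artifact, not per call. Prices and pass rates are made up.

def effective_cost(price_per_run: float, success_rate: float) -> float:
    """Expected spend until one run passes review (geometric retries)."""
    return price_per_run / success_rate

# Cheap model: low sticker price, but roughly three attempts on average.
cheap = effective_cost(price_per_run=0.02, success_rate=0.35)

# Frontier model: higher sticker price, nearly always passes first time.
frontier = effective_cost(price_per_run=0.05, success_rate=0.95)

print(f"cheap lane:    ${cheap:.4f} per accepted artifact")
print(f"frontier lane: ${frontier:.4f} per accepted artifact")
```

With these made-up numbers the pricier model comes out cheaper per accepted artifact, because the retries disappeared. Flip the pass rates and the conclusion flips too, which is exactly why the receipt data matters.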

A sane small-business route

Start with three lanes.

| Lane | Model class | Use it for |
| --- | --- | --- |
| Cheap lane | fast small model | tagging, spam filtering, simple extraction |
| Work lane | capable general model | drafting, summarizing, structured responses |
| Hard lane | GPT-5.5 class | long context, tool use, multi-step ambiguity |

The harness decides when to escalate. The client should never need to know the model name unless the model choice affects cost, latency, or trust.
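One way the harness can decide is to try lanes cheapest-first and escalate only on validation failure. This is a minimal sketch under stated assumptions: `call_model` and `validate` stand in for your own model client and acceptance check, and all names here are illustrative.

```python
# Minimal escalation harness sketch. call_model(lane, task) and
# validate(result) are assumed to exist in your stack; the names
# and the three-lane ladder are illustrative, not a real API.

LANES = ["cheap", "work", "hard"]  # fast small -> capable general -> GPT-5.5 class

def run_with_escalation(task, call_model, validate):
    """Try the cheapest lane first; escalate only when validation fails.

    Returns (lane_used, result). Raises if even the hard lane fails,
    which is the signal to route the task to a human.
    """
    for lane in LANES:
        result = call_model(lane, task)
        if validate(result):
            return lane, result
    raise RuntimeError("all lanes failed validation; flag for a human")
```

Because escalation is driven by a validation check rather than by model names, the client-facing behavior stays the same when you swap a lane's underlying model.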

What to measure

For every run, log:

  • model and effort setting
  • input and output token count
  • tool calls
  • retries
  • validation failures
  • human edits
  • final artifact status

That receipt is how you keep the model upgrade honest.
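The receipt fields above map directly onto a small structured record. This is one possible shape, not a standard: the field names mirror the checklist, and the storage backend (a JSONL file, a database row) is up to you.

```python
# One way to capture the run receipt as a structured log line.
# The schema is illustrative; only the checklist fields are standard here.

import json
from dataclasses import dataclass, asdict

@dataclass
class RunReceipt:
    model: str
    effort: str
    input_tokens: int
    output_tokens: int
    tool_calls: int
    retries: int
    validation_failures: int
    human_edits: int
    final_status: str  # e.g. "accepted", "rejected", "escalated"

receipt = RunReceipt(
    model="work-lane", effort="medium",
    input_tokens=1840, output_tokens=312,
    tool_calls=2, retries=0,
    validation_failures=0, human_edits=1,
    final_status="accepted",
)

print(json.dumps(asdict(receipt)))  # append one line per run to your log
```

One JSON line per run is enough to answer the token-efficiency questions above after a model upgrade, without changing the harness.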

Source notes