GPT-5.5 is more capable than GPT-5.4, but the release also makes one thing obvious: model routing is now part of the product. OpenAI lists higher standard pricing for GPT-5.5 than for GPT-5.4, and describes a Fast mode in Codex that trades faster generation for higher plan usage.
That is normal frontier-model economics. The mistake is treating the newest model as the default for every step.
## The routing rule
Use the expensive model where failure is expensive.
| Workflow step | Good default | Why |
|---|---|---|
| Intake classification | small or mid model | The task is repetitive and easy to validate |
| Document extraction | mid model plus schema checks | Accuracy matters, but the output shape constrains the work |
| Ambiguous planning | frontier model | The model has to choose the path |
| Tool-heavy execution | frontier model with budget | Bad tool calls create real cost |
| Final review | frontier or separate reviewer model | Independent checks catch drift |
| Receipt writing | small model or deterministic template | The facts should already be logged |
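The defaults above can be sketched as a static routing map. The step names and model-class labels here are placeholders for illustration, not real API identifiers:

```python
# Hypothetical routing map: workflow step -> default model class.
# All names are placeholders; swap in your provider's actual model IDs.
ROUTES = {
    "intake_classification": "small",
    "document_extraction": "mid",        # pair with schema checks downstream
    "ambiguous_planning": "frontier",
    "tool_heavy_execution": "frontier",  # run with an explicit budget
    "final_review": "reviewer",          # frontier or a separate reviewer model
    "receipt_writing": "small",          # or a deterministic template
}

def pick_model(step: str) -> str:
    """Return the default model class for a workflow step; fail loudly on unknown steps."""
    if step not in ROUTES:
        raise KeyError(f"no route for step: {step}")
    return ROUTES[step]
```

Failing loudly on unknown steps matters: a silent fallback to the frontier model is exactly the cost leak this table is meant to prevent.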
## What token efficiency really means
OpenAI says GPT-5.5 can deliver better results with fewer tokens than GPT-5.4 for most Codex users. That can be true and still cost more on the wrong workload.
The question is not only "how many tokens did it use?" The better questions are:
- how many retries disappeared
- how much human correction disappeared
- how many tool calls were avoided
- how often the first artifact passed review
- how often the model stopped early
If GPT-5.5 turns two failed attempts into one successful run, the higher token price may be cheap. If it writes nicer summaries for a task a smaller model already handles, it is waste.
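That retry math is worth making explicit. A minimal sketch of cost per accepted artifact, with made-up prices and token counts (real numbers vary by provider and workload):

```python
def cost_per_success(price_per_mtok: float, tokens_per_attempt: int,
                     attempts_per_success: float) -> float:
    """Effective cost of one accepted artifact, counting retries.

    Illustrative only: uses a single blended per-million-token price
    rather than separate input/output rates.
    """
    return price_per_mtok * (tokens_per_attempt / 1_000_000) * attempts_per_success

# A pricier model that succeeds on the first try can beat a cheap model that retries.
cheap = cost_per_success(price_per_mtok=2.0, tokens_per_attempt=60_000,
                         attempts_per_success=4.0)   # 0.48 per success
frontier = cost_per_success(price_per_mtok=10.0, tokens_per_attempt=40_000,
                            attempts_per_success=1.0)  # 0.40 per success
```

The comparison flips depending on `attempts_per_success`, which is why retries belong in the receipt, not just token counts.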
## A sane small-business route
Start with three lanes.
| Lane | Model class | Use it for |
|---|---|---|
| Cheap lane | fast small model | tagging, spam filtering, simple extraction |
| Work lane | capable general model | drafting, summarizing, structured responses |
| Hard lane | GPT-5.5 class | long context, tool use, multi-step ambiguity |
The harness decides when to escalate. The client should never need to know the model name unless the model choice affects cost, latency, or trust.
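A harness-side escalation rule can be as simple as a few coarse signals. The thresholds and task fields below are hypothetical, a sketch of the three-lane idea rather than a tuned policy:

```python
def choose_lane(task: dict) -> str:
    """Pick a lane from coarse task signals. Fields and thresholds are illustrative."""
    # Hard lane: ambiguity, tool use, or long context.
    if (task.get("ambiguous")
            or task.get("needs_tools")
            or task.get("context_tokens", 0) > 100_000):
        return "hard"
    # Cheap lane: repetitive, easy-to-validate work.
    if task.get("kind") in {"tagging", "spam_filter", "simple_extraction"}:
        return "cheap"
    # Everything else: the general work lane.
    return "work"
```

The client sees a lane, not a model name, which keeps the routing decision inside the harness where it can change without breaking anything downstream.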
## What to measure
For every run, log:
- model and effort setting
- input and output token count
- tool calls
- retries
- validation failures
- human edits
- final artifact status
That receipt is how you keep the model upgrade honest.
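A minimal receipt record, assuming a JSON-lines log; the field names mirror the checklist above but are otherwise placeholders:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class RunReceipt:
    """One logged run. Field names are illustrative, matching the checklist."""
    model: str
    effort: str
    input_tokens: int
    output_tokens: int
    tool_calls: int
    retries: int
    validation_failures: int
    human_edits: int
    final_status: str  # e.g. "accepted", "rejected", "escalated"

def log_receipt(receipt: RunReceipt) -> str:
    """Serialize one receipt as a JSON line for an append-only log."""
    return json.dumps(asdict(receipt))
```

One line per run is enough to answer the retry and human-edit questions from the previous section when the next model upgrade arrives.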