When an agent misbehaves, the first useful question is not which model failed. The useful question is which layer failed.

I use three layers: model, harness, and environment. The model reasons. The harness gives it a loop, tools, memory, validation, and stop conditions. The environment is the world the harness can touch: files, APIs, browsers, calendars, CRMs, credentials, and people.

Agent debugging slows down when those three layers get blamed as one thing.

Three layers, three jobs

Layer	Job	Typical failure
Model	Understand the task and choose the next move	Weak reasoning, bad judgment, missed nuance
Harness	Run the loop and enforce rules	Bad tool schema, missing validation, no checkpoint
Environment	Provide the real-world surface	Broken API, bad data, missing permission, stale page

Think of the model as the brain, the harness as the body, and the environment as the world it lives in. That framing is simple, but it prevents expensive guessing.

If the agent calls a tool with the wrong argument, that is probably a harness problem. If the tool works in one client workspace and fails in another, check the environment. If the same clean task fails across several well-built harnesses, then you may have a model fit problem.
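Those three rules of thumb can be written as a first-pass triage function. This is a minimal sketch, not a diagnostic product. The `Observation` fields and the `triage` function are hypothetical names, and reading "several harnesses" as at least three (`min_harnesses=3`) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical facts collected about one failing run."""
    bad_tool_argument: bool = False       # the model called a tool with a wrong argument
    fails_in_one_workspace: bool = False  # works in one client workspace, fails in another
    harnesses_failed: int = 0             # well-built harnesses where the same clean task failed


def triage(obs: Observation, min_harnesses: int = 3) -> str:
    """Return the layer to inspect first, checking the cheaper layers before the model."""
    if obs.bad_tool_argument:
        return "harness"      # tool schema, prompt wrapper, or missing validation
    if obs.fails_in_one_workspace:
        return "environment"  # auth scope, data shape, or integration differs per client
    if obs.harnesses_failed >= min_harnesses:  # assumed threshold for "several"
        return "model"        # a clean task failing everywhere suggests a model fit problem
    return "unknown"          # not enough evidence yet, so reproduce the failure and look again


print(triage(Observation(bad_tool_argument=True)))  # harness
print(triage(Observation(harnesses_failed=3)))      # model
```

The order of the checks is the point: the model is only named after the harness and environment have had their turn.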

Artifact

Agent failure diagnostic

Symptom	Likely layer	First check	Better fix
Same prompt behaves differently across tools	Harness	Prompt wrapper, tool schema, memory policy	Standardize the run contract
Same workflow fails for one client only	Environment	API fields, auth scope, data shape	Patch the integration or data source
Agent repeats work after interruption	Harness	Loop state and checkpoints	Add state, stop rules, and resume rules
Correct answer creates the wrong business action	Environment	Workflow policy and approval gates	Add a human gate or business rule
Clean task still needs stronger judgment	Model	Compare the same case across routes	Route to a stronger model only for that step
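The "repeats work after interruption" row is the easiest harness fix to show. A minimal sketch that covers the state and resume rules, assuming every step has a stable name and the harness can write a small JSON checkpoint file. `run_steps`, `make_step`, the step names, and the checkpoint format are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_steps(steps: dict[str, Callable[[], str]], checkpoint: Path) -> dict[str, str]:
    """Run named steps in order, skipping any step the checkpoint already marks as done."""
    done: dict[str, str] = {}
    if checkpoint.exists():
        done = json.loads(checkpoint.read_text())  # resume rule: reload completed steps
    for name, step in steps.items():
        if name in done:
            continue  # finished before the interruption, so do not repeat it
        done[name] = step()
        checkpoint.write_text(json.dumps(done))  # save state after every completed step
    return done


# Usage: simulate a run that was interrupted after its first step.
calls: list[str] = []


def make_step(name: str, output: str) -> Callable[[], str]:
    def step() -> str:
        calls.append(name)  # record which steps actually ran
        return output
    return step


steps = {
    "fetch_record": make_step("fetch_record", "record loaded"),
    "draft_reply": make_step("draft_reply", "draft saved"),
}
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "run-checkpoint.json"
    checkpoint.write_text(json.dumps({"fetch_record": "record loaded"}))
    result = run_steps(steps, checkpoint)

print(calls)  # ['draft_reply']
```

Only the unfinished step runs again. A real harness would also need a stop rule, such as a step limit or a failure count, which this sketch leaves out.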

Test the cheapest layer first

The cheapest fix is usually outside the model. Before paying for a stronger route, run a small diagnostic.

Reproduce the failure with the same input.
Check whether the harness gave the model enough context.
Inspect the tool call, arguments, and result.
Confirm the environment returned the data the harness expected.
Only then compare models.

That order matters for small businesses because model upgrades can hide the real issue. A stronger model may recover from a messy tool result once or twice. It will not turn a bad workflow into a reliable system.
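The diagnostic order can be run as a checklist that stops at the first layer that fails, so the model comparison only happens once the cheaper checks pass. A minimal sketch, assuming the harness logs each run as a dictionary. The check functions and the trace keys (`reproduced`, `required_context`, `context`, `tool_error`, `tool_result`) are hypothetical placeholders for whatever your logs record.

```python
from typing import Callable, Optional

# Each check reads the trace of a failing run and returns a finding, or None if it passes.
Check = Callable[[dict], Optional[str]]


def reproduces(trace: dict) -> Optional[str]:
    return None if trace.get("reproduced") else "cannot reproduce with the same input"


def enough_context(trace: dict) -> Optional[str]:
    context = trace.get("context", {})
    missing = [f for f in trace.get("required_context", []) if f not in context]
    return f"harness: context missing {missing}" if missing else None


def tool_call_ok(trace: dict) -> Optional[str]:
    error = trace.get("tool_error")
    return f"harness: tool call failed: {error}" if error else None


def environment_ok(trace: dict) -> Optional[str]:
    blank = [k for k, v in trace.get("tool_result", {}).items() if v in (None, "")]
    return f"environment: blank fields {blank}" if blank else None


def diagnose(trace: dict, checks: list[Check]) -> str:
    for check in checks:
        finding = check(trace)
        if finding:
            return finding  # stop at the first failing layer and fix it first
    return "model: cheaper layers pass, compare model routes on this case"


ORDER = [reproduces, enough_context, tool_call_ok, environment_ok]
```

Keeping the order in one list makes it hard to skip straight to the model comparison.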

Artifact

A cheap failure trace

Observation	Layer checked	Fix
The agent drafted a reply with the wrong service	Harness	Add required service field validation before drafting
The form record had a blank location	Environment	Patch the form and CRM field mapping
The stronger model made the same mistake	Model	Keep the cheaper route and fix the workflow inputs

That kind of trace saves money. The fix turned out to be field validation in the harness and a corrected field mapping in the environment. A premium model route would have cost more and fixed nothing.
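The harness half of that fix is small. A minimal sketch of required-field validation that runs before any drafting call, assuming the record arrives as a dictionary. The field names `service` and `location` come from the trace. `REQUIRED_FIELDS`, `MissingFields`, `validate_record`, and `draft_reply` are hypothetical, and the model call is replaced by a placeholder string.

```python
REQUIRED_FIELDS = ("service", "location")  # assumed: a reply cannot be drafted without these


class MissingFields(ValueError):
    """Raised when a record is not ready to hand to the model."""


def validate_record(record: dict) -> dict:
    # Treat missing keys, None, and whitespace-only values as blank.
    missing = [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]
    if missing:
        raise MissingFields(f"record missing required fields: {', '.join(missing)}")
    return record


def draft_reply(record: dict) -> str:
    validate_record(record)  # fail before spending a model call on bad inputs
    # A real harness would call the model here; this placeholder only shows the gate.
    return f"Draft for {record['service']} in {record['location']}"
```

A blank location now stops the run with a named error instead of producing a confident draft.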

What businesses usually buy wrong

Most teams buy the brain first. They ask which model is smartest, then hope the rest of the system behaves. That is backwards for operational work.

A service business does not need a model that sounds impressive in a demo. It needs a run loop with clean inputs, scoped tools, visible state, approval points, and a receipt trail. The model matters, but it is only one layer.

This is why Om Concepts talks about agents as systems. The useful work is not the model alone. It is the model inside a harness, touching a controlled environment, with enough evidence for a person to review what happened.
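Scoped tools, approval points, visible state, and a receipt trail fit in a small loop. A minimal sketch, assuming the model's proposed actions arrive as plain dictionaries. `run`, `Receipt`, `RunState`, `ALLOWED_TOOLS`, `NEEDS_APPROVAL`, and the tool names are hypothetical, and `approve` stands in for a real human review step.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Receipt:
    """One entry of evidence per proposed action, for a person to review later."""
    tool: str
    args: dict
    status: str


@dataclass
class RunState:
    receipts: list[Receipt] = field(default_factory=list)  # visible state of the run


ALLOWED_TOOLS = {"read_crm", "draft_email", "send_email"}  # scoped tools
NEEDS_APPROVAL = {"send_email"}                            # approval points


def run(actions: list[dict], tools: dict[str, Callable[..., str]],
        approve: Callable[[dict], bool], max_steps: int = 10) -> RunState:
    state = RunState()
    for action in actions[:max_steps]:  # stop rule: never run more than max_steps actions
        name, args = action["tool"], action.get("args", {})
        if name not in ALLOWED_TOOLS or name not in tools:
            state.receipts.append(Receipt(name, args, "blocked: tool not in scope"))
            continue
        if name in NEEDS_APPROVAL and not approve(action):
            state.receipts.append(Receipt(name, args, "held: awaiting approval"))
            continue
        tools[name](**args)
        state.receipts.append(Receipt(name, args, "done"))
    return state
```

Every proposed action leaves a receipt, including the ones that were blocked or held, so the review covers what the agent tried as well as what it did.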

How Om uses this frame

On a client project, this frame changes the first meeting. We do not start with, "Which model should we use?" We start with the workflow.

What input starts the work?
What output counts as accepted?
Which tools or records does the agent need?
Which actions require approval?
What receipt should the run leave behind?

Those questions decide the harness and environment before the model route. Once the system is shaped, model selection becomes a routing decision instead of a belief system.
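The five questions can be written down as a run contract before any model is picked, with the model route as the last thing filled in. A minimal sketch; the `RunContract` fields, the example intake workflow, and the route names `"standard"` and `"strong"` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RunContract:
    trigger: str                 # what input starts the work
    accepted_when: list[str]     # what output counts as accepted
    tools: list[str]             # which tools or records the agent needs
    approval_required: set[str]  # which actions need a person to approve
    receipt_fields: list[str]    # what the run must leave behind
    # Model routing is decided last, per step, with a cheaper default route.
    routes: dict[str, str] = field(default_factory=dict)
    default_route: str = "standard"

    def route_for(self, step: str) -> str:
        return self.routes.get(step, self.default_route)


intake = RunContract(
    trigger="new web form submission",
    accepted_when=["reply names the requested service", "reply names the location"],
    tools=["read_form", "read_crm", "draft_email"],
    approval_required={"send_email"},
    receipt_fields=["form_id", "draft_text", "approver"],
    routes={"classify_urgency": "strong"},  # only the step that needed stronger judgment
)
```

Every step without an explicit route falls back to the cheaper default, which keeps a stronger model confined to the steps that proved they need it.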
