Why we built our own agent harness

We still use vendor tools. Claude Code, Codex, Cursor, browser agents, and hosted sandboxes are useful. For internal coding loops, most teams should start there.

We built our own harness for a narrower reason: client-facing agents need a control layer we can explain, audit, and change.

The line we draw

Vendor tools are good when the operator is the product user. A developer asks for a refactor. An owner asks a browser agent to research a vendor. A strategist drafts a plan. The person using the tool can see the work and stop it.

Client-facing systems have a different bar. They touch business records, lead data, calendars, drafts, tools, and sometimes money. They need isolation, approval gates, budget controls, and receipts that match the client workflow.

That is where a custom harness earns its place.

Client work changed the requirements

The first version of the harness was not glamorous. It was a thin loop around a model call and a few tools. The useful parts came from what broke during real work.

Requirement	Why it mattered
Per-client sandbox	One client's tools and data should never bleed into another client's run
Tool registry	The agent only sees actions it is allowed to call
Secret boundary	Raw credentials do not belong in prompts or transcripts
Approval gates	Writes, sends, deletes, and purchases need explicit consent
Receipt writer	Every important run needs a record a person can inspect
Budget guard	Long loops should stop before they become expensive
Model route	Cheap steps and high-judgment steps should not use the same path

None of that is a model feature. It is harness work.
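As a rough illustration, three rows of that table (tool registry, approval gates, budget guard) might look like the sketch below. This is a minimal sketch, not our production code: every name in it (`ToolRegistry`, `BudgetGuard`, `ApprovalRequired`, the `crm.*` tool names) is hypothetical, and it assumes consent is granted per call rather than per session.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


class ToolNotAllowed(Exception):
    """The run asked for a tool that is not registered for this client."""


class ApprovalRequired(Exception):
    """A side-effecting tool was called without explicit consent."""


class BudgetExceeded(Exception):
    """The run tried to spend more than its configured limit."""


@dataclass
class Tool:
    name: str
    func: Callable[..., Any]
    needs_approval: bool = False  # set for writes, sends, deletes, purchases


@dataclass
class BudgetGuard:
    max_tokens: int
    used_tokens: int = 0

    def charge(self, tokens: int) -> None:
        # Refuse the charge before the loop becomes expensive.
        if self.used_tokens + tokens > self.max_tokens:
            raise BudgetExceeded(f"{self.used_tokens + tokens} > {self.max_tokens}")
        self.used_tokens += tokens


@dataclass
class ToolRegistry:
    client_id: str  # hypothetical: one registry per client sandbox
    tools: Dict[str, Tool] = field(default_factory=dict)
    approved: Set[str] = field(default_factory=set)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def visible_tools(self) -> List[str]:
        # The agent is only ever shown this list.
        return sorted(self.tools)

    def approve(self, name: str) -> None:
        self.approved.add(name)

    def call(self, name: str, **kwargs: Any) -> Any:
        tool = self.tools.get(name)
        if tool is None:
            raise ToolNotAllowed(name)
        if tool.needs_approval:
            if name not in self.approved:
                raise ApprovalRequired(name)
            self.approved.discard(name)  # consent covers one call only
        return tool.func(**kwargs)


# Hypothetical tools for one client.
registry = ToolRegistry(client_id="client-a")
registry.register(Tool("crm.read_lead", lambda lead_id: {"id": lead_id}))
registry.register(
    Tool("crm.update_lead", lambda lead_id, owner: "updated", needs_approval=True)
)
```

Reads go straight through, while `crm.update_lead` raises `ApprovalRequired` until a reviewer calls `registry.approve("crm.update_lead")`. The consent is consumed by the next call, so a second write needs a second approval.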

Artifact

One run through the harness

Step	What happens	What the receipt keeps
Input	A lead form arrives with service, location, urgency, and source	submission ID, UTM, timestamp
Policy	The harness checks whether the request fits the allowed workflow	policy version and status
Tools	The agent reads allowed records and drafts the next action	tool count and result status
Review	A person approves, edits, or rejects the draft	reviewer state and final decision
Output	The lead record gets an owner, summary, and next step	artifact link and receipt ID

The model matters inside that loop, but the trust comes from the control layer around it.
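The five steps in the table can be sketched as one function that writes to a receipt at each stage. This is an illustrative sketch under assumptions: the `Receipt` class, the `run_lead` function, the allowed-service list, the policy version label, and the artifact link format are all made up for the example. The agent and the reviewer are passed in as plain callables.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Tuple

ALLOWED_SERVICES = {"inspection", "repair"}  # hypothetical workflow scope
POLICY_VERSION = "example-policy-v1"  # hypothetical version label


@dataclass
class Receipt:
    receipt_id: str
    steps: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def keep(self, step: str, **values: Any) -> None:
        self.steps[step] = values


def run_lead(
    lead: Dict[str, Any],
    draft_fn: Callable[[Dict[str, Any]], Tuple[str, int]],
    review_fn: Callable[[str], Tuple[str, str]],
    receipt_id: str,
) -> Receipt:
    receipt = Receipt(receipt_id)

    # Input: record where the lead came from.
    receipt.keep(
        "input",
        submission_id=lead["submission_id"],
        utm=lead.get("utm"),
        timestamp=datetime.now(timezone.utc).isoformat(),
    )

    # Policy: stop early if the request is outside the allowed workflow.
    allowed = lead["service"] in ALLOWED_SERVICES
    receipt.keep(
        "policy",
        version=POLICY_VERSION,
        status="allowed" if allowed else "refused",
    )
    if not allowed:
        return receipt

    # Tools: the agent drafts the next action and reports its tool usage.
    draft, tool_count = draft_fn(lead)
    receipt.keep("tools", tool_count=tool_count, result_status="ok")

    # Review: a person approves, edits, or rejects the draft.
    decision, final_text = review_fn(draft)
    receipt.keep("review", reviewer_state="complete", final_decision=decision)
    if decision == "rejected":
        return receipt

    # Output: nothing is written until the review step has passed.
    receipt.keep(
        "output",
        summary=final_text,
        artifact_link=f"leads/{lead['submission_id']}",  # hypothetical format
        receipt_id=receipt_id,
    )
    return receipt
```

A refused or rejected run still returns a receipt. It simply has fewer steps recorded, which is what a person inspecting the run later needs to see.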

What the harness owns

The harness does not replace the model. It surrounds the model with operating rules.

Harness component	What it owns	What the client sees
Policy layer	Scope, refusals, approval points	Clear boundaries
Tool registry	Allowed actions and schemas	Predictable behavior
Secret boundary	Credential access through managed systems	No raw keys in prompts
Sandbox	Client-specific workspace and data scope	Reduced cross-client risk
Receipt writer	Model, prompt hash, tokens, tools, latency	Audit trail
Reviewer gate	Human signoff where needed	Quality control

That structure lets us swap models without rebuilding the whole system. It also keeps the client conversation grounded. We can show what the agent is allowed to do, what it did, and where a human still signs off.
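One way to picture the model swap and the receipt writer together is the sketch below. It assumes the harness talks to every model through a small interface and records the fields from the table (model, prompt hash, tokens, tools, latency). The `Model` protocol, the `EchoModel` stand-in, and `run_with_receipt` are hypothetical names. A real adapter would wrap a vendor API and use its reported token counts instead of the word count used here.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import List, Protocol, Tuple


class Model(Protocol):
    """Hypothetical interface: anything with a name and a complete() method."""

    name: str

    def complete(self, prompt: str) -> Tuple[str, int]:
        ...


@dataclass
class EchoModel:
    """Stand-in model for the example; it makes no network calls."""

    name: str

    def complete(self, prompt: str) -> Tuple[str, int]:
        # Word count is a placeholder for a real token count.
        return prompt.upper(), len(prompt.split())


@dataclass
class RunReceipt:
    model: str
    prompt_hash: str
    tokens: int
    tools: List[str]
    latency_ms: float


def run_with_receipt(
    model: Model, prompt: str, tools: List[str]
) -> Tuple[str, RunReceipt]:
    start = time.perf_counter()
    text, tokens = model.complete(prompt)
    latency_ms = (time.perf_counter() - start) * 1000

    receipt = RunReceipt(
        model=model.name,
        # Store a hash so the receipt can identify the prompt without
        # copying its contents into the audit trail.
        prompt_hash=hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        tokens=tokens,
        tools=list(tools),
        latency_ms=latency_ms,
    )
    return text, receipt
```

Because `run_with_receipt` depends only on the interface, replacing `EchoModel("model-a")` with `EchoModel("model-b")` changes the `model` field on the receipt and nothing else in the calling code.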

What we did not build

We did not build a general-purpose AI platform. We did not build a replacement for coding agents. We did not build a system where the agent roams across every tool a client owns.

The harness exists for bounded workflows: intake, triage, drafting, source gathering, QA, reporting, and handoff. Those are narrow jobs with inspectable outputs.

That constraint is the point. The more serious the business surface, the more boring the agent should feel.

Where vendor tools still win

For repo work, we still reach for dedicated coding agents. For quick browser research, we still use browser-focused tools. For one-off internal exploration, a general chat or agent surface is often enough.

The custom harness is for the work we need to stand behind after the demo ends.
