We still use vendor tools. Claude Code, Codex, Cursor, browser agents, and hosted sandboxes are useful. For internal coding loops, most teams should start there.
We built our own harness for a narrower reason: client-facing agents need a control layer we can explain, audit, and change.
The line we draw
Vendor tools are good when the operator is the product user. A developer asks for a refactor. An owner asks a browser agent to research a vendor. A strategist drafts a plan. The person using the tool can see the work and stop it.
Client-facing systems have a different bar. They touch business records, lead data, calendars, drafts, tools, and sometimes money. They need isolation, approval gates, budget controls, and receipts that match the client workflow.
That is where a custom harness earns its place.
Client work changed the requirements
The first version of the harness was not glamorous. It was a thin loop around a model call and a few tools. The useful parts came from what broke during real work.
| Requirement | Why it mattered |
|---|---|
| Per-client sandbox | One client's tools and data should never bleed into another client's run |
| Tool registry | The agent only sees actions it is allowed to call |
| Secret boundary | Raw credentials do not belong in prompts or transcripts |
| Approval gates | Writes, sends, deletes, and purchases need explicit consent |
| Receipt writer | Every important run needs a record a person can inspect |
| Budget guard | A run should halt at a set token or cost ceiling before a long loop becomes expensive |
| Model route | Cheap steps and high-judgment steps should not use the same path |
None of that is a model feature. It is harness work.
The model matters inside that loop, but the trust comes from the control layer around it.
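To make a few of those rows concrete, here is a minimal Python sketch of a tool registry, an approval gate, and a budget guard wrapped around tool calls. It is an illustration under stated assumptions, not our production code: the names `Harness`, `Tool`, `ApprovalRequired`, and `BudgetExceeded` are hypothetical, and the budget is counted in tokens only for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


class ApprovalRequired(Exception):
    """Raised when a gated action is attempted without explicit consent."""


class BudgetExceeded(Exception):
    """Raised when a charge would push the run past its token ceiling."""


@dataclass
class Tool:
    name: str
    func: Callable[..., object]
    needs_approval: bool = False  # set for writes, sends, deletes, purchases


@dataclass
class Harness:
    """Hypothetical control layer; one instance per client run."""
    client_id: str
    max_tokens: int
    tools: Dict[str, Tool] = field(default_factory=dict)
    approved: Set[str] = field(default_factory=set)
    tokens_used: int = 0

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def visible_tools(self) -> List[str]:
        # The agent is only ever shown the names registered for this client.
        return sorted(self.tools)

    def charge(self, tokens: int) -> None:
        # Budget guard: refuse the charge before the ceiling is crossed.
        if self.tokens_used + tokens > self.max_tokens:
            raise BudgetExceeded(f"{self.client_id}: budget of {self.max_tokens} tokens reached")
        self.tokens_used += tokens

    def call(self, name: str, **kwargs):
        if name not in self.tools:
            raise KeyError(f"tool not registered for {self.client_id}: {name}")
        tool = self.tools[name]
        # Approval gate: gated tools run only after a human has approved them.
        if tool.needs_approval and name not in self.approved:
            raise ApprovalRequired(name)
        return tool.func(**kwargs)


# Example wiring with placeholder tools.
demo = Harness(client_id="client-a", max_tokens=1000)
demo.register(Tool("lookup_lead", lambda lead_id: {"id": lead_id}))
demo.register(Tool("send_email", lambda to: f"sent to {to}", needs_approval=True))
```

In this sketch, `demo.call("send_email", to=...)` raises `ApprovalRequired` until `"send_email"` is added to `demo.approved`. The point of the design is that consent lives in the harness, not in the prompt.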
What the harness owns
The harness does not replace the model. It surrounds the model with operating rules.
| Harness component | What it owns | What the client sees |
|---|---|---|
| Policy layer | Scope, refusals, approval points | Clear boundaries |
| Tool registry | Allowed actions and schemas | Predictable behavior |
| Secret boundary | Credential access through managed systems | No raw keys in prompts |
| Sandbox | Client-specific workspace and data scope | Reduced cross-client risk |
| Receipt writer | Model, prompt hash, tokens, tools, latency | Audit trail |
| Reviewer gate | Human signoff where needed | Quality control |
That structure lets us swap models without rebuilding the whole system. It also keeps the client conversation grounded. We can show what the agent is allowed to do, what it did, and where a human still signs off.
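A receipt can be as small as one JSON line per run. The sketch below assumes the fields named in the table (model, prompt hash, tokens, tools, latency); the `Receipt` class, `make_receipt`, and `to_json_line` are hypothetical names, and storing a SHA-256 digest in place of the prompt text is one possible choice, not a description of our exact format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Receipt:
    """Hypothetical per-run record a person can inspect later."""
    client_id: str
    model: str
    prompt_sha256: str
    input_tokens: int
    output_tokens: int
    tools_called: List[str]
    latency_ms: int


def make_receipt(client_id: str, model: str, prompt: str,
                 input_tokens: int, output_tokens: int,
                 tools_called: List[str], latency_ms: int) -> Receipt:
    # Store a digest, not the prompt text, so the receipt can be shared
    # without exposing client data that appeared in the prompt.
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return Receipt(client_id, model, digest, input_tokens,
                   output_tokens, list(tools_called), latency_ms)


def to_json_line(receipt: Receipt) -> str:
    # One sorted-key JSON object per line is easy to append, diff, and grep.
    return json.dumps(asdict(receipt), sort_keys=True)
```

Because the model name is just a field on the receipt, swapping models changes one value in the record and leaves the audit format unchanged.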
What we did not build
We did not build a general-purpose AI platform. We did not build a replacement for coding agents. We did not build a system where the agent roams across every tool a client owns.
The harness exists for bounded workflows: intake, triage, drafting, source gathering, QA, reporting, and handoff. Those are narrow jobs with inspectable outputs.
That constraint is the point. The more serious the business surface, the more boring the agent should feel.
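One way to encode that boundary is an explicit allowlist that also decides the model route. In the sketch below the workflow names come from the list above, but the tier assigned to each workflow and the model names are illustrative assumptions, as are the identifiers `Tier`, `WORKFLOW_TIERS`, `MODEL_FOR_TIER`, and `route`.

```python
from enum import Enum


class Tier(Enum):
    CHEAP = "cheap"
    HIGH_JUDGMENT = "high_judgment"


# Workflow names are from the post; the tier for each one is a guess
# made for illustration, not a statement of how we actually route.
WORKFLOW_TIERS = {
    "intake": Tier.CHEAP,
    "triage": Tier.CHEAP,
    "drafting": Tier.HIGH_JUDGMENT,
    "source_gathering": Tier.CHEAP,
    "qa": Tier.HIGH_JUDGMENT,
    "reporting": Tier.CHEAP,
    "handoff": Tier.CHEAP,
}

# Placeholder model names; substitute whatever models are in use.
MODEL_FOR_TIER = {
    Tier.CHEAP: "small-model",
    Tier.HIGH_JUDGMENT: "large-model",
}


def route(workflow: str) -> str:
    """Return the model for a workflow, refusing anything off the allowlist."""
    if workflow not in WORKFLOW_TIERS:
        raise ValueError(f"workflow outside harness scope: {workflow}")
    return MODEL_FOR_TIER[WORKFLOW_TIERS[workflow]]
```

Anything not on the list fails loudly instead of falling through to a default model, which keeps the scope of the harness visible in one place.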
Where vendor tools still win
For repo work, we still reach for dedicated coding agents. For quick browser research, we still use browser-focused tools. For one-off internal exploration, a general chat or agent surface is often enough.
The custom harness is for the work we need to stand behind after the demo ends.



