Computer-use agents are leaving the demo stage visual summary

The next agent surface is the computer itself. OpenAI is adding computer environments to the Responses API and Agents SDK. Google DeepMind is moving Project Mariner toward the Gemini API. The direction is clear: models are being trained and wrapped to operate tools, files, browsers, and command lines.

That can help a business. It can also create expensive mistakes if the work loop is loose.

What changed

Source	Capability	Practical meaning
OpenAI Responses API	shell tool plus hosted container workspace	Agents can inspect files, run commands, and produce artifacts in an isolated workspace
OpenAI Agents SDK	sandbox execution and file/tool harness	Developers can give agents controlled environments instead of raw machines
Google Project Mariner	browser agents on virtual machines	Agents can research, plan, enter data, and repeat browser workflows

Good first use cases

Computer-use agents are best when the screen work is repetitive and the risk is bounded.

collect public information from a set of pages
compare data across vendor portals
fill a draft form without submitting it
reconcile browser-visible records against a spreadsheet
prepare a report from files in a controlled workspace
test a website workflow and return screenshots

They are a bad first step for payroll, banking, public posting, or anything that submits irreversible changes.

The minimum production wrapper

Control	Why it exists
isolated workspace	the agent should not see the whole machine
allowlisted domains	browser work should stay inside known surfaces
action log	every click, command, and file write needs a receipt
budget cap	long loops can spend money fast
stop condition	the agent needs a clear definition of done
human approval	sensitive actions need explicit consent
replay artifact	screenshots or files prove what happened

What this means for small businesses

The first wave of value will not be "let the agent run the company." It will be small browser tasks that a person hates doing and can easily review.

That is enough. A weekly two-hour admin loop becomes a 10-minute review. A lead researcher turns scattered pages into a source-linked brief. A website QA pass returns screenshots and exact repro steps.

The agent should save attention, not hide the work.

Computer-use agents are leaving the demo stage

What changed

Good first use cases

The minimum production wrapper

What this means for small businesses

Source notes

More notes

Why we built our own agent harness

GPT-5.5 operator brief

MCP needs an operating model