Agentic coding has crossed a line. The best models can now carry out larger implementation tasks, reason across repo state, and recover from tool feedback. OpenAI says GPT-5.5 is stronger in Codex on implementation, refactors, debugging, testing, and validation. Anthropic says Opus 4.7 is stronger on advanced software engineering and long-running tasks.
That changes the human role. The work shifts toward scoping, supervising, reviewing, and deciding.
The new engineering loop
| Old loop | Agentic loop |
|---|---|
| write the implementation | define the goal and constraints |
| remember repo context | load the right files and docs |
| run tests manually | require the agent to run checks |
| review after the fact | review the plan, diff, and evidence |
| trust memory | trust logs and artifacts |
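The right-hand column of the table can be sketched as a review gate: the human accepts a change only when the agent has left a plan, a diff, and test evidence. This is a minimal illustration, not a real API; all names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AgentRun:
    """Hypothetical record of one agent work cycle."""
    plan: str          # what the agent said it would do
    diff: str          # the change it produced
    test_log: str      # output of the checks it ran
    checks_passed: bool

def review(run: AgentRun) -> bool:
    """Accept only when the agent left evidence, not just a diff."""
    has_evidence = bool(run.plan and run.diff and run.test_log)
    return has_evidence and run.checks_passed

# A run with a plan, a diff, and passing checks is accepted.
good = AgentRun(plan="add retry to client", diff="+retry(3)",
                test_log="12 passed", checks_passed=True)

# A diff with no test log is rejected, even if it "looks fine".
bare = AgentRun(plan="", diff="+retry(3)", test_log="", checks_passed=True)

assert review(good) is True
assert review(bare) is False
```

The point of the gate is that "trust logs and artifacts" becomes a mechanical check rather than a judgment call made from memory.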
What a good task packet includes
- current branch
- files in scope
- files out of scope
- expected behavior
- test command
- acceptance criteria
- rollback path
- style constraints
- deadline or budget
If the task packet is vague, the agent will spend its intelligence guessing.
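One way to keep packets honest is to check them before handing them to the agent. The sketch below mirrors the list above; the field names and schema are illustrative assumptions, not a real format.

```python
# Fields the agent would otherwise have to guess (hypothetical schema).
REQUIRED = [
    "branch",
    "files_in_scope",
    "files_out_of_scope",
    "expected_behavior",
    "test_command",
    "acceptance_criteria",
    "rollback_path",
]

def missing_fields(packet: dict) -> list[str]:
    """Return every required field that is absent or empty."""
    return [k for k in REQUIRED if not packet.get(k)]

packet = {
    "branch": "feature/retry",
    "files_in_scope": ["client.py"],
    "test_command": "pytest tests/ -q",
}

# This packet is vague: four fields are missing, so the agent
# would spend its intelligence guessing them.
print(missing_fields(packet))
```

A check like this belongs at the start of the loop: refuse to dispatch the task until the list comes back empty.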
What to measure
| Metric | Why it matters |
|---|---|
| accepted diffs | output quality |
| test pass rate | validation discipline |
| review findings | missed behavior or regressions |
| rework time | supervision cost |
| token cost | budget reality |
| scope violations | harness quality |
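The metrics in the table fall out of a per-run log. A minimal sketch, assuming each run is recorded with the fields below (the record format is an assumption, not a real schema):

```python
# Hypothetical log of agent runs; field names are illustrative.
runs = [
    {"accepted": True,  "tests_passed": True,  "rework_min": 5,  "tokens": 12000, "scope_ok": True},
    {"accepted": False, "tests_passed": False, "rework_min": 40, "tokens": 30000, "scope_ok": False},
    {"accepted": True,  "tests_passed": True,  "rework_min": 0,  "tokens": 8000,  "scope_ok": True},
]

n = len(runs)
metrics = {
    "accepted_diff_rate": sum(r["accepted"] for r in runs) / n,       # output quality
    "test_pass_rate":     sum(r["tests_passed"] for r in runs) / n,   # validation discipline
    "avg_rework_min":     sum(r["rework_min"] for r in runs) / n,     # supervision cost
    "total_tokens":       sum(r["tokens"] for r in runs),             # budget reality
    "scope_violations":   sum(not r["scope_ok"] for r in runs),       # harness quality
}
```

Even a log this crude makes the supervision cost visible: a run that fails review shows up in rework minutes and token spend, not just in a vague sense that the agent "needed help".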
Why this belongs on the Om blog
A lot of future client work will be built with agents in the loop. That does not mean handing the repo to a model and hoping. It means building a process where the model can do real work inside rails and leave evidence.
That is the same pattern we want in client operations: clear task, bounded tools, visible work, accepted artifact.