Automation fails in the seams — a one-page canvas for AI agents

In a recent consulting engagement, my client, a large organization, had written a data strategy. One of its strategic goals was to first lift the quality of the most critical data elements (CDEs): a few thousand data fields, spread across a dozen operational systems, on which business outcomes heavily depend.

The strategy was sound and signed off at the top. The client had recently installed and piloted an AI-driven tool: a centralized data catalogue that automatically profiles metadata in the source systems and runs data quality measurements daily, flagging low-quality data immediately. This is a fairly standard feature set in modern data catalogues. So far, so good.

Delivery proved harder than the plan. The approach called for implementing the tool in waves, starting with the three highest-priority data sources. Here we ran into some challenges, all of them rather common:

A) Ownership & access: questions of data ownership and custodian authority slowed down getting access rights to the systems
B) Technical debt: poorly documented systems, with no remediation possible because of (C)
C) Knowledge loss: many system vendors and in-house development teams had since gone, taking precious knowledge with them
D) Stewardship gap: most importantly, data stewards who were uninvolved or reluctant

So the work started with the systems that were "more available" rather than the ones the strategy ranked highest — a realistic call, but already a step away from the intent. The data that did reach the catalogue and the quality engine came with its business rules only partly captured and without active stewardship behind it, so the automated checks — AI included — could not produce the results the strategy had set out to achieve. Nothing failed loudly. It failed in the gaps between the strategy, the process, and the controls. It failed in the seams.

I have spent a career watching process automation die in those seams — long before anyone deployed an AI agent, or purchased a subscription to an LLM.

An agent makes the same seams far more dangerous, because an agent doesn't just hold data — it acts. It reads systems, makes a call, and triggers the next step. A risk appetite you declared in a strategy deck, a rule you wrote in a policy, an approval you assumed someone owned — if any of those fails to carry across to what the system actually does, the agent acts anyway, at machine speed, straight through the gap. In my last piece I argued that agentic AI needs an operating model, not just a model. This is the how: the AI operating model on one page — a canvas that closes the seams for one agent, one workflow at a time.

The three layers — and the two seams

An operating model has three layers. The strategic layer decides whether you should do this at all, and who owns the call. The operational layer turns that into how the work runs — approvals, exceptions, the measures that tell you it is behaving. The controls layer is what the system actually enforces. Most teams pour their effort into the layer they can see — the controls — and assume the other two are handled.

But the failure is rarely inside a layer. It is in the two seams between them: where strategic intent hands off to process, and where process hands off to control. A risk appetite gets set and nothing downstream enforces it. A policy gets written and never becomes a behaviour. Close the seams and the automation holds; leave them open and it leaks — exactly as my data programme did, at both seams at once.

The canvas closes the seams

This canvas is the how-to companion to my Two Wings AI Framework (TWAF). TWAF is the what and the why; this canvas is part of the how, applied to one agent at a time.

You don't govern "AI" in the abstract — you give one agent one job in one workflow, and you fill nine cells that span all three layers on a single page. Fill them and the intent set at the top is forced to survive all the way down to what the system does. If you can't fill a cell honestly, you've found your gap — the empty cell is the finding.

Strategic — what you intend

1. The agent's job — the one workflow and the exact decision the agent owns.
2. Autonomy level — how much it may do alone (the ladder below); start at the lowest useful rung.
3. Accountable human — one named person who answers for the outcome. One owner, not a sign-off shared across a committee.

Operational — how it runs

4. Human checkpoints — where a person approves first, where they are only notified, where the agent is hands-off.
5. Escalation & fallback — what it does when it is unsure or out of bounds: who it goes to, the safe default when no one is available, and the kill switch.
6. Two metrics — one value metric (is it paying off?) and one control metric (is it staying in bounds?). One without the other will lie to you.

Controls — the foundation, set up before any agent runs

A. Guardrails as controls — every policy written as something the system enforces (an allowlist, a permission gate, a validation), held outside the agent's own reach.
B. Data & access — the governed inputs it relies on and the systems it may touch.
C. Drift monitoring — the system proactively watches for model and data drift; the signal feeds the staying-in-bounds control metric (6).

The two seams sit between those three groups. The lettered controls (A, B and C) are standing infrastructure: you put them in place once, before any agent runs. The numbered cells (1–6) are what you design for each new agent. The canvas is where all three layers meet on one workflow, so the seams have nowhere to hide.
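To make "the empty cell is the finding" concrete, here is a minimal sketch in Python of the canvas as a data structure with a check for blank cells. The class name `AgentCanvas`, the field names, and the helper `empty_cells` are illustrative assumptions, not part of any framework or tool.

```python
from dataclasses import dataclass, fields


@dataclass
class AgentCanvas:
    """One agent, one workflow: the nine cells of the canvas (names are illustrative)."""
    # Strategic: what you intend
    job: str = ""
    autonomy_level: str = ""
    accountable_human: str = ""
    # Operational: how it runs
    human_checkpoints: str = ""
    escalation_and_fallback: str = ""
    two_metrics: str = ""
    # Controls: the foundation
    guardrails: str = ""
    data_and_access: str = ""
    drift_monitoring: str = ""


def empty_cells(canvas: AgentCanvas) -> list[str]:
    """Return the names of cells left blank; each one is a gap to close before go-live."""
    return [f.name for f in fields(canvas) if not getattr(canvas, f.name).strip()]


# A half-filled draft: three cells answered, six still open.
draft = AgentCanvas(
    job="Assign cost centre and GL code to each supplier invoice",
    autonomy_level="Recommend",
    guardrails="Write access to the draft-coding field only",
)
gaps = empty_cells(draft)
print(gaps)
```

A check like this only catches blank cells. Whether a filled cell is filled honestly remains a human judgment.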

What that looks like filled in

Abstract canvases are easy to nod along to and hard to use, so here is a mundane one — an agent that codes supplier invoices to the right cost centre:

The job: read each invoice and assign its cost centre and GL code.
Autonomy: Recommend. The agent proposes the coding; a clerk approves before anything posts.
Accountable human: the accounts-payable team lead, by name.
Human checkpoints: a clerk approves each invoice above a set amount individually; below that amount, the clerk approves the day's batch as a whole.
Escalation & fallback: an unknown vendor or low confidence routes the invoice to the clerk; if no clerk is available, the invoice waits in a queue rather than being guessed; a kill switch halts all coding.
Two metrics: value is the share of invoices coded with no human edit; control is the share of actions outside the allowlist (target: zero).
Guardrails: the agent may only write to the draft-coding field and never post to the ledger; the vendor master is read-only. Both are enforced by permissions the agent cannot change.
Data & access: the approved vendor master and chart of accounts; no access to payments.
Drift monitoring: watch for a fall in coding accuracy or unfamiliar vendor patterns; the signal feeds the control metric.

Fill those nine lines and the decision to deploy mostly makes itself.
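The checkpoint and escalation rules in the invoice example can be sketched as a small routing function. This is an illustration under stated assumptions, not a reference implementation: the threshold of 1,000, the confidence cut-off of 0.80, and every name in the code are hypothetical.

```python
from dataclasses import dataclass

APPROVAL_THRESHOLD = 1_000.00  # assumption: the "set amount" for per-invoice approval
MIN_CONFIDENCE = 0.80          # assumption: below this, coding counts as low confidence


@dataclass
class Invoice:
    vendor_known: bool   # is the vendor in the approved vendor master?
    amount: float
    confidence: float    # the agent's confidence in its proposed coding, 0.0 to 1.0


def route(invoice: Invoice, clerk_available: bool, kill_switch_on: bool = False) -> str:
    """Decide where a proposed coding goes next. Nothing here posts to the ledger."""
    if kill_switch_on:
        return "halted"
    if not invoice.vendor_known or invoice.confidence < MIN_CONFIDENCE:
        # Escalate; if no clerk is available, the safe default is to wait, not to guess.
        return "escalate_to_clerk" if clerk_available else "hold_in_queue"
    if invoice.amount > APPROVAL_THRESHOLD:
        return "clerk_approves_invoice"
    return "clerk_approves_daily_batch"


print(route(Invoice(vendor_known=True, amount=250.0, confidence=0.95), clerk_available=True))
```

The kill switch is checked first on purpose, so that no other rule can route around it.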

The autonomy ladder

Cell 2 is where teams over-reach. This is a ladder of granted authority — how much you let the agent do — not a measure of how clever the model is; a very capable model on the bottom rung is fine, and common.

Ask — the agent proposes; a human decides and executes.
Recommend — the agent prepares the action; a human approves it before it runs.
Act & review — the agent acts; a human reviews after and can reverse it.
Act within bounds — the agent runs on its own inside a defined allowlist and escalates anything outside it.

What sets the rung is reversibility: the easier a step is to undo, the higher you can safely let the agent climb. Only a small share of an agent's actions are truly irreversible — but those few cascade if you treat controls as an afterthought.[4] Notice the ladder stops on purpose: even the top rung keeps a human watching and a boundary in place. There is no fully hands-off rung. Start every agent on the lowest useful rung; it earns the next with a track record, not a launch-day decision or a vendor's demo.
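As a rough sketch, the four rungs can be written as an enumeration of granted authority, with a function that says what happens to a proposed action on each rung. The names `Rung` and `next_step`, the returned labels, and the example allowlist are assumptions made for illustration.

```python
from enum import IntEnum


class Rung(IntEnum):
    """Granted authority, lowest to highest. There is deliberately no fully hands-off rung."""
    ASK = 1
    RECOMMEND = 2
    ACT_AND_REVIEW = 3
    ACT_WITHIN_BOUNDS = 4


def next_step(rung: Rung, action: str, allowlist: frozenset[str]) -> str:
    """Return what happens to an action the agent proposes at the given rung."""
    if rung == Rung.ASK:
        return "human_decides_and_executes"
    if rung == Rung.RECOMMEND:
        return "await_human_approval"
    if rung == Rung.ACT_AND_REVIEW:
        return "execute_then_human_review"
    # Top rung: the agent runs alone only inside the allowlist; anything else escalates.
    return "execute" if action in allowlist else "escalate"


ALLOWED = frozenset({"write_draft_coding"})  # hypothetical allowlist
print(next_step(Rung.ACT_WITHIN_BOUNDS, "post_to_ledger", ALLOWED))
```

In this sketch the allowlist is consulted only on the top rung, mirroring the ladder as described. In practice the guardrails would still be enforced on every rung, outside the agent's reach.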

Closing the seams, in order

Pick one bounded, reversible workflow — not your riskiest, not your biggest; somewhere a mistake is cheap and undoable.
Fill the canvas. The empty cells are your pre-deployment checklist.
Set autonomy to the lowest useful rung and wire each guardrail to a real control — enforced outside the agent — before go-live. That is the process-to-control seam, closed.
Run it behind the human checkpoint and watch both metrics from day one.
Promote it up the ladder only when the evidence earns it.
Reuse the canvas for the next workflow. The second takes a fraction of the time — and that reuse is how the discipline compounds into an advantage.

Do this and the intent set at the top survives all the way to what the system does. That is the whole game.
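Watching both metrics and promoting only on evidence can be sketched as follows. Everything here is an assumption made for illustration: the record shape, the allowlist, the minimum of 500 actions, and the 90% value threshold are placeholders you would set for your own workflow.

```python
from dataclasses import dataclass

# Assumption: the only action this agent is permitted to take.
ALLOWLIST = frozenset({"write_draft_coding"})


@dataclass
class ActionRecord:
    action: str         # what the agent attempted
    human_edited: bool  # did a person have to correct the result?


def value_metric(log: list[ActionRecord]) -> float:
    """Share of actions that needed no human edit (is it paying off?)."""
    return sum(not r.human_edited for r in log) / len(log)


def control_metric(log: list[ActionRecord]) -> float:
    """Share of attempted actions outside the allowlist (target: zero)."""
    return sum(r.action not in ALLOWLIST for r in log) / len(log)


def ready_to_promote(log: list[ActionRecord], min_actions: int = 500,
                     min_value: float = 0.90) -> bool:
    """Promote only on a track record: enough volume, in bounds, and paying off."""
    if len(log) < max(min_actions, 1):
        return False  # not enough evidence yet (also guards the empty log)
    return control_metric(log) == 0.0 and value_metric(log) >= min_value


# A small illustrative track record: 19 clean codings and one that a clerk edited.
log = [ActionRecord("write_draft_coding", False)] * 19
log.append(ActionRecord("write_draft_coding", True))
print(ready_to_promote(log, min_actions=20))
```

The promotion check requires both metrics at once, because a high value score with any out-of-bounds action is exactly the case where one metric alone would mislead.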

Why this pays

An agent creates value only inside this structure, and the structure is what you actually own. The model is rented; you and your competitor can rent the same one this week. The seam-closing discipline (the canvas, the controls, the track record) is yours. The economics bear it out: BCG's 10-20-70 split puts algorithms at roughly 10% of what separates the firms that get value from AI, and people and process at roughly 70%.[1] And most agentic pilots never reach production; when they stall, it is almost always the operating model around the agent, not the model itself.[2][3]

Look at cell B: governed data and access. An agent acting on ungoverned data is just a fast way to automate your worst inputs. It is the data-quality seam I opened with, now running at machine speed. That is why I run Green Data and Aiconomica as two halves of one value chain, as described in our Two Wings AI Framework: get the data foundation and the operating model right together, and you are on track to turn agentic AI into profits.

Start with one workflow and one page. The canvas borrows its one-page discipline from the business-model-canvas tradition — the form is old; what is new is putting one agent's authority and controls, across all three layers, on a single sheet where the seams cannot hide. If you'd like a second pair of eyes on your first one, that is the operating-model work I have done across enterprise transformations, now pointed at agents. Let's talk.

References

[1] BCG, Where's the Value in AI? (2024): roughly 10% algorithms, 20% data and technology, 70% people and process.
[2] Deloitte, Rethinking Operating Models for Humans with Agents (2026).
[3] MIT NANDA, The GenAI Divide: State of AI in Business (2025): the large majority of organisations reported no measurable bottom-line return on generative AI.
[4] Anthropic, Measuring AI Agent Autonomy (2026).