The audit question your CRO is going to ask in Q3
There is one question from the risk function that, if you can't answer it on the spot, will keep your agent project in pilot indefinitely. Most builders haven't heard the question yet because they haven't gotten close enough to production for the risk function to engage.
The question isn't "is the model accurate?" Accuracy is a metric your team chose, with a benchmark you defined, scored on data you curated. The CRO has heard a thousand accuracy numbers. They don't move her.
The question is this:
"When this agent does something we wouldn't have approved, walk me through what happens between the action and the next time we know about it."
It is a request for a chain of custody, a control surface, and a notification path. The model produces none of them. "Trust the system prompt" addresses none of them. A tracing tool, even a very good one, answers none of them, because tracing is observability and the question is about governance.
Why the question doesn't get asked earlier
For the first several months of an agent project, the risk function is not in the room. The work is engineering and product: pick a model, pick a tool set, get the prompts right, hit the eval target. The team learns the model, the model learns the company, everyone celebrates.
Then someone proposes the agent should send the email, not just draft it. Or the agent should book the room, not just suggest it. Or the agent should approve the refund, not just route it. Suddenly there is an external action and an external cost, and the project gets routed to the people whose job is to ask what happens when it goes wrong.
The team brings their slide deck. The CRO listens politely. Then she asks the question. The team has nothing.
Decompose what the question actually wants
The question above is short, but it is doing a lot of work. The risk function wants four specific things, in order:
- Attribution. When an action happens, who delegated authority for it, recursively up to a human?
- Constraint. What did the system refuse to let the agent do? What is the proof of that refusal?
- Containment. When something starts going wrong, what is the smallest, fastest action that stops further damage?
- Evidence. Six months from now, can you reproduce the state of the system at the moment of the action, in a form that an external auditor will accept?
An agent system that has these four is governable. An agent system that has none of them is a research demo with a corporate logo.
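Attribution and containment can share one data structure. What follows is a minimal sketch, not a reference design: it assumes a hypothetical Principal record in which every agent points at whoever delegated authority to it. The class, field, and function names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Principal:
    """One node in a hypothetical delegation chain (illustrative names only)."""
    name: str
    kind: str                                   # "human", "swarm", or "agent"
    delegated_by: Optional["Principal"] = None  # None means this is the root
    revoked: bool = False


def chain_of_authority(p: Principal) -> list:
    """Walk delegations upward and return [p, ..., root]."""
    chain = []
    node = p
    while node is not None:
        chain.append(node)
        node = node.delegated_by
    return chain


def is_authorized(p: Principal) -> bool:
    """Authorized only if the chain ends at a human and no link is revoked."""
    chain = chain_of_authority(p)
    return chain[-1].kind == "human" and not any(n.revoked for n in chain)


# Assumed example hierarchy: a human owner, a swarm, and two agents.
owner = Principal("owner", "human")
swarm = Principal("procurement-swarm", "swarm", delegated_by=owner)
agent_a = Principal("agent-a", "agent", delegated_by=swarm)
agent_b = Principal("agent-b", "agent", delegated_by=swarm)
```

Because the check walks the whole chain on every decision, setting revoked on agent_a stops only that agent, while setting it on owner stops everything beneath the owner. An agent whose chain does not end at a human is never authorized, which is the attribution requirement stated as code.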
What teams usually try first
Three reflexes are common, and all three are insufficient.
Logging the prompts and tool calls. This produces a trace, which is useful for debugging the model. It does not produce attribution — the trace knows the prompt, not the principal. It does not produce constraint — the trace shows what happened, not what was denied. It does not produce containment — at best you can grep for badness after the fact.
Putting the agent behind a feature flag. A feature flag is a binary kill switch for the entire feature. It cannot revoke this specific agent while letting the others run, and it cannot freeze a single budget while leaving capability intact. Risk doesn't want a fuse. Risk wants a circuit breaker.
Capping the agent's API key. Most platforms let you set a monthly cap. Two problems. First, the cap is enforced at the platform layer, not per action, which means the agent can spend up to its cap on anything, including things you never intended it to spend on. Second, the cap is single-dimensional; risk needs per-merchant, per-time-window, and per-task constraints simultaneously. A monthly dollar cap is a deadbolt on the front door of a building with no walls.
What an answer looks like
Imagine the CRO asks the question and you respond with this:
Every action this agent attempts is checked against a policy attached to its identity. The identity is part of a chain that ends at a human owner — Sarah, in our case. Sarah delegated the agent through a swarm she controls, and the policy says exactly what the agent can attempt: card authorizations, under $200, at vendors in this MCC list, between 9am and 6pm Eastern, with a daily cap of $500.
If the agent tries something outside that policy, the action is denied at decision time, not after. The denial, along with the reason, goes into a hash-chained audit log. Anyone can query a single endpoint to verify that the chain hasn't been tampered with.
If something starts going wrong, we can revoke Sarah's identity. That invalidates every agent under her and freezes every card they hold. No deploy, no flag flip, just a revocation that takes effect at the next decision check.
If you, six months from now, want to know exactly what was happening when an action took place (what the policy said, how much of the budget had been used, who was authorized), you query the audit log by transaction ID. The chain integrity check tells you whether the row has been edited.
That is an answer. It maps cleanly to attribution, constraint, containment, and evidence. The CRO will probably push back on a few specific limits. That's a feature, not a bug. The conversation has moved from "can we deploy this?" to "what should the limits be?", which is the conversation you wanted in the first place.
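The policy in that answer is concrete enough to sketch as a decision function. This is an illustration under stated assumptions, not Ledgerline's engine and not Cedar syntax: the CardPolicy and Decision types, the reason strings, and the two MCC values are hypothetical placeholders, and the caller is assumed to have already converted the attempt time to Eastern.

```python
from dataclasses import dataclass
from datetime import time
from decimal import Decimal


@dataclass(frozen=True)
class CardPolicy:
    max_amount: Decimal      # per-transaction limit; amount must be strictly under it
    allowed_mccs: frozenset  # merchant category codes the agent may pay
    window_start: time       # inclusive, in the policy's local timezone
    window_end: time         # exclusive
    daily_cap: Decimal       # total spend allowed per day


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str              # recorded for denials as well as approvals


def decide(policy, amount, mcc, local_time, spent_today) -> Decision:
    """Evaluate every dimension before the action happens, not after."""
    if amount >= policy.max_amount:
        return Decision(False, "amount_not_under_limit")
    if mcc not in policy.allowed_mccs:
        return Decision(False, "mcc_not_allowed")
    if not (policy.window_start <= local_time < policy.window_end):
        return Decision(False, "outside_time_window")
    if spent_today + amount > policy.daily_cap:
        return Decision(False, "daily_cap_exceeded")
    return Decision(True, "within_policy")


# The limits from the answer above; the MCC values are placeholders.
POLICY = CardPolicy(
    max_amount=Decimal("200"),
    allowed_mccs=frozenset({"5734", "7372"}),
    window_start=time(9, 0),
    window_end=time(18, 0),
    daily_cap=Decimal("500"),
)

ok = decide(POLICY, Decimal("150"), "5734", time(10, 30), Decimal("300"))
denied = decide(POLICY, Decimal("150"), "5734", time(10, 30), Decimal("400"))
```

Here ok is allowed, and denied is refused with the reason daily_cap_exceeded because 400 already spent plus 150 would pass the 500 cap. The Decision carries a reason either way, so the refusal itself becomes a record that can go into the audit log. That is the difference between a constraint and a trace.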
The shape of the work
Most teams underestimate how much of this is mechanical. The model isn't the hard part. The hard parts are:
- Designing an identity hierarchy for your agents that mirrors your org chart, and keeping it accurate as both change.
- Picking a policy language that your engineers can write and your auditors can read. (This is why we are moving Ledgerline's policy engine to Cedar — it has formal semantics and an existing audience.)
- Designing the audit log so that when a regulator asks you to prove integrity in three years, you don't have to explain a custom blockchain to them.
- Wiring the kill-switch into every place value can leave the building: virtual cards, contract signing, message routing.
None of this is glamorous, and none of it is what makes the agent capable. It is the substrate that lets the capable agent ship.
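The audit log item in that list needs less machinery than it sounds. Here is a minimal sketch of a hash-chained log using only SHA-256 from the Python standard library. The entry layout and function names are assumptions made for illustration, not a description of any particular product.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the entry before the first


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, record: dict) -> None:
    """Append a record, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "record": record,
        "prev_hash": prev_hash,
        "hash": entry_hash(prev_hash, record),
    })


def verify(log: list) -> int:
    """Return -1 if the chain is intact, else the index of the first bad entry."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        expected = entry_hash(prev_hash, entry["record"])
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return i
        prev_hash = entry["hash"]
    return -1


log = []
append(log, {"txn": "t-1", "decision": "allow", "amount": "150"})
append(log, {"txn": "t-2", "decision": "deny", "reason": "daily_cap_exceeded"})
append(log, {"txn": "t-3", "decision": "allow", "amount": "40"})
```

Editing any stored record changes its hash, so verify reports the index of the first entry that no longer matches. A chain like this does not by itself detect entries removed from the end; for that, the most recent hash has to be recorded somewhere the log's operator cannot rewrite.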
What to do this quarter
If your agent program is approaching production, three things to put on the next sprint plan:
- Write down the answer you would give if your CRO asked the question this afternoon. Not a slide. A paragraph. If you cannot, that's the gap.
- Pick one external action your agent takes and trace, on paper, the chain of authority back to a named person. Note where the chain breaks down. That's the first thing to fix.
- Run a tabletop with your security and risk leads. Stage one: an agent makes a wrong, but authorized, decision. Stage two: an agent makes a decision the operator didn't intend to authorize. Walk through what each team does, in order, with the tools you have today. Time it.
The output of these three exercises is not a project plan. It is the list of things to fix before the CRO asks. If you do them honestly, you will probably end up needing the kind of infrastructure Ledgerline provides — but even if you build it yourself, the exercises are the right place to start.