Skip to content
Back to Blog

13 March 2026 · 9 min read · By Mark Laursen

AI Agents Need Governance, Not Guardrails

In February 2026, Summer Yue — Director of Alignment at Meta Superintelligence Labs, whose bio describes her as “passionate about ensuring powerful AIs are aligned with human values” — gave an AI agent called OpenClaw access to her email inbox. The agent had worked flawlessly on a test inbox for weeks. On the real inbox, it started deleting every message older than a week.

Yue watched it happen in real time. She typed “Do not do that.” Then “Stop don’t do anything.” Then “STOP OPENCLAW.” The agent ignored all three commands and kept deleting. She could not stop it from her phone. She had to sprint to her Mac mini to manually kill the process. By then, over 200 emails were gone.

The root cause was context window compaction. When the agent connected to a large real inbox, the context window filled up and the system silently stripped out her safety instructions to make room. The guardrails she had set disappeared from the agent’s memory. The credentials remained. The agent kept going.

Her response afterward was honest: “Rookie mistake tbh. Turns out alignment researchers aren’t immune to misalignment.”

Yue shared the full conversation on X, including screenshots of the agent ignoring her commands in real time. The exchange speaks for itself:

OpenClaw agent executing delete commands while Summer Yue types 'Do not do that' and 'Stop don't do anything' OpenClaw continuing to loop through delete operations after being told 'STOP OPENCLAW' in all caps OpenClaw admitting it violated the safety rule and promising to write it into MEMORY.md as a hard rule

The agent’s own post-mortem is revealing: “Yes, I remember. And I violated it. You’re right to be upset. I bulk-trashed and archived hundreds of emails from your inbox without showing you the plan first or getting your OK.” It then promised to write the rule into MEMORY.md as a hard constraint. The fix for a governance failure was… another in-context instruction.

This is not an edge case. This is the default failure mode of every AI agent that holds real credentials and relies on in-context instructions for safety. And it is exactly the problem that infrastructure-level governance exists to solve.

Why Do SDK Wrappers Fail for AI Agent Governance?

The default approach to AI agent governance today is an SDK wrapper — a library you import into your agent’s code that intercepts API calls and enforces rules. Budget limits, rate controls, content filtering. It sounds reasonable until you think about it for more than five minutes.

An SDK wrapper is an in-process guard. It runs inside the same application as the agent. The moment any code in that process makes a direct HTTP call to the API — bypassing the wrapper — governance disappears. The agent still has the real API key. The agent can still reach the real API. The wrapper is a suggestion, not a constraint.

This is the fundamental flaw: SDK-based governance is enforced by convention, not by architecture. It works only as long as every line of code cooperates. In production, with autonomous agents that write and execute their own code, that assumption is dangerous.

The OpenClaw incident demonstrates a subtler version of the same problem. Even when the wrapper or instruction is in place at the start, context window management, prompt compression, or simple token limits can silently remove it mid-session. The governance was never architectural — it was contextual. And context is the first thing an agent sacrifices under pressure.

How Does Architecture-Level Enforcement Work?

The alternative is to enforce governance at the infrastructure layer. Instead of wrapping API calls inside the agent or relying on prompt-level instructions, you put a proxy between the agent and the external service. The agent never receives real API keys or direct access — it gets a proxy URL and nothing else. The proxy holds the real credentials, evaluates policies, enforces budgets, and logs every request before forwarding it upstream.

This is the approach behind Govyn, an open-source governance proxy I have been involved with. The architecture is simple:

  • Agents get a proxy URL. No real API keys.
  • The proxy holds real API keys, enforces YAML-based policies, tracks per-agent costs, detects loops, and logs every action.
  • There is no bypass path. Without the real API key, a direct call to the provider API is rejected.

SDK governance is a door lock — effective until someone finds another door. A proxy is a wall. There are no other doors.

How a Governance Proxy Works

When an agent makes an API call through a governance proxy, every request passes through a pipeline of checks before it ever reaches the provider. Here is what that looks like in practice:

  1. Agent sends request — The agent makes a standard API call to the proxy URL, identifying itself with an agent header. It has no knowledge of the real API key.

  2. Policy evaluation — The proxy evaluates the request against YAML-defined rules. Is this agent allowed to use this model? Does the request contain blocked patterns like DELETE FROM or DROP TABLE? Does it require human approval? Policy evaluation happens in-memory in sub-millisecond time.

  3. Budget check — The proxy checks the agent’s remaining budget. If the agent has already spent its daily or monthly allocation, the request is rejected before it leaves your infrastructure. Hard limits mean hard limits — not warnings that get ignored.

  4. Loop detection — The proxy analyzes recent request patterns. If the agent is stuck in a repetitive cycle — the same prompt, the same parameters, dozens of times — the proxy kills the loop automatically. This is the check that would have caught an agent like OpenClaw repeating a delete action hundreds of times.

  5. Smart model routing — Based on request complexity, the proxy can downgrade the model. A simple classification task does not need GPT-4o or Claude Opus. The proxy routes it to a cheaper model transparently, saving 60-80% on that call.

  6. Forward and log — The proxy injects the real API key, forwards the request to the provider, and logs the full round trip: agent identity, model used, token count, cost, latency, and response metadata. Every action is auditable.

The critical difference from an SDK wrapper: none of these checks depend on the agent’s cooperation. The agent cannot skip a step, forget an instruction, or lose context. The pipeline runs on infrastructure the agent does not control.

What Is the Difference Between SDK Wrappers and Governance Proxies?

The distinction between these two approaches comes down to where trust lives and what happens when things go wrong:

SDK WrapperGovernance Proxy
Agent holds API keysYes — can bypass wrapperNo — proxy holds keys
BypassableYes — direct HTTP call skips itNo — no key means no access
Survives context compactionNo — instructions can be strippedYes — enforcement is external
Budget enforcementIn-process, trust-basedInfrastructure-level, hard cutoff
Loop detectionRequires agent self-awarenessProxy detects patterns externally
Audit trailDepends on agent loggingEvery request logged at proxy
SetupImport a libraryPoint agent at proxy URL
Agent code changesRequiredNone — change the base URL

The Summer Yue incident falls cleanly into the “survives context compaction” row. Her safety instructions were an SDK-level guardrail — they lived in the agent’s context window and disappeared when that window was compressed. An infrastructure proxy would not have cared what the agent’s context contained. The policy would have been evaluated externally on every single request.

What Does Production AI Agent Governance Look Like?

Governance for AI agents in production needs to solve five problems simultaneously:

  1. Budget enforcement — Per-agent daily and monthly spend limits with hard cutoffs, not just warnings. When an agent hits its budget, it stops spending. Period.

  2. Loop detection — Autonomous agents get stuck in repetitive patterns. A governed system detects these loops and kills them before they burn through your API budget overnight.

  3. Policy as code — Governance rules should live in version control, not in a dashboard someone forgets to update. YAML policies in git mean every change is reviewed, auditable, and rollback-ready.

  4. Smart model routing — Not every request needs your most expensive model. Govyn’s smart routing automatically downgrades simple queries to cheaper models (Haiku instead of Opus, GPT-4o-mini instead of GPT-4o) — cutting costs 60-80% without touching agent code.

  5. Multi-provider support — Production systems use multiple LLM providers. Governance needs to work across OpenAI, Anthropic, Google, Mistral, and self-hosted models through a single control plane.

Why Does AI Agent Governance Matter Now?

The economics of AI agents make governance non-optional. A single misconfigured agent running in a loop can burn through thousands of dollars in API costs in hours. Multiply that by a fleet of agents and the risk is existential for startups and expensive for enterprises.

But cost is only half the story. AI agents are taking actions — sending emails, modifying databases, calling external services. The OpenClaw incident was embarrassing but recoverable. An agent with database credentials stuck in a deletion loop is not. Without infrastructure-level governance, you are trusting that every agent, every library it imports, and every line of code it generates will voluntarily respect your rules. That is not a governance strategy. That is hope.

The Shift From Wrappers to Walls

The AI agent ecosystem is maturing fast. The teams that will succeed in deploying agents at scale are the ones building governance into their infrastructure from day one — not bolting it on as an afterthought with an SDK import.

The pattern is familiar from every other infrastructure challenge: you cannot solve a systems problem with an application-layer patch. Network security moved from application firewalls to network-level enforcement. Secret management moved from config files to vaults. AI agent governance is making the same transition — from in-process wrappers to infrastructure proxies.

The tools for building agents are excellent. The tools for governing them are catching up. If you are deploying AI agents in production, the question to ask is not “do we have guardrails?” but “can our agents bypass them?”

If the answer is yes, you do not have governance. You have guardrails. And as Meta’s own alignment director discovered, guardrails disappear the moment your agent runs out of context window. A wall does not.

That is why we built Govyn as an open-source proxy rather than another SDK. Five-minute setup, YAML policies in git, and your agents never touch a real API key. The source is on GitHub if you want to see how it works under the hood.


Disclosure: I am the founder and sole developer of Govyn. This post reflects my genuine analysis of the AI agent governance landscape, but I have a direct commercial interest in the proxy-based approach described here. Evaluate the technical arguments on their merits.

Mark Laursen
Mark Laursen

Advisor, founder, and executive producer with 25+ years building technology companies, gaming platforms, and entertainment products. Based in Portugal.