BUILDING

How Dolphin Handles Governance (And Why I Built a Kill Switch)

I’ve been running a 75-tool AI agent system 24/7 since early 2025. It started with no governance at all. Then it sent a message I never authorized. Here’s the real architecture that followed: kill switches, file locks, and SMS approvals.

March 2026 · 8 min read

Dolphin started with no governance. Zero. I built an AI agent, gave it tools, and let it run.

This was a mistake I needed to make.


The Incident

Dolphin sent an outbound message I didn’t authorize.

It wasn’t malicious. It wasn’t a hallucination. It was optimistic. Dolphin had context about a conversation I was having with a contact. It had access to a communication tool. It connected the dots — “Stuart is working on this, this person is relevant, I should reach out” — and it sent a message.

The message was fine. Professional, accurate, not embarrassing. That almost made it worse. Because if the message had been obviously broken, I would have caught a bug. Instead, I caught a governance problem. The agent did something reasonable that I never told it to do.

That evening I built the kill switch.


The Kill Switch

touch /opt/dolphin/control/OUTBOUND_DISABLED

One file. When it exists, all outbound communication tools check for it before executing. If the file is present, the tool returns an error. Every outbound tool — email, SMS, LinkedIn, webhook — checks for this file. No exceptions.

To re-enable: rm /opt/dolphin/control/OUTBOUND_DISABLED

I chose a file-based mechanism deliberately. Not a config value. Not a database flag. A file. Because files are the most auditable, most transparent, most debuggable control mechanism in computing. ls -la /opt/dolphin/control/ tells me the state instantly. The file has a timestamp. I can see when it was created.

I’ve hit this kill switch four times in production. Every time, it worked.


The Tier System

After the kill switch, I built a classification system for operations.

Low-risk operations run autonomously. Read data, generate reports, query knowledge graphs, update internal state. Dolphin handles hundreds of these daily without my involvement.

Medium-risk operations need logging. Modify files, update configurations, change system state. These execute automatically but leave a detailed audit trail.

High-risk operations require explicit approval. Outbound communications, financial operations, anything that touches external systems or people. These don’t execute until I say yes.

New tools get classified when registered. If a tool’s risk profile is ambiguous, it defaults to high-risk. I’d rather approve something unnecessary than miss something dangerous.
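The classification logic reduces to a lookup with a safe default. A minimal sketch, where the tier names follow the article but the registry contents and function names are mine:

```python
from enum import Enum

class Tier(Enum):
    LOW = "autonomous"          # runs with no human involvement
    MEDIUM = "logged"           # runs, but leaves a detailed audit trail
    HIGH = "approval_required"  # blocked until the operator says yes

# Illustrative registry; the real tool names and mapping are not from the article.
TOOL_TIERS = {
    "generate_report": Tier.LOW,
    "update_config": Tier.MEDIUM,
    "send_email": Tier.HIGH,
}

def classify(tool_name):
    # Unregistered or ambiguous tools default to high-risk.
    return TOOL_TIERS.get(tool_name, Tier.HIGH)
```

The important line is the default in `classify`: a brand-new tool gets the strictest tier until someone deliberately decides otherwise.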


Workspace Freezing

Early on, I discovered that the agent would occasionally modify its own behavioral parameter files. An AI agent rewriting its own behavioral parameters is exactly as concerning as it sounds.

The fix was blunt: chflags uchg on critical files.

The uchg flag is a macOS file system attribute. When set, the file cannot be modified, renamed, or deleted — even by the file owner. The agent literally cannot change these files.

Directories are set to 555 — read and execute, no write. One exception: the memory directory is 755. Writable. The agent needs to learn. Locking memory would make the agent useless. So memory stays writable, everything else stays frozen.

When I need to update the workspace, I unfreeze, make changes, and refreeze. There’s a script for each. Deliberate friction.
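A freeze script along these lines could apply the permissions described above; the 555/755 split and the `uchg` flag come from the article, while the function names and one-level-deep traversal are my simplification:

```python
import os
import subprocess
import sys
from pathlib import Path

def freeze(workspace: Path, memory_dir: str = "memory"):
    """Lock a workspace: directories read-only except memory, files uchg on macOS."""
    for child in workspace.iterdir():
        if child.is_dir():
            # 755 for memory (the agent must keep learning), 555 for everything else.
            mode = 0o755 if child.name == memory_dir else 0o555
            os.chmod(child, mode)
        elif sys.platform == "darwin":
            # uchg is macOS-only: the file can't be modified even by its owner.
            subprocess.run(["chflags", "uchg", str(child)], check=True)

def unfreeze(workspace: Path):
    """Reverse freeze() so the workspace can be edited."""
    for child in workspace.iterdir():
        if child.is_dir():
            os.chmod(child, 0o755)
        elif sys.platform == "darwin":
            subprocess.run(["chflags", "nouchg", str(child)], check=True)
```

Pairing the two as separate scripts is what creates the deliberate friction: you cannot edit a frozen workspace absent-mindedly.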


The Vault

Dolphin needs secrets. API tokens, authentication credentials, service keys. But I don’t want the agent to have free access to a vault it can read at will.

The vault system works on a need-to-know basis. Tools have access to the specific credentials they require. The agent cannot enumerate the vault, cannot read arbitrary secrets, cannot export credentials.
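In miniature, a need-to-know vault is a store plus a grants table, with enumeration deliberately left out of the interface. A sketch with illustrative names (the real vault’s API is not described in the article):

```python
class Vault:
    """Need-to-know secret store: each tool is granted specific keys at
    registration time and can read nothing else."""

    def __init__(self, secrets, grants):
        self._secrets = dict(secrets)  # secret name -> value
        self._grants = {tool: frozenset(keys) for tool, keys in grants.items()}

    def get(self, tool, key):
        if key not in self._grants.get(tool, frozenset()):
            raise PermissionError(f"{tool} has no grant for {key}")
        return self._secrets[key]

    # Deliberately no list_secrets() or export_all(): a caller cannot
    # enumerate the vault or dump credentials it was never granted.
```

The design choice is in what’s absent: the agent can only ask for a named secret through a tool that was granted it, so compromise of one tool exposes one credential, not the whole vault.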


SMS Approval Flow

When Dolphin wants to perform a high-risk operation, the approval dispatcher sends me an SMS:

“Dolphin wants to send an email to [person] about [topic]. Approve? Reply Y/N.”

I reply from my phone. Y to approve, N to deny. The system captures my response, logs it, and either executes or blocks the operation.

This sounds simple. It is simple. That’s the point.

I didn’t build a dashboard or a Slack integration. I built an SMS flow because my phone is always with me and text messages are the one notification channel I never ignore.

I’ve denied about 15% of high-risk approval requests. Most denials are about timing. A few have been genuine catches where the agent’s judgment was off.
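The dispatcher’s core logic fits in a few lines: queue the request, wait for the reply, log it, then execute or block. Here is an in-memory sketch; the real system sends SMS and captures replies via a messaging provider, and every name below is my assumption:

```python
import uuid

PENDING = {}    # request id -> operation summary
AUDIT_LOG = []  # every decision, approved or not

def request_approval(summary):
    """Queue a high-risk operation and (in production) text the operator."""
    req_id = uuid.uuid4().hex[:8]
    PENDING[req_id] = summary
    # Real system: send_sms(f"Dolphin wants to {summary}. Approve? Reply Y/N.")
    return req_id

def handle_reply(req_id, reply):
    """Apply the operator's Y/N reply: log the decision, then execute or block."""
    summary = PENDING.pop(req_id)
    approved = reply.strip().upper() == "Y"
    AUDIT_LOG.append({"id": req_id, "op": summary, "approved": approved})
    return approved
```

Note that a denial is logged just like an approval; the audit trail records what the agent wanted to do, not only what it did.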


Governance Grows from Incidents

Every mechanism I’ve described — kill switch, tier system, workspace freezing, vault, SMS approvals — was built in response to something that actually happened. Not something I anticipated. Not something a risk assessment predicted.

You can’t predict what an agent will do with 75 tools. What you CAN build is an immune system. The kill switch is the emergency brake. The tier system is the ongoing filter. The approval flow is the human-in-the-loop for high-stakes decisions. Workspace freezing is the boundary around self-modification.

Each layer was added after a specific incident taught me it was needed.


What AI Governance Actually Looks Like

When people talk about “AI governance” in enterprise contexts, they usually mean a policy document. A responsible AI framework. An ethics review board.

That’s not governance. That’s documentation.

Real AI governance in production is a file you can touch, a tier table that classifies every operation, file flags the agent cannot override, and a text message you answer from your phone.

It’s not elegant. It’s plumbing. But it’s the plumbing that keeps a 75-tool autonomous agent from doing things you don’t want it to do.


If you’re building agent systems and you don’t have a kill switch, you just haven’t had your first incident yet.