Five Gates Stand Between This AI and Its Own Changes — type0 | type0

Most AI agents are static. You configure them once, deploy them, and live with the results. Phantom, an open-source project from a solo developer working under the name ghostwright, takes a different approach: it rewrites its own configuration after every session, then gates every change through a five-stage validation pipeline before the new config takes effect.

The project — github.com/ghostwright/phantom, Apache 2.0 licensed — describes itself as "an AI co-worker with its own computer." That isn't marketing: Phantom runs on a dedicated VM, maintains three tiers of persistent vector memory via Qdrant, and exposes its capabilities as an MCP server so other agents and tools can connect. It ships with Slack, Telegram, Email, and Webhook channels pre-wired. Discord is not on that list. According to the project's GitHub README, when a user asked about it, the agent said it wasn't available, then built the integration and went live. The self-modifying config is the real story.

The evolution engine

After each session, Phantom runs a six-step pipeline: Observe (extract corrections, preferences, and domain facts from the transcript), Critique (compare performance against the current config), Generate (propose minimal targeted changes), Validate (five gates), Apply (write approved changes and bump the version), and Consolidate (compress observations into principles periodically). The config lives in phantom-config/ as a set of YAML and Markdown files — constitution.md, persona.md, user-profile.md, domain-knowledge.md, and a strategies/ directory covering task patterns, tool preferences, and error recovery. These files are injected into the system prompt. Day 1 they are nearly empty. Day 30 they contain the accumulated learning of every prior session.
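To make the "injected into the system prompt" step concrete, here is a minimal sketch of how such an assembly step could look. The file names come from the description above; the function name, the ordering, and the section-header format are assumptions for illustration, not Phantom's actual code.

```python
from pathlib import Path

# File names are taken from the article; their order here is an assumption.
TOP_LEVEL_FILES = [
    "constitution.md",
    "persona.md",
    "user-profile.md",
    "domain-knowledge.md",
]


def build_system_prompt(config_dir: Path) -> str:
    """Concatenate the non-empty config files into one prompt string (hypothetical helper)."""
    paths = [config_dir / name for name in TOP_LEVEL_FILES]
    strategies = config_dir / "strategies"
    if strategies.is_dir():
        # strategies/ covers task patterns, tool preferences, and error recovery.
        paths += sorted(strategies.glob("*"))

    sections = []
    for path in paths:
        if not path.is_file():
            continue  # a missing file simply contributes nothing
        text = path.read_text(encoding="utf-8").strip()
        if text:  # files that are still empty on day 1 are skipped
            sections.append(f"## {path.name}\n{text}")
    return "\n\n".join(sections)
```

Under this sketch, a freshly installed agent yields a short prompt, and the prompt grows only as the evolution engine writes approved changes into the files.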

Every change creates a versioned snapshot in version.json with a metrics record — total sessions, success rate, correction rate. If metrics degrade after an evolution, the engine auto-rolls back to the previous version.
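The rollback rule can be pictured with a small sketch. The metric names mirror the ones listed above, but the `Snapshot` type, the 0.05 tolerance, and the definition of "degrade" are assumptions made for illustration; the project's real thresholds are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    version: int
    total_sessions: int
    success_rate: float     # assumed to be a fraction between 0 and 1
    correction_rate: float  # assumed to be a fraction between 0 and 1


def has_degraded(before: Snapshot, after: Snapshot, tolerance: float = 0.05) -> bool:
    """Hypothetical rule: success fell, or corrections rose, by more than `tolerance`."""
    return (
        before.success_rate - after.success_rate > tolerance
        or after.correction_rate - before.correction_rate > tolerance
    )


def active_version(history: list[Snapshot]) -> Snapshot:
    """Pick the snapshot that should be live: the latest, or its predecessor on regression."""
    if len(history) < 2:
        return history[-1]
    previous, latest = history[-2], history[-1]
    return previous if has_degraded(previous, latest) else latest
```

With these assumed thresholds, a drop in success rate from 0.90 to 0.78 after an evolution would restore the earlier version, while a change from 0.90 to 0.91 would keep the new one.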

The five gates

Before any proposed change is applied, it must pass five gates: Constitution (does it violate immutable principles — constitution.md itself cannot be modified by the evolution engine), Regression (does it break golden test cases), Size (is the target config file over 200 lines), Drift (has the config semantically drifted too far from the original), and Safety (does it touch protected patterns).
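A rough sketch of how sequential gating might be structured follows. Only the two checks that need no model call are implemented: the rule that constitution.md is off-limits, and the 200-line size limit. The `Change` type, the function names, and the choice to measure size on the proposed content are all assumptions; the Regression, Drift, and Safety gates would slot into the same list as further callables.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_LINES = 200  # size limit quoted in the article


@dataclass(frozen=True)
class Change:
    target_file: str  # e.g. "persona.md"
    new_content: str  # full proposed content of that file (an assumption)


# A gate returns None to pass, or a human-readable reason to block.
Gate = Callable[[Change], Optional[str]]


def constitution_gate(change: Change) -> Optional[str]:
    # Covers only the "constitution.md is immutable" half of the gate;
    # checking other files against the principles would need a judge.
    if change.target_file == "constitution.md":
        return "constitution.md cannot be modified by the evolution engine"
    return None


def size_gate(change: Change) -> Optional[str]:
    lines = len(change.new_content.splitlines())
    if lines > MAX_LINES:
        return f"{change.target_file} would be {lines} lines (limit {MAX_LINES})"
    return None


def validate(change: Change, gates: list[tuple[str, Gate]]) -> tuple[bool, str]:
    """Run the gates in order and stop at the first one that blocks."""
    for name, gate in gates:
        reason = gate(change)
        if reason is not None:
            return False, f"{name}: {reason}"
    return True, "all gates passed"
```

Returning the name of the failing gate alongside the reason is what makes a rejection auditable in this sketch: every blocked change records which gate stopped it and why.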

The Safety and Constitution gates use triple-judge voting with minority veto — three Sonnet 4.6 instances evaluate the change independently, and one dissenting judge blocks it. The Regression gate uses a cascaded approach: Haiku first (cheap), then Sonnet if flagged, then Opus if flagged again. Judge costs are tracked in metrics.json. The proposing model is not the same model — or tier of model — that validates the change. That's the explicit design: avoid self-enhancement bias.
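The two voting schemes are simple to state in code. In this sketch a judge is any callable that returns True when it approves a change; in the real system each would wrap a model call, which is omitted here. The names `minority_veto` and `cascaded_check` are hypothetical, and the cascade logic is one reading of "then Sonnet if flagged, then Opus if flagged again".

```python
from typing import Callable, Sequence

# A judge takes the proposed change as text and returns True if it approves.
Judge = Callable[[str], bool]


def minority_veto(change: str, judges: Sequence[Judge]) -> bool:
    """Approve only if every judge approves; a single dissent blocks the change."""
    votes = [judge(change) for judge in judges]  # every judge is consulted
    return all(votes)


def cascaded_check(change: str, tiers: Sequence[Judge]) -> bool:
    """Escalate from the cheapest tier to the most expensive.

    A pass at any tier accepts the change immediately. A flag escalates to
    the next tier, and the change is rejected only if the last tier flags it too.
    """
    for judge in tiers:
        if judge(change):
            return True
    return False
```

The cost asymmetry is visible in the structure: the veto always pays for three judgments, while the cascade pays for the expensive tiers only when a cheaper one has raised a flag.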

The triple-judge veto is worth scrutinizing. It is software-defined safety, not hardware root-of-trust. A single Sonnet instance that decides a proposed change is unsafe can block it. That is auditable (you can see which gate failed and why), but it depends on the judges being genuinely adversarial to each other and to the proposing agent. Coinbase uses Intel TEE (Trusted Execution Environment) for agent custody, which enforces the safety boundary in silicon rather than inference. Phantom's approach assumes the judges won't collude and won't be fooled. The two approaches sit at different points on the capability-safety tradeoff curve: TEE gives stronger guarantees at the cost of hardware dependency; triple-judge voting gives flexibility and auditability at the cost of an inference-layer assumption that is harder to prove formally.

The infrastructure

Phantom runs as a single Bun process on a VM, backed by Qdrant for vector memory (episodic, semantic, and procedural tiers), Ollama for embeddings (nomic-embed-text, 768-dimensional vectors), and SQLite for session state and metrics. Minimum spec is 2 vCPU, 4GB RAM, and 40GB disk; the Phantom container is hard-limited to 2GB. First boot takes 2–3 minutes while Ollama downloads the embedding model. Subsequent restarts are 15–20 seconds.

Ghostwright's other projects include Ghost OS (which the developer describes as giving AI agents eyes and hands), Shadow (persistent memory), and Specter (infrastructure VMs starting at $3.49 per month). Phantom is the fourth in that family, and the self-evolution engine is what distinguishes it architecturally from the others.

The proof of concept that has attracted the most attention is the Hacker News dataset demo. According to the project's GitHub page, a Phantom instance loaded 28.7 million rows (755,000 unique authors, 4.3 million stories) into ClickHouse on its own VM, built an analytics dashboard with interactive charts, created a REST API to query the data, and registered that API as an MCP tool for use in future sessions and by other connected agents. The same page documents a monitoring pipeline built around Vigil, a lightweight open-source system monitor, that ingested 890,450 rows across 25 metrics. Neither demo has been independently replicated.

What it is and isn't

Phantom is a side project, not an Anthropic product. It uses the @anthropic-ai/claude-agent-sdk (^0.2.77) with Sonnet 4.6 as the default runtime. The project is free for early adopters; you bring your own API key. It is not a hosted service. The safety properties described above depend on the LLM judges actually being invoked. The judges are optional: they can run when an API key is present, and heuristic fallbacks take over when they aren't available.

The self-modifying config is genuinely novel architecture. Whether the five-gate pipeline is sufficient to prevent drift into harmful configurations — or whether it simply creates a more sophisticated failure mode — is a question that will only be answered by watching systems like this run in production over time. For now, it's the most explicit engineering attempt to treat an agent's configuration as a living system rather than a static artifact. The design decisions embedded in that choice are worth understanding regardless of whether you deploy it.