Ant Engineer Reverse Engineers Claude Code Source: Auto Mode Four-Layer Decision Pipeline and Security Classifier Mechanism Revealed

According to 1M AI News monitoring, Ant Group engineer and Umi.js front-end framework creator Chen Cheng reverse-engineered the source code of Claude Code 2.1.81, fully restoring what happens after pressing auto mode. Key finding: every tool invocation must go through a four-layer decision pipeline, where only if the first three layers cannot make a judgment, an independent AI classifier is called for security review.

The four-layer pipeline is as follows: the first layer checks existing permission rules, and if matched, directly allows access; the second layer simulates the acceptEdits mode (which allows file editing permissions), if it passes in this mode, indicating low risk, it skips the classifier; the third layer checks the read-only tool whitelist (Read, Grep, Glob, LSP, WebSearch, etc.), these tools do not modify any state and unconditionally allow access; if none of the above conditions are met, it enters the fourth layer, sending an independent API request to Claude Sonnet for security classification.

Several key design details of the classifier: always using Sonnet instead of Opus, as a trade-off between cost and latency; setting the temperature to 0 for output determinism; the classifier is defined as the "autonomous AI programming Agent's security monitor," protecting against three types of risks (prompt injection, scope creep, accidental damage); the user's CLAUDE.md configuration file is injected into the classifier context as a basis for determining user intent.

The interception rules cover over 22 categories, including force push, direct push to the main branch, downloading and executing external code, production deployment, data leakage, self-modifying permissions, creating remote code execution surfaces, credential leakage, and more. There are 7 exceptions for allowing certain actions: hard-coded key for testing purposes, local file operations within the working directory, read-only GET requests, installation of declared dependencies, installation of official toolchains, reading configuration credentials sent to the target provider, and pushing to the current working branch.

There is also a set of circuit-breaking mechanisms: after being rejected 3 times consecutively or a total of 20 rejections, the system degrades to manual confirmation; in headless mode, the Agent is terminated directly. When the classifier is unavailable, a feature flag controls whether it is "fail-closed" (direct rejection) or "fail-open" (degraded to manual confirmation).

There is also fine-tuned frequency control for behavior prompt injection in auto mode: injecting once every 5 rounds of conversation, with the first injection in each 5-cycle period being a full version (about 800 characters, containing six instructions like "execute immediately, reduce interruption, action over plan"), and the following 4 injections being a concise single-line version, striking a balance between context window usage and behavioral stability.

Source

Correction/Report

On-Chain Activity