Labeling an Agent as a “Product Manager” will not make it more professional; it will only make it reject out-of-bounds requests.

According to monitoring by Dynamix Beating, frameworks such as CrewAI and MetaGPT have popularized a multi-agent design in which different agents play the roles of product manager, architect, and test engineer, passing documents and running pipelines like company departments. SagaSu published a lengthy analysis calling this pattern the "Three Departments and Six Ministries illusion": after reviewing engineering documents from Anthropic, OpenAI, and Google, the author found that none of these companies adopts such a model.

The article points out two fundamental issues. First, false boundaries: human division of labor exists because one person cannot do everything, but an LLM can write both the requirements document and the code, so the "professional barrier" does not exist. Assigning an agent a role does not make it more specialized; instead, it tends to skip issues that fall outside that role, even though the most valuable reasoning often happens at those boundaries. Second, information loss in hand-offs. When Agent A produces a document and passes it to B, it transmits conclusions rather than the reasoning behind them; B must reconstruct the context, and implicit assumptions are gradually lost. The longer the chain, the easier it is for every node to be correct while the overall outcome drifts.

Some may object that the three companies' use of progress.txt and spec files is also a form of document sharing. The article argues the difference lies in directionality. A document handed between roles moves one way: once A finishes writing and hands off to B, A is out of the loop, and the information has already been compressed into conclusions. A status file, by contrast, is an incremental log for the same task: the same role writes and reads it at different points in time, information accumulates continuously, and reasoning stays coherent across sessions.
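The incremental-log pattern can be sketched in a few lines. This is an illustration only: the file name `progress.txt` comes from the article, but the entry format and helper functions are assumptions, not any vendor's actual schema.

```python
# Minimal sketch of a status file used as an append-only handover log.
# Entries accumulate across sessions; nothing is compressed or overwritten.
import datetime
from pathlib import Path

STATUS_FILE = Path("progress.txt")  # file name taken from the article

def append_entry(note: str) -> None:
    """Append a timestamped entry; earlier context is preserved."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with STATUS_FILE.open("a", encoding="utf-8") as f:
        f.write(f"[{stamp}] {note}\n")

def read_log() -> str:
    """A new session reads the full accumulated log before continuing."""
    return STATUS_FILE.read_text(encoding="utf-8") if STATUS_FILE.exists() else ""

append_entry("Set up virtualenv; tests pass except test_auth (flaky).")
append_entry("Refactored auth module; kept the retry assumption from entry 1.")
print(read_log())
```

The key contrast with a role-to-role handoff is that the reader of this log is the same "role" at a later time, so implicit assumptions recorded in earlier entries remain available verbatim.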

Approaches taken by the three companies:

- Anthropic likens each new session to a "shift-change engineer," using progress.txt as the handover record. A dedicated Initializer Agent runs the first session to set up the environment and write an operations manual; subsequent sessions read it and continue the work. Its multi-agent system uses an orchestrator-worker model: a lead agent decomposes the task, multiple sub-agents explore different directions in parallel, and the results are aggregated back, rather than relayed down a pipeline.
- OpenAI locks the goal into a spec file at the start of a task (to keep agents from "creating impressive but misguided outputs"), with a runbook serving as both operations manual and audit log. It also introduces Skills: reusable, versioned instruction sets that are essentially tools and operating procedures, not roles. Using this mechanism, GPT-5.3-Codex ran continuously for about 25 hours and completed a full design tool while staying coherent throughout.
- Google leans on a 1M-token context window and writes project intent into a persistent Markdown file inside the codebase, avoiding reliance on chat history. Gemini 3 further introduces Thought Signatures, which preserve key nodes of the reasoning chain in long conversations to prevent logical contradictions.
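The orchestrator-worker shape the article attributes to Anthropic can be sketched as below. The agents here are stand-in functions rather than real LLM calls, and all names are illustrative assumptions.

```python
# Sketch of an orchestrator-worker model: one lead agent fans a task out to
# parallel sub-agents and aggregates their raw results. There is no relay
# pipeline, so no worker's output is compressed by an intermediary.
from concurrent.futures import ThreadPoolExecutor

def sub_agent(task: str, direction: str) -> dict:
    """Stand-in for a worker agent exploring one direction of the task."""
    return {"direction": direction, "finding": f"{task}: notes on {direction}"}

def orchestrator(task: str, directions: list) -> list:
    """Fan out in parallel, then aggregate everything at the lead agent."""
    with ThreadPoolExecutor(max_workers=len(directions)) as pool:
        results = list(pool.map(lambda d: sub_agent(task, d), directions))
    # The orchestrator sees every worker's full output before deciding.
    return results

findings = orchestrator("survey caching strategies",
                        ["in-memory", "CDN", "database-level"])
print([f["direction"] for f in findings])
```

The design point is the topology: workers never hand documents to each other, so the "conclusions without reasoning" loss described earlier cannot compound along a chain.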

Several common principles can be distilled from these practices. The value of multi-agent systems lies in parallel coverage of the search space, not in simulating division of labor: Anthropic's research data indicate that token usage explains 80% of performance variance, so deploying more agents helps essentially because it spends more compute exploring different directions at once. If a verification step is needed, the verifying agent should specialize in finding problems rather than simply taking over the task. The tools given to an agent determine what it can do; a role label only shapes what it is willing to do.

Finally, the article warns that model capabilities evolve rapidly: a patch hard-coded into the system today may be dead code six months from now. Anthropic learned this firsthand. Sonnet 4.5 would terminate prematurely as it neared its context limit, so the team introduced a context-reset mechanism; after switching to Opus 4.5 the behavior disappeared, and the mechanism became useless. Keeping the architecture evolvable therefore matters more than chasing a "perfect architecture."
