
Why Does Your Agent Go on Strike After Running for a Few Minutes? OpenAI Engineer: It Needs Checkpoints and External Memory

According to monitoring by Insight Beating, Codex's /goal mode lets the Agent loop continuously until a task is complete, but this amplifies the human tendency to give ambiguous instructions. Drawing on internal hands-on experience, OpenAI engineer Chris Hayduk points out that vague prompts like "optimize the code" leave the model either giving up prematurely because it doesn't know where the endpoint is, or stuck in a blind loop of endless edits.

To keep the Agent working reliably for days or even longer, he distilled three disciplines:
- Eliminate qualitative terms; use checklists instead: The model cannot judge what is "better," but it can act on "reduce runtime by 20% without breaking the tests." For qualitative tasks such as formatting a paper, he simply handed Codex a Markdown checklist of 200 formatting requirements, turning an abstract task into a quantitative one: done means every checkbox is ticked.
- Keep the validation loop to minutes: The Agent needs to validate its actions through tests. Don't let it run for hours against a massive production environment; give it a sample dataset and a lightweight test harness so the feedback loop stays as short as possible.
- Maintain three files as an "external brain": Even with a large context window, memory is lost after a few days of running. He suggests creating three Markdown files directly in the working directory: PLAN.md (the high-level plan), EXPERIMENTS.md (outcomes of completed experiments), and EXPERIMENT_NOTES.md (real-time thinking drafts), forcing the model to write its trial-and-error process to disk.
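The first discipline, "done means every checkbox is ticked," can be made mechanical. A minimal sketch, assuming the common GitHub-style `- [ ]` / `- [x]` checklist syntax (the article does not specify the exact format Hayduk used):

```python
# Hedged sketch: treat a Markdown checklist as the completion criterion,
# so an abstract task ("format the paper") becomes a quantitative one.
import re

# Matches "- [ ] item" or "* [x] item"; the format is an assumption.
CHECKBOX = re.compile(r"^\s*[-*]\s*\[( |x|X)\]\s*(.+)$")

def remaining_items(checklist_md: str) -> list[str]:
    """Return the unchecked items; the task is done only when this is empty."""
    remaining = []
    for line in checklist_md.splitlines():
        m = CHECKBOX.match(line)
        if m and m.group(1) == " ":
            remaining.append(m.group(2).strip())
    return remaining

def is_done(checklist_md: str) -> bool:
    """A yes/no endpoint the Agent can check instead of guessing at 'better'."""
    return not remaining_items(checklist_md)
```

An agent loop can then re-run `remaining_items` after each change and stop only when the list is empty, rather than deciding for itself when the work looks "good enough."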
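The three-file "external brain" is just append-only Markdown on disk. A minimal sketch of how an agent harness might maintain it (the helper names and entry format are illustrative assumptions, not Codex's actual mechanism; only the three filenames come from the article):

```python
# Illustrative sketch of the "external brain": three local Markdown files
# that survive context loss because every step is appended to disk.
from datetime import datetime, timezone
from pathlib import Path

# The three filenames are from the article; the kind keys are assumptions.
MEMORY_FILES = {
    "plan": "PLAN.md",               # high-level plan
    "experiment": "EXPERIMENTS.md",  # outcomes of completed experiments
    "note": "EXPERIMENT_NOTES.md",   # real-time thinking drafts
}

def init_memory(root: Path) -> None:
    """Create each file with a header if it doesn't exist yet."""
    headers = {"plan": "# Plan\n", "experiment": "# Experiments\n",
               "note": "# Experiment Notes\n"}
    for kind, name in MEMORY_FILES.items():
        path = root / name
        if not path.exists():
            path.write_text(headers[kind], encoding="utf-8")

def log(root: Path, kind: str, text: str) -> None:
    """Append a timestamped entry, forcing trial-and-error onto disk."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    with open(root / MEMORY_FILES[kind], "a", encoding="utf-8") as f:
        f.write(f"\n## {stamp}\n{text}\n")
```

Because the files are plain Markdown in the working directory, the model can re-read them at the start of each session instead of relying on a context window that has long since rolled over.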
