According to monitoring by Insight Beating, Codex's /goal mode lets the Agent loop continuously until the task is complete, but this amplifies the human tendency to give ambiguous cues. OpenAI engineer Chris Hayduk, drawing on internal hands-on experience, pointed out that a vague instruction like "optimize the code" can cause the model either to give up prematurely because it doesn't know the endpoint, or to get stuck in a blind loop of endless revisions.
To keep the Agent working reliably for days or even longer, he summarized three disciplines:
- Eliminate qualitative terms; use checklists instead: The model cannot judge what is "better," but it can understand "reduce runtime by 20% without failing tests." Faced with a qualitative task such as formatting a paper, he even handed Codex a Markdown checklist of 200 formatting requirements, turning an abstract task into a quantitative one: the job is done when every checkbox is ticked.
- Limit validation time to minutes: The Agent needs to validate its actions through testing. Don't let it run for hours against a massive production environment; give it a sample dataset and a lightweight test harness so the feedback loop stays as short as possible.
- Maintain three files as an "external brain": Even with a large context window, memory is lost after a few days of running. He suggests creating three Markdown files in the local repo: PLAN.md (the high-level plan), EXPERIMENTS.md (a record of experiment outcomes), and EXPERIMENT_NOTES.md (real-time thinking drafts), forcing the model to write its trial-and-error process to disk.
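The first discipline can be enforced mechanically: once "done" is defined as every checkbox ticked, a few lines of code can verify it. A minimal sketch, where the `checklist_done` helper and the sample items are illustrative, not Hayduk's actual 200-item checklist:

```python
import re

def checklist_done(markdown_text: str) -> bool:
    """Return True if the text contains Markdown task boxes and all are ticked."""
    # Matches list items like "- [ ]", "- [x]", "* [X]" at the start of a line.
    boxes = re.findall(r"^\s*[-*] \[( |x|X)\]", markdown_text, flags=re.MULTILINE)
    return bool(boxes) and all(b.lower() == "x" for b in boxes)

sample = """\
- [x] Title in 14 pt bold
- [x] References in a consistent citation style
- [ ] Figures numbered consecutively
"""
print(checklist_done(sample))  # False: one box is still open
```

The model no longer has to assess whether the paper "looks better"; it only has to drive this function to return True.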
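The second discipline amounts to putting a hard time budget on every validation run. A hedged sketch of that idea; the helper name and the five-minute default are assumptions, not anything from the article:

```python
import subprocess
import sys

def quick_validate(cmd: list[str], timeout_s: int = 300) -> bool:
    """Run a lightweight test command; treat anything slower than the budget as a failure."""
    try:
        proc = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        return False  # too slow counts as failing: keep the feedback loop in minutes
    return proc.returncode == 0

# Example: a trivially passing check completes well inside the budget.
print(quick_validate([sys.executable, "-c", "pass"]))  # True
```

In practice `cmd` would invoke a small smoke-test suite against the sample dataset rather than the full production pipeline.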
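For the third discipline, a small hypothetical helper shows what "writing the trial-and-error process to disk" might look like. Only the EXPERIMENTS.md filename comes from the article; the `log_experiment` function is an illustration:

```python
from datetime import datetime, timezone
from pathlib import Path

def log_experiment(result: str, notes_dir: Path = Path(".")) -> None:
    """Append a timestamped entry to EXPERIMENTS.md so outcomes survive context loss."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = f"\n## {stamp}\n{result}\n"
    with open(notes_dir / "EXPERIMENTS.md", "a", encoding="utf-8") as f:
        f.write(entry)

log_experiment("Reduced runtime 22% by caching tokenizer output; all tests pass.")
```

Because the entries are append-only Markdown on disk, a fresh session can re-read PLAN.md and EXPERIMENTS.md and resume where the previous run left off.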
