Original Title: Codex-maxxing
Original Author: Jason Liu
Translation: Peggy
Editor's Note: The AI Agent is transitioning from being a "coding tool" to a new kind of work operating system.
In this article, Jason Liu, an engineer on the OpenAI Codex team, uses his own experience with Codex to document how this transformation is taking place: pinned threads, voice input, shared memory, browser control, remote control, Heartbeats automation loops, and sidebar panels. Codex is no longer just a chat window waiting for prompts; it is becoming a space that can carry out tasks, remember context, produce output, and keep work moving forward.
What is most noteworthy is not whether Codex can write better code, but how it is changing the way work is organized. In the past, AI usage mostly stayed at question-and-answer: the user makes a request, the model returns a result, and the task stops when the conversation ends. In this new workflow, threads can live for months, memory can be solidified into files, tasks can run automatically on a schedule, and users can step in to review and correct at any time. Together, these form a small operational loop.
This means that the Agent's value is shifting from "ability" to "continuity." It is not just helping people complete a single task, but establishing connections between different tools, files, browsers, Slack, Gmail, calendars, and local applications, allowing work to continue progressing even after the user leaves. For knowledge workers, this may be a key step for AI tools to truly enter the daily production process: not replacing an action, but keeping more work from dying after a single prompt.
Below is the original text:
Before Codex came along, I was already heavily using coding Agents. However, most of the time, I was using them within interfaces designed specifically for programming work: generating diffs, modifying code repositories, delivering code.
Starting around November, I began pushing them into knowledge work. I used Slidev to create slideshows, treated the Agent as a voice-driven note-taker, and kept looking for other outputs a coding Agent could help produce: an index.html, a PDF, a spreadsheet, a set of slides.
The latest upgrade of the Codex App is the first one I've used that truly makes this broader working model feel "native." Codex is still very good at writing code, but the most interesting change is that it has started to provide a place to "put" my work.
What truly changed my usage habits is that I learned to establish a working loop for my tasks: a persistent thread, shared memory, tools that can interact with my computer, the ability to intervene and resume tasks at any time, and an interface that allows me to directly review the output itself.
The first feature that changed my behavior is context compression.
Now, I keep a pinned thread for each important workflow:
My Chief of Staff thread
Agents SDK
OpenAI CLI
Codex for open source
A thread specifically for monitoring Twitter
These are not short conversations. They are giant threads I've been compressing for months. They continuously accumulate history, preferences, and past decisions I don't want to repeat every time I come back.
You can navigate directly to pinned threads using Command-1 through Command-9.
Of course, there are trade-offs. Persistent threads are not free. If you reopen one later, the conversation is probably no longer in the prompt cache, so resuming it can cost more than starting a fresh, short thread. But for the workflows I truly care about, continuity is worth it.
Voice input allows Codex to capture more of my true thought process.
The benefit is not speed; it is that the Agent gets access to my unedited thinking. Codex has built-in voice input, but I also use Wispr Flow, because system-level transcription changes how I provide context to every other tool. If I'm planning a task, I might say, "I remember someone named Ben mentioned this in Slack; I don't quite remember what it was, please look into it." Typed out, that sentence feels vague and a bit annoying to write; spoken, it feels completely natural.
Transcripts work the same way. If I want to write an article, I can call someone and record the conversation, or use Granola on my phone to record an in-person chat, and then use the transcript as source material. Many plans get better because the model sees my messy but authentic thinking, not just the polished version.
Voice input becomes more useful when combined with steering.
Steering lets you keep sending messages while the agent is working; each one is injected after the current tool call finishes. For example, while reviewing a website, I can keep talking:
Make this smaller
This copy isn't right
The spacing between these two elements feels off
Open a PR when done
Wait for the preview deployment
Send the preview link to people who need to review on Slack
I don't need to wait for each step to decide what's next. I can continue to append intent while the agent is still working, then leave with a queue of tasks.
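To make the steering idea concrete, here is a toy, single-threaded sketch (not how Codex is implemented): messages the user "steers in" land in a queue, and the agent drains that queue after every tool call, turning each message into another step. The class name `SteerableAgent` and its methods are hypothetical; in this simulation the steers are queued before `run()` starts, whereas a real agent would accept them from another thread while it works.

```python
from collections import deque


class SteerableAgent:
    """Hypothetical toy agent: user messages queued mid-run are picked up after each tool call."""

    def __init__(self, plan: list[str]):
        self.plan = deque(plan)     # steps the agent currently intends to run
        self.pending = deque()      # messages the user steered in while it was busy
        self.log: list[str] = []

    def steer(self, message: str) -> None:
        # Appending intent never blocks and never interrupts the current step.
        self.pending.append(message)

    def run_tool(self, step: str) -> None:
        # Stand-in for a real tool call (editing CSS, opening a PR, ...).
        self.log.append(f"tool: {step}")

    def run(self) -> list[str]:
        while self.plan:
            self.run_tool(self.plan.popleft())
            # After every tool call, drain whatever the user said in the meantime.
            while self.pending:
                msg = self.pending.popleft()
                self.log.append(f"user: {msg}")
                self.plan.append(msg)  # each steer becomes a queued step
        return self.log
```

The design choice worth noting is that steering only ever appends: the user builds up a queue of intent ("make this smaller", "open a PR when done") and can walk away, while the agent keeps the order in which things were asked.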
Later, Heartbeats can continue to monitor the PR or Slack thread after I leave. The unit of work is no longer "one prompt, one response" but a small operational loop.
Once threads become long-lived, they need a shared memory that doesn't depend on a single code repo.
The key is not just holding onto message history. A long thread can certainly remember a lot, but if that useful information isn't serialized into some persistent place, it gets trapped in the thread. The point of a memory system is to take what a thread has learned and turn it into something I can inspect, edit, diff, and reuse.
Most of my long-lived threads start in an Obsidian vault:
vault/
├── TODO.md
├── people/
├── projects/
├── agent/
└── notes/
At the top level, I keep an AGENTS.md file with a directive that says: when you learn more about a person, advance a project, or close a todo loop, update the corresponding page in the vault.
This vault is where the Agent "lives," independent of any single project. The codebase holds the code; the vault holds the context that orbits my work: people, decisions, open loops, daily notes, project status, and understandings that would otherwise easily slip between threads.
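A minimal sketch of scaffolding the vault layout above, assuming a plain folder of Markdown files. The AGENTS.md wording here is a hypothetical paraphrase of the directive described in the text, not the author's actual file, and `scaffold_vault` is an illustrative helper name.

```python
from pathlib import Path

# Hypothetical wording; the article only paraphrases the real directive.
AGENTS_MD = """\
# Vault instructions
- When you learn something new about a person, update their page in people/.
- When you advance a project, update its page in projects/.
- When you close a todo loop, update TODO.md.
"""


def scaffold_vault(root: Path) -> Path:
    """Create the vault layout without overwriting notes that already exist."""
    root.mkdir(parents=True, exist_ok=True)
    for folder in ("people", "projects", "agent", "notes"):
        (root / folder).mkdir(exist_ok=True)
    for name, text in (("TODO.md", "# TODO\n"), ("AGENTS.md", AGENTS_MD)):
        path = root / name
        if not path.exists():  # never clobber memory the agent already wrote
            path.write_text(text, encoding="utf-8")
    return root
```

Running it twice is safe: the second call only fills in whatever is missing, which matters once the Agent has started writing to these files.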
I also maintain this vault as a GitHub repository. This has two advantages for me:
It's cloud-accessible;
Diffs become the interface for reviewing memory.
When the Agent updates the vault, I can read the diff to see what it considered important enough to remember. This review step is crucial. I do not want evergreen threads quietly accumulating a fuzzy "feel" buried in conversation history. I want the Agent to write down what actually changed: what this person prefers, what this project is waiting on, what decision has been made, what loop has been closed.
This is also why I like turning memories into files. Files force the Agent to condense what it has learned into a form that can outlive the thread. If the thread disappears, compaction goes badly, or keeping the thread alive becomes too expensive, the useful knowledge still remains.
At this stage, my main threads no longer resemble chat windows; they are more like different workers reading from the same notebook.
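In practice this review is just `git diff` on the vault repository. As a self-contained stand-in, the sketch below uses Python's standard `difflib.unified_diff` to render an Agent's edit to a people page in the same unified format; the page contents and the `people/alex.md` path are made-up examples.

```python
import difflib


def memory_diff(before: str, after: str, path: str) -> str:
    """Render an agent's edit to a vault page as a unified diff for human review."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


# Hypothetical page before and after an agent update.
before = "# Alex\n- Works on the Agents SDK\n"
after = "# Alex\n- Works on the Agents SDK\n- Prefers async review over meetings\n"
print(memory_diff(before, after, "people/alex.md"))
```

An empty diff means the Agent decided nothing was worth remembering, which is itself useful to see.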
Codex also has built-in memories under Settings > Personalization > Memories. I think of them as a local recall layer: good for recording stable preferences, recurring workflows, project conventions, and known pitfalls, but not a replacement for instructions committed to a repository or for an explicit vault. Chronicle is particularly interesting here because it can use recent on-screen context to help build memories. I haven't used it seriously yet, and the documentation is clear that it is an opt-in research preview with real trade-offs around permissions, rate limits, prompt injection, and unencrypted local memory files. But directionally, it matches what I care about: work should leave behind structured memory, not just longer chat logs.
Once a thread has a memory, the next question is: what can it touch?
In my own framing, the most useful distinction is:
$browser: Used for the local web interface I want to inspect and annotate;
@chrome: Used for work that needs my logged-in browser state and multiple open tabs;
@computer: Used for tasks that can only be done through a graphical interface.
If I am iterating on a local app, I want to use $browser. If I need to interact within a logged-in browser session, I prefer @chrome. If the only way to complete a task is by clicking into a desktop app, then I need @computer.
On my work computer, Twitter is logged in on Safari. If I have @computer read Twitter there, I cannot use Safari while it works. And when I want an agent to work with multiple authenticated tabs concurrently without taking over the entire app I am using, @chrome is more appropriate.
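The rule of thumb above can be written down as a tiny decision function. This is a toy encoding of the author's framing, not how Codex routes between tools; the `Task` fields and `pick_surface` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Task:
    local_web_ui: bool = False       # e.g. a local index.html or dev server I'm iterating on
    needs_login_state: bool = False  # needs cookies/sessions from my real browser
    needs_many_tabs: bool = False    # several authenticated tabs at once
    gui_only: bool = False           # only reachable by clicking inside a desktop app


def pick_surface(task: Task) -> str:
    """Toy version of the $browser / @chrome / @computer rule of thumb."""
    if task.gui_only:
        return "@computer"  # last resort: it takes over the app it drives
    if task.needs_login_state or task.needs_many_tabs:
        return "@chrome"    # logged-in tabs without commandeering my own browser
    if task.local_web_ui:
        return "$browser"   # local pages I want to inspect and annotate
    return "no UI needed"
```

The ordering reflects the Safari example: reading Twitter needs a logged-in session but not a specific desktop app, so it routes to @chrome rather than letting @computer take over the browser I'm using.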
Connectors then extend this ability into other parts of my actual work. The ones I use most frequently are $slack, $gmail, and $calendar, as much of the work appears in Slack threads, inboxes, and calendars before it becomes code.
Skills make repetitive workflows reusable. The Skill Creator and Skill Installer are good starting points. The Skill Installer lets you pull in OpenAI-recommended skills directly from the composer. After Codex pets were released, I used it to install the Hatch Pet skill, but the real value is in the general pattern: once you've successfully done something useful, you can often package it up so Codex can repeat it without relearning the entire process.
Remote control makes these longer work loops portable.
Codex can keep working on the machine that already has your files, permissions, and local environment, while you can view progress on your phone, review what it's found, answer questions, approve the next steps, or change direction without needing to return to your desk. OpenAI describes it as a way to collaborate with Codex anytime, anywhere.
This is crucial when Codex is on a long-running task and you want to keep the momentum going. You can start a task and walk away; when it reaches a decision point, you can guide it from your phone.
This is the same reason why Pinned Threads, Voice Input, and Heartbeats are important: work no longer pauses just because I changed my location. A thread can keep running, and I only need to dedicate just enough attention to help it unlock the next step.
Pinned threads are useful, but they still wait for your instructions. Heartbeats allow them to run periodically.
A Heartbeat is a form of thread-local automation. You can say, "Check this for me every few hours." Then the thread can schedule itself. A thread can have multiple schedules, can continue running until a condition is met, or can adjust its own execution frequency over time.
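To make "multiple schedules, run until a condition is met, adjust frequency" concrete, here is a minimal simulation on a fake minute clock. It is a hypothetical model for illustration only, not the Codex Heartbeats implementation; `Heartbeat`, `ThreadAutomation`, and `run_until` are invented names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Heartbeat:
    """Hypothetical model of one schedule attached to a thread."""
    name: str
    every: float                                   # minutes between checks
    action: Callable[["Heartbeat", float], bool]   # returns True once its goal is met
    next_at: float = 0.0
    finished: bool = False


@dataclass
class ThreadAutomation:
    heartbeats: list[Heartbeat] = field(default_factory=list)

    def run_until(self, horizon: float) -> None:
        """Fire due heartbeats in time order on a simulated clock (minutes)."""
        while True:
            due = [h for h in self.heartbeats if not h.finished and h.next_at <= horizon]
            if not due:
                return
            hb = min(due, key=lambda h: h.next_at)
            now = hb.next_at
            if hb.action(hb, now):
                hb.finished = True           # stop condition met: this schedule retires
            else:
                hb.next_at = now + hb.every  # the action may have changed hb.every
```

Because each action receives its own `Heartbeat`, it can change `every` to speed up or slow down, which is the "adjust its own execution frequency" behavior in a few lines.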
My Chief of Staff thread runs every 30 minutes:
Check Slack and Gmail every 30 minutes to see if there are any messages that need my attention but have not been replied to yet.
Help me determine which matters are most important.
If someone asks me a question, research the answer as thoroughly as possible and draft a reply for me, but do not send it.
When I return to Slack, many replies are often already in draft. I still decide which content to send, but the most labor-intensive context collection is already done.
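A sketch of the triage half of that prompt, assuming messages from Slack and Gmail have already been normalized into one shape (in the real workflow the $slack and $gmail connectors do the fetching and the model does the judging). The `Message` fields and the ranking heuristic are hypothetical; the one property taken from the text is that replies are only ever drafted, never sent.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical normalized message from Slack or Gmail."""
    source: str        # "slack" or "gmail"
    sender: str
    text: str
    mentions_me: bool
    is_question: bool
    replied: bool
    age_hours: float


def needs_attention(msgs: list[Message]) -> list[Message]:
    """Unreplied messages aimed at me, most important first (a toy heuristic)."""
    open_items = [m for m in msgs if not m.replied and (m.mentions_me or m.source == "gmail")]
    # Direct questions first, then the ones that have waited longest.
    return sorted(open_items, key=lambda m: (not m.is_question, -m.age_hours))


def draft_reply(m: Message, body: str) -> dict:
    # Drafts only: the human decides what actually gets sent.
    return {"to": m.sender, "source": m.source, "status": "draft", "body": body}
```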
The same pattern applies to review cycles. Heartbeats can monitor Google Docs comments, pull request comments, or Slack replies and continue driving work forward when feedback comes in.
One of my favorite examples comes from an animation project. I posted a video on Slack and then had Codex check the thread every 15 minutes for feedback; if there were comments, it would re-render a new version, reply in the thread, and tag the reviewers. Since the Slack MCP server couldn't upload files, the Agent used @computer to click the "Add file" button and still managed to upload the revised render.
What's interesting is not just that it checks Slack every 15 minutes, but that this cycle spans multiple tool boundaries: Slack for collecting feedback, Remotion for rendering, @computer for uploading. When Heartbeats, connectors, and computer actions come together, they no longer function as individual features but as a feedback loop that can continue running without me sitting there.
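The state such a loop needs is small: remember the newest comment already handled, so each 15-minute check acts only on genuinely new feedback. Below is a minimal sketch, assuming comments arrive as `(timestamp, text)` pairs; the render, upload, and reply steps are recorded as strings standing in for Remotion, @computer, and Slack, and `FeedbackLoop` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLoop:
    """Hypothetical review-loop state; actions stand in for Remotion, @computer and Slack."""
    reviewers: list[str]
    last_seen: float = 0.0   # timestamp of the newest comment already handled
    version: int = 1
    actions: list[str] = field(default_factory=list)

    def tick(self, comments: list[tuple[float, str]]) -> bool:
        """One scheduled check. Returns True if a new version was produced."""
        new = [(ts, text) for ts, text in comments if ts > self.last_seen]
        if not new:
            return False  # nothing new: stay quiet instead of re-rendering
        self.last_seen = max(ts for ts, _ in new)
        self.version += 1
        notes = "; ".join(text for _, text in new)
        self.actions.append(f"render v{self.version}: {notes}")
        self.actions.append(f"upload v{self.version}")
        tags = " ".join(f"@{r}" for r in self.reviewers)
        self.actions.append(f"reply: v{self.version} is up {tags}")
        return True
```

Without the `last_seen` watermark, every check would re-render in response to the same comment.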
Recently, I had a package stolen. Amazon told me I would have to wait around 25 minutes to speak with a live agent. So I set up a thread with @computer and told it:
Check every 5 minutes to see if a customer service representative has joined this conversation.
If they have, do your best to help me secure a refund.
Once they respond, switch to checking every minute so you can respond faster.
By the time I got out of the shower, the refund had already been taken care of.
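The interesting part of that prompt is the adaptive interval. A small simulation on a fake minute clock shows the shape: poll every 5 minutes, drop to every minute once someone joins, and stop once the goal is met. The event names and timings passed in are made-up test data, and `poll_chat` is a hypothetical helper, not a Codex API.

```python
def poll_chat(events: dict[int, str], horizon: int = 60) -> list[tuple[int, str]]:
    """Simulate the refund-chat heartbeat on a fake minute clock.

    events maps minute -> what appeared in the chat at that minute (hypothetical data).
    Polls every 5 minutes until an agent joins, then every minute; stops once refunded.
    """
    interval, t, seen, log = 5, 0, set(), []
    while t <= horizon:
        # Everything that appeared since the last check becomes visible now.
        for minute in sorted(m for m in events if m <= t and m not in seen):
            seen.add(minute)
            event = events[minute]
            log.append((t, event))
            if event == "agent_joined":
                interval = 1   # someone is live: respond faster
            elif event == "refund_issued":
                return log     # goal met: stop polling
        t += interval
    return log
```

The polling gap directly bounds how long the representative waits for a reply, which is why the interval tightens only once a human is actually on the other end.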
Many of my Heartbeats also update my Obsidian vault, treating it as a form of explicit memory.
I am still learning how to best utilize the latest feature, Goals.
Give it ambitious goals. A weak goal would be: "Execute the plan in this Markdown file." A strong goal has real success criteria that keep the Agent making progress toward it.
Last week, I tried porting the Python Rich library to Rust. As the original project already had a comprehensive set of unit tests, I could set a goal like this: Port Rich to Rust but ensure it passes all the unit tests from the original Python library.
These tests provide a true measure of progress: the Rust version is only considered done if it passes the same tests as the Python original.
This is different from having a lengthy conversation with an AI, accumulating a Markdown plan, and then simply saying, "Implement it." The upper limit of execution effectiveness depends on the goals and validation methods you provide. Ambition without validation is merely a wish.
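One way to picture "goal plus validation" is a goal object whose only definition of done is a verifier command exiting successfully. This is a hypothetical structure, not the Codex Goals feature; for the Rich port, the verifier would be whatever command runs the ported test suite (the `cargo test` below is an assumption about how that suite is wired up), and `attempt` stands in for the Agent's next unit of work.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable


@dataclass
class Goal:
    """A goal is only 'done' when its verifier command exits with status 0."""
    description: str
    verify_cmd: list[str]

    def check(self) -> tuple[bool, str]:
        result = subprocess.run(self.verify_cmd, capture_output=True, text=True)
        return result.returncode == 0, result.stdout + result.stderr


def pursue(goal: Goal, attempt: Callable[[], None], max_rounds: int = 50) -> bool:
    """Keep doing work until the verifier passes or the round budget runs out."""
    for _ in range(max_rounds):
        ok, _ = goal.check()
        if ok:
            return True
        attempt()  # the agent's next unit of work, guided by the failing output
    return goal.check()[0]


# Assumption: the original Python tests have been ported into the Rust crate.
rich_port = Goal("Port Rich to Rust and pass the original unit tests", ["cargo", "test"])
```

The loop never decides on its own that the work is finished; only the verifier can, which is the difference between a goal and "Implement it."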
The most exciting part of Codex for me is the Sidebar.
People often think of it as a preview pane. But that understanding underestimates it. The Sidebar is where Codex goes from being just a chat app to being the place where work happens.
For me, it serves three purposes: checking artifacts, manipulating web interfaces, reviewing changes. In all these instances, I can view and comment on the same object that the Agent is working on.
Markdown, spreadsheets, CSV, PDF, and slides can all be housed here.
Markdown can be annotated. Spreadsheets render formulas and support cell editing; I use them to manage the Codex open-source initiative. CSV files display as tables rather than raw text. PDFs render directly, which is especially useful for LaTeX. Slides can also be created and reviewed without leaving the app.
The key is not just that Codex can generate these artifacts, but that I can inspect and annotate them without breaking the loop.
The in-app browser is more interesting. The Agent can see it, control it with JavaScript using $browser, and I can directly leave annotations on the content I'm viewing.
There are several web interfaces that I now frequently use in this way:
index.html, used for lightweight static artifacts;
Storybook, used for reviewing UI components;
Remotion Studio, used for programmatic animation;
Slidev, used for presentations;
Streamlit, used for data applications.
The minimal version is often the best. You can have a model create a single index.html file with JavaScript and CSS, open it in a side panel, and start interacting immediately. No need for a server. I have been trying to update a static index.html over time with Heartbeats so that each time I return to the thread, there is a fresh artifact waiting for me.
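A minimal sketch of the "single index.html, no server" pattern, assuming a Heartbeat (or any script) regenerates the page on each run. Everything, CSS and JavaScript included, is inlined so the file works when opened straight from disk, and the new version is written to a temporary file and swapped in with `os.replace` so a reader never sees a half-written page. `write_artifact` and the page contents are hypothetical.

```python
import html
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_artifact(path: Path, title: str, items: list[str]) -> None:
    """Write a self-contained index.html (inline CSS + JS) and swap it in atomically."""
    rows = "\n".join(f"<li>{html.escape(i)}</li>" for i in items)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    page = f"""<!doctype html>
<html><head><meta charset="utf-8"><title>{html.escape(title)}</title>
<style>body{{font:14px system-ui;margin:2rem}} li.done{{text-decoration:line-through}}</style>
</head><body>
<h1>{html.escape(title)}</h1><p>Updated {stamp}</p>
<ul id="items">{rows}</ul>
<script>
  // Click an item to toggle it; purely client-side, so no server is needed.
  document.getElementById("items").addEventListener("click", e => {{
    if (e.target.tagName === "LI") e.target.classList.toggle("done");
  }});
</script>
</body></html>
"""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(page)
    os.replace(tmp, path)  # atomic swap: the sidebar never loads a partial file
```

Escaping item text with `html.escape` matters here because the content often comes from Slack or email, not from the author.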
Thariq has a great article discussing why he prefers HTML as an output format over Markdown. I think this intuition is correct. Once the output becomes a small app rather than just a document, the relationship between people and artifacts changes.
If I need something heavier, I can also use a Vite app, but then I have to keep a server running. A plain index.html is far more durable, because it keeps working with no server at all.
When doing animations, I often have Storybook and Remotion Studio open side by side. I can leave comments like "make this bounce" or "this should be bigger," and the Agent can check the same browser state I'm looking at, including the current frame in the animation.
When giving presentations, I often use Slidev. Codex can review the slides, find truncated content, switch between different pages, and respond to annotations as I review.
I also look forward to this kind of functionality becoming more useful in tools like Streamlit and Jupyter in the future. Different people already live in different applications. Codex is increasingly entering the places they inhabit.
The more Codex has a place that can remember, revisit, review, and take action, the less my work is likely to die between prompts. This is the change I truly care about: not that an Agent can code for me, but that when I leave, more work can still move forward.