How Does Codex Use a Computer? Three Entry Points and Permission Boundaries

The key is not to give AI more authority, but to ensure it chooses the right boundaries of action

Original Title: Three Ways Codex Can Use a Computer
Original Author: jason
Translation: Peggy, BlockBeats

Editor's Note: This article explores three entry points through which Codex interacts with the external environment: Computer Use, Chrome Extension, and In-App Browser. Although all three appear to address the issue of "enabling Codex to use a computer," they correspond to different task scenarios, permission boundaries, and levels of trust.

Among these, Computer Use has the broadest coverage: it can interact directly with authorized native applications, system settings on macOS and Windows, the iOS Simulator, and even workflows that span applications. It suits GUI-based processes that lack an API, plugin, or structured tool support. The trade-off is that it is the slowest of the three and has the widest permission boundary. The Chrome Extension suits tasks that rely on login state, cookies, multiple tabs, and browser identity, such as Gmail, LinkedIn, Salesforce, internal backends, or research across logged-in sites. The In-App Browser leans toward development and debugging, and is particularly suited to local services, visual bugs, responsive layouts, and design annotations. It does not inherit the user's normal browser login state, so its capabilities are narrower, but its isolation is stronger.

The crux of the article is that Codex does not have just one way to "use a computer." What truly matters is selecting the narrowest, most secure, and most structured operational interface based on the task at hand. If a plugin or MCP can be used, visual control should not be the first resort. For tasks involving only web development, the In-App Browser should be prioritized. When requiring user browser identity and login state, then Chrome should be used. Only when structured tools cannot cover the task, and the task must rely on a desktop GUI, should Computer Use be the final mile.

Appshots are not a fourth way to control the computer but a tool to "show Codex the current screen context." It addresses the context input issue, while Browser, Chrome, and Computer Use address the action issue. Taken together, this layered approach actually reveals a key aspect of AI Agent productization: not granting the model unlimited permissions but continually narrowing permissions, defining boundaries in specific tasks, and allowing users to retain oversight of crucial actions.

Below is the original text:

Codex has three ways to use a computer: Computer Use, Chrome Extension, and In-App Browser.

They overlap just enough to be easy to confuse.

By the end of this article, you will know how to install and trigger these three modes, when to use each of them, how Appshots and Developer mode bridge the gap, and what to document in AGENTS.md to guide Codex in selecting the appropriate interface.

The TL;DR is:

That said, whenever possible, it's still preferable to use a plugin or MCP. For example, the Slack plugin allows for a more precise search of a thread than clicking around in Slack; actions generated by the GitHub plugin are also easier to review than having Codex drive a web page. Visual control is best used where the capabilities of structured tools reach their limits.
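The selection order this article argues for (structured tool first, then the in-app browser, then Chrome, then Computer Use) can be written down as a small decision function. This is only a sketch in Python: the `Task` fields and the returned labels are hypothetical names chosen for illustration, not Codex settings or APIs.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task description; these fields are illustrative assumptions.
    has_structured_tool: bool = False  # a plugin or MCP server covers the task
    is_web_only: bool = False          # the whole task happens on web pages
    needs_login_state: bool = False    # depends on cookies, accounts, or open tabs


def choose_interface(task: Task) -> str:
    """Return the narrowest interface that can still complete the task."""
    if task.has_structured_tool:
        return "plugin-or-mcp"   # structured tools come before visual control
    if task.is_web_only and not task.needs_login_state:
        return "@Browser"        # isolated in-app browser
    if task.is_web_only:
        return "@Chrome"         # carries the user's browser identity
    return "@Computer"           # desktop GUI, widest trust boundary


print(choose_interface(Task(is_web_only=True)))
print(choose_interface(Task(is_web_only=True, needs_login_state=True)))
print(choose_interface(Task()))
```

The checks run from narrowest to widest, so a task only reaches `@Computer` after every narrower option has been ruled out. The same ordering could be stated in plain prose in AGENTS.md.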

Everything can be @Computer

Computer Use has the broadest coverage of the three interfaces. It enables Codex to view and interact with graphical interfaces on macOS and Windows, including windows, menus, keyboard input, and the clipboard in apps you authorize.

It is usually the slowest. Structured plugins can call APIs directly; Computer Use, on the other hand, needs to observe the interface, determine where to click, wait for the app to respond, and then check the next state. This visual loop takes time, but it also means Codex can operate on apps that have no available API.

On macOS, slowness doesn't necessarily mean it disrupts you. Computer Use can work in the background with the apps you've authorized while you continue to use other parts of your computer. Many times, I've opened an app while using Codex, only to find that Codex has quietly completed a workflow in the background.

Depending on the apps installed and authorized on your computer, the targets of these operations can include Spotify, Xcode, System Settings, iOS Simulator, or even controlling your iPhone with iPhone Mirroring. It can also switch between multiple apps and handle workflows that span across different applications.

Use it when your task relies on the following:

Native desktop apps, such as Spotify or financial apps;

iOS Simulator, iPhone Mirroring, or other processes that can only be performed through graphical interfaces;

System or app configuration;

Data sources without plugins or APIs;

Workflows that require switching between multiple apps;

The missing final step of a structured integration.

Installation: Open Codex's Settings > Computer Use, then click Install.

Triggering: Mention @Computer, or explicitly ask Codex to use Computer Use. As the model's capabilities improve, Codex will also invoke Computer Use on its own when needed.

You can try a few examples first:

One of my favorite examples originated from a stolen package. Amazon told me to expect about a 25-minute wait to reach customer service. I handed a Codex thread over to Computer Use, asking it to check the chat window every five minutes initially, then every minute once the customer service representative appeared, and do its best to help me get a refund. By the time I finished my shower, the refund was already processed.
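The polling schedule in that request (every five minutes while waiting, every minute once a representative appears) is easy to express as a rule. The sketch below is a hypothetical Python simulation of that rule only. It does not drive any app, and the function names and the simulated arrival time are assumptions made for illustration.

```python
from typing import List


def next_check_delay_seconds(agent_present: bool) -> int:
    """Five minutes while waiting in the queue, one minute once an agent is there."""
    return 60 if agent_present else 300


def simulate_checks(agent_arrives_at: int, total_seconds: int) -> List[int]:
    """Return the timestamps (in seconds) at which the chat window is checked."""
    checks = []
    now = 0
    while now <= total_seconds:
        checks.append(now)
        # The agent is only noticed at a check, never between checks.
        agent_present = now >= agent_arrives_at
        now += next_check_delay_seconds(agent_present)
    return checks


# Hypothetical scenario: the agent joins after 10 minutes, and we watch for 15.
print(simulate_checks(agent_arrives_at=600, total_seconds=900))
# → [0, 300, 600, 660, 720, 780, 840, 900]
```

The slow interval keeps the thread cheap during the long wait, and the fast interval keeps replies prompt once a person is on the other end.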

Use @Computer to open Spotify, find my Discover Weekly playlist, and start it. Do not change my account or subscription settings.

Use @Computer to open iPhone Mirroring, reproduce the onboarding bug in the iOS app, and take a screenshot of the failing state. Fix the smallest relevant code path, then run the same flow again.

I also use Computer Use as the "last mile" in a structured workflow. In a recent video release, Codex could read feedback from Slack, modify code and render a new video, but at that time, the Slack integration in that thread couldn't upload files. So Computer Use clicked Add file to fill in this missing step.

It also has the broadest trust boundary of the three. Give it only one explicit app or process at a time. Keep sensitive apps closed when they are not part of the task, read permission prompts carefully, and have a person present to supervise any change involving finances, accounts, payments, credentials, privacy, or system security.

Handling Multiple Tabs and Login State with @Chrome

The Codex Chrome extension allows Codex to access your already logged-in Chrome state. It should be used when a task depends on your account, cookies, browser profile, or tabs that you have open and authenticated.

This interface suits work in tools such as:

Gmail or LinkedIn;

Salesforce or a customer support backend;

Internal dashboards;

Research across multiple sites where you are logged in;

Forms that rely on your account or a browser extension.

Installation: Open Codex's Plugins, add Chrome, and follow the setup process. Codex will guide you to install the Codex Chrome extension and grant Chrome permissions. Once the extension shows Connected, start a new thread.

Trigger: Mention @Chrome or explicitly request Codex to use your logged-in Chrome browser:

Use @Chrome to review the open customer account, compare it with the support ticket in the other tab, and draft the missing fields. Stop before submitting.

Chrome tasks run within a tab group, which keeps all the tabs related to a Codex thread together. Unlike the in-app browser, this interface carries your browser identity. That makes it more powerful but also more sensitive.

Another major advantage is the control over multiple tabs. Chrome can link multiple tabs to the same task, allowing you to reference information from one page in another page and continue the workflow in a third page. While Computer Use can drive the browser through visual means, Chrome interprets the task as a browser workflow rather than a series of screen coordinate operations.

Recently, there was a thread where I handed an open Strudel Composer tab to Codex to make the music more engaging. Chrome provided it with the selected tab, along with the WebMCP tool exposed by the page. Codex analyzed the musical structure, rewrote the harmony and the four-minute overall form, adjusted the tempo, saved the track, and let it keep playing. It didn't need to visually hunt for every control on the interface because Chrome can combine the tab context with the page's structured capabilities.

I also use it to run a long-running Codex thread for Twitter. The general instructions are:

Every day, use Chrome to check my DMs, read relevant news, and look for feedback or mentions I should know about. Add anything durable to my vault. Do not post or send messages.

The interesting part is not that Codex can open Twitter, but that this thread can return to the same logged-in working environment long-term, linking discovered content to a local file and leaving a result for me to review.

The trust boundary here is crucial. Websites may regard Codex's clicks, form submissions, and message sends as actions taken by you. The web page content itself is also untrusted input. Clearly distinguish steps with more significant consequences: research, navigation, and drafting can be automated; before sending, posting, purchasing, or submitting, your review is required.
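The split between automatable steps and steps that need review can be pictured as a simple gate. This is a hedged Python sketch, not how Codex implements approvals: the action names and the `gate` function are hypothetical, and sending unknown actions to review is an added assumption that follows the cautious spirit of the paragraph above.

```python
# Hypothetical action labels taken from the categories named in the text.
AUTOMATABLE = {"research", "navigate", "draft"}
NEEDS_REVIEW = {"send", "post", "purchase", "submit"}


def gate(action: str) -> str:
    """Decide whether a browser action may run unattended."""
    if action in AUTOMATABLE:
        return "proceed"
    if action in NEEDS_REVIEW:
        return "ask-user"
    # Assumption: anything unrecognized is treated as consequential.
    return "ask-user"


for step in ["research", "draft", "submit"]:
    print(step, "->", gate(step))
```

In a prompt, the same gate is usually a single sentence, such as the "Stop before submitting" instruction in the earlier @Chrome example.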

If the entire task is completed in the browser, prioritize using Chrome over Computer Use. Chrome provides the browser's native context required for such tasks without expanding access to the entire desktop.

Handling the Website You Are Developing with the In-App @Browser

The in-app browser is a browser that lives inside the Codex thread. You and Codex share the same rendered page, which makes it particularly suitable for building and debugging web applications.

I usually start here for:

Local development server;

File-based preview page;

Public pages that do not require login;

Reproducing visual bugs;

Checking responsive layout;

Providing design feedback on page elements.

Its most important constraint is isolation. The in-app browser does not use your regular browser profile, cookies, extensions, login sessions, or existing tabs. This is a limitation when tasks require account identity, but it is a useful boundary when accounts are not necessary.

Setup: Open Codex's Plugins, add the Browser plugin, and enable it.

Trigger: Mention @Browser in the prompt, or explicitly ask Codex to use the in-app browser:

Use @Browser to open the Vite app on http://localhost:3000/, reproduce the mobile overflow bug, fix it, and verify the same route again at desktop and mobile widths.

This sets up a tight feedback loop: Codex can edit code, manipulate the page, check rendering status, take screenshots, and then revalidate the same process after the fix.

My favorite part is the annotation. When I review a local app, I can click an element or select a region and leave a comment. The style controls also let me preview and give feedback on text, fonts, spacing, and colors more precisely. I usually combine this with voice input and guided procedures: I review the page, leave comments, and keep queuing up more feedback while Codex addresses the current round. The page itself becomes a specification document.

This is especially useful for design work. I often ask Codex to consolidate an idea, a research pack, or a project status into a single-file index.html, then open it in the in-app browser. Instead of trying to describe a whole set of design changes in another prompt, I can annotate the real page directly: "This hierarchy is reversed", "This shouldn't look so much like a card here", "These controls need more space", or "Use this font size ratio throughout the site". Codex receives the comments with relevant screenshots and element context, changes the file, and then reopens the same page for the next round.

Create a single-file index.html for this project brief and open it in the in-app @Browser.

This loop feels more like working on the same canvas with a designer, rather than going back and forth with screenshots and text descriptions.

The in-app browser is also a good starting point for a hybrid workflow. In another thread, I opened an X post in the in-app browser so that Codex could investigate the related discussion. The visible page helped it confirm which post I was referring to; Codex then switched to the Twitter CLI and retrieved 38 replies, including nested replies hidden from the browser view. This is the "narrowest interface" principle in practice: confirm the on-screen context with the browser, then search deeper with structured tools.

There are trade-offs here as well. The isolation of an in-app browser makes it a great development interface, but also means it’s not good for handling Google logins, passkeys, or sites that rely on browser extensions. When identity is critical, switch to Chrome.

Appshots

Appshots are not a fourth way for Codex to control the computer. They are a way of pointing Codex at what's in front of you.

On a Mac, double-tap the CMD key to capture the frontmost window. Codex will take a screenshot and attach all available text to the thread. You can Appshot an error, an email, a design, a settings panel, or an unfamiliar form, and then simply say:

That’s the mental model that I find easiest to remember: Appshots are how you point to something on your computer; Browser, Chrome, and Computer Use are how Codex takes action.

Appshots are currently created through the Codex app on macOS. They capture the frontmost window, not the whole desktop. This makes them particularly useful: you can provide focused context without granting control of the app.