Original Title: "IOSG Weekly Brief | When Your Browser Becomes an Agent #289"
Original Authors: Mario Chow, Figo, @IOSG
Over the past 12 months, the relationship between web browsers and automation has changed dramatically. Almost every major tech company is racing to build its own browser agent. The trend became unmistakable from late 2024 onward: OpenAI introduced Operator (now Agent mode) in January 2025, Anthropic added the "Computer Use" capability to Claude, Google DeepMind launched Project Mariner, Opera announced its agent-based Neon browser, and Perplexity AI released the Comet browser. The signal is clear: the future of AI lies in agents that can navigate the web autonomously.
This trend is not just about giving browsers smarter chatbots but represents a fundamental shift in how machines interact with the digital environment. A browser agent is a type of AI system that can "see" web pages and take actions: click on links, fill out forms, scroll through pages, input text - just like a human user. This model promises to unleash significant productivity and economic value as it can automate tasks that still require manual operation or are too complex for traditional scripts to handle.
▲ GIF Demo: Actual operation of an AI browser agent: following instructions, navigating to the target dataset page, automatically taking screenshots, and extracting the required data.
Almost all major tech companies (and some startups) are developing their own browser AI agent solutions. Here are some of the most representative projects:
OpenAI's Agent Mode (formerly known as Operator, launched in January 2025) is an AI agent with a built-in browser. It can handle a wide range of repetitive online tasks - filling out web forms, ordering groceries, scheduling meetings - all through the same standard web interfaces humans use.
▲ AI Agent Schedules Meetings Like a Professional Assistant: Checks calendar, finds available time slots, creates events, sends confirmations, and generates an .ics file for you.
At the end of 2024, Anthropic introduced a "Computer Use" capability for Claude 3.5 Sonnet, giving the model the ability to operate a computer and browser the way a human does: Claude can see the screen, move the cursor, click buttons, and type text. It was the first large-model agent tool of its kind to enter public beta, letting developers have Claude navigate websites and applications autonomously. Anthropic positioned it as an experimental feature aimed primarily at automating multi-step workflows on web pages.
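For developers, the interaction looks roughly like the sketch below: a loop that gives Claude a screen-controlling "computer" tool, executes whatever actions it requests, and feeds the results back. This is a minimal outline based on the computer-use beta as documented at launch; model IDs, tool versions, and beta flags may have changed since, and perform_action is a stub standing in for the OS-level glue a developer supplies.

```python
# Minimal agent loop against Anthropic's computer-use beta (as documented at launch).
# perform_action() is a stub for the OS/browser glue code a developer provides.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

COMPUTER_TOOL = {
    "type": "computer_20241022",       # tool identifier used by the original beta
    "name": "computer",
    "display_width_px": 1280,
    "display_height_px": 800,
}

def perform_action(action: dict) -> str:
    # A real implementation would take screenshots, move the mouse, click, and type.
    return f"executed {action}"

messages = [{"role": "user", "content": "Open example.com and find the pricing page."}]

while True:
    response = client.beta.messages.create(
        model="claude-3-5-sonnet-20241022",   # model available when the beta launched
        max_tokens=1024,
        tools=[COMPUTER_TOOL],
        messages=messages,
        betas=["computer-use-2024-10-22"],    # beta flag as documented at launch
    )
    tool_uses = [b for b in response.content if b.type == "tool_use"]
    if not tool_uses:
        break  # no more actions requested; the task is done (or needs a human)
    messages.append({"role": "assistant", "content": response.content})
    messages.append({
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": b.id, "content": perform_action(b.input)}
            for b in tool_uses
        ],
    })
```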
AI startup Perplexity (best known for its answer engine) launched the Comet browser in mid-2025 as an AI-driven alternative to Chrome. At Comet's core is a conversational AI search engine built into the address bar (omnibox), which returns instant answers and summaries instead of traditional search results.
Comet also ships with the Comet Assistant, an agent that lives in a sidebar and can automatically perform everyday tasks across websites. It can, for example, summarize the emails you open, schedule meetings, manage browser tabs, or browse pages and extract information on your behalf.
Because the sidebar interface lets the agent perceive the content of the current page, Comet aims to blend browsing and AI assistance seamlessly.
The previous sections reviewed how the major players (OpenAI, Anthropic, Perplexity, and others) have packaged browser-agent capabilities into different product forms. To understand their value, it helps to look at how these capabilities are applied in everyday scenarios and enterprise workflows.
# E-commerce and Personal Shopping
One highly practical scenario is delegating shopping and booking tasks to an agent. An agent can fill your online shopping cart and place an order from a saved shopping list, or compare prices across multiple retailers and handle checkout for you.
For travel, you can ask the AI to: "Book me a flight to Tokyo next month (under $800), then book a hotel with free Wi-Fi." The agent handles the entire process - searching for flights, comparing options, filling in passenger information, and completing the hotel booking - all through the airline and hotel websites. This level of automation goes well beyond existing travel bots: it does not just recommend, it purchases.
# Boosting Office Efficiency
Agents can automate many of the repetitive business tasks people perform in a web browser - for example, triaging email and extracting action items, or checking availability across multiple calendars and scheduling meetings automatically. Perplexity's Comet Assistant can already summarize your inbox through the web interface or add events to your calendar for you. With your authorization, agents can also log into SaaS tools to generate routine reports, update spreadsheets, or submit forms. Imagine an HR agent that logs into various job boards to post positions, or a sales agent that updates a CRM with lead data. Mundane daily tasks that would otherwise consume significant employee time can be automated by having the AI handle the web forms and page interactions.
Beyond single tasks, agents can string together complete workflows spanning multiple web systems. They can log into various dashboards to troubleshoot issues, or orchestrate processes such as onboarding a new employee (creating accounts across several SaaS sites). Every one of these steps requires interacting with a different web interface, which is exactly where browser agents excel. In essence, any multi-step operation that today requires navigating several websites can be handed to an agent, as the sketch below illustrates.
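To make "handling web forms and page interactions" concrete, here is one such flow written by hand with Playwright. A browser agent produces steps like these dynamically from a natural-language instruction; the URLs, selectors, and credentials below are placeholders, not a real integration.

```python
# A hand-written version of the kind of multi-step web workflow a browser agent
# performs dynamically. URLs, selectors, and credentials are illustrative placeholders.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    # Step 1: log in to an internal dashboard.
    page.goto("https://dashboard.example.com/login")
    page.fill("#email", "ops@example.com")
    page.fill("#password", "********")
    page.click("button[type=submit]")

    # Step 2: pull a figure from a report page.
    page.goto("https://dashboard.example.com/reports/weekly")
    total = page.inner_text(".total-signups")

    # Step 3: carry the value into a second web system (e.g., a CRM form).
    page.goto("https://crm.example.com/leads/new")
    page.fill("#notes", f"Weekly signups: {total}")
    page.click("text=Save")

    browser.close()
```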
Despite the immense potential, today's browser agents are still far from mature. Current implementations expose several long-standing technical and infrastructural challenges:
The modern web was designed for human-operated browsers and has evolved, over time, to actively resist automation. Data is often buried in HTML/CSS optimized for visual presentation, gated behind interactive gestures (hovering, scrolling), or reachable only through undocumented APIs.
On top of this, anti-scraping and anti-fraud systems add further barriers. These tools combine IP reputation, browser fingerprinting, JavaScript challenge-response, and behavioral analysis (randomness of mouse movement, typing rhythm, dwell time). Paradoxically, the more "perfectly" an AI agent performs - filling forms instantly, never making a mistake - the easier it is to flag as malicious automation. That can produce hard failures: an OpenAI or Google agent may complete every step up to checkout smoothly, only to be intercepted by a CAPTCHA or a secondary fraud filter.
Human-optimized interfaces layered with bot-hostile defenses force agents into a fragile "human mimicry" strategy. The approach fails often: without human intervention, end-to-end transaction completion rates still sit below one third.
To give an agent full control, it often needs access to sensitive information: login credentials, cookies, two-factor authentication tokens, even payment details. This raises concerns that both users and businesses readily understand:
· What if the agent makes a mistake or is deceived by a malicious website?
· Who is responsible if the agent agrees to certain terms of service or executes a transaction?
Based on these risks, current systems generally take a cautious approach:
· Google's Mariner will not enter credit card details or accept terms of service; it hands control back to the user instead.
· OpenAI's Operator prompts the user to take over for logins and CAPTCHA challenges.
· Anthropic's Claude-driven agent may simply refuse to log in, citing security.
The result is frequent pauses and handoffs between AI and humans, weakening the seamless automation experience.
Despite these obstacles, progress is rapid. OpenAI, Google, Anthropic, and others are learning from each iteration. As demand grows, a kind of co-evolution is likely: websites will become more agent-friendly where it benefits them, while agents keep getting better at mimicking human behavior to clear the remaining barriers.
Today's browser agents face two drastically different realities: on one side, the hostile Web2 environment, with its anti-scraping measures and pervasive security defenses; on the other, the open environment of Web3, where automation is often encouraged. This difference shapes the direction of the various solutions.
The solutions below fall broadly into two categories: those that help agents cope with the hostile Web2 environment, and those that are native to Web3.
Although the challenges facing browser agents remain significant, new projects keep emerging to address them directly. The cryptocurrency and decentralized finance (DeFi) ecosystem is becoming a natural testing ground because it is open, programmable, and far less hostile to automation. Open APIs, smart contracts, and on-chain transparency remove many of the friction points common in the Web2 world.
Below are four types of solutions, each addressing one or more of the core limitations described above:
# Agent-Oriented Browsers
These browsers are designed from the ground up for agent-driven automation and integrate deeply with blockchain protocols. Unlike a traditional browser such as Chrome, which needs extra tooling (Selenium, Playwright, wallet extensions) for on-chain automation, agent-oriented browsers expose APIs and a trusted execution path directly to the agent.
In decentralized finance, a transaction's validity depends on its cryptographic signature, not on whether the user looks "human." In an on-chain environment, agents can therefore sidestep the CAPTCHA challenges, fraud-detection scores, and device-fingerprint checks common on Web2. If these browsers target Web2 sites such as Amazon, however, they gain no such advantage and will still trigger the standard anti-bot defenses.
The value of agent-oriented browsers does not lie in magically accessing all websites, but in:
· Native blockchain integration: Built-in wallet and signing support, eliminating MetaMask pop-ups and the need to parse dApp front-end DOMs.
· Automation-first design: Stable, high-level instructions that map directly to protocol operations (see the sketch after this list).
· Security model: Fine-grained permission control and sandboxing to ensure key safety during automation.
· Performance optimization: Capable of executing multiple on-chain calls in parallel without browser rendering or UI delays.
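As a rough sketch of what such an automation-first interface could look like, the code below models a hypothetical agent-oriented browser: the AgentBrowser class, its permission policy, and the protocol calls are all invented for illustration and do not describe any particular product.

```python
# Hypothetical interface of an agent-oriented browser: built-in wallet, scoped
# permissions, high-level instructions mapped to protocol operations, and
# parallel on-chain reads with no UI rendering. Not the API of any real product.
import asyncio

class AgentBrowser:                      # hypothetical
    def __init__(self, wallet_keystore: str, permissions: dict):
        self.keystore = wallet_keystore  # keys stay inside the browser's sandbox
        self.permissions = permissions   # e.g. spend caps, allowed protocols

    async def call(self, protocol: str, method: str, **params):
        # Enforce the permission policy before building, signing, and submitting
        # anything (a real implementation would do far more than these checks).
        if protocol not in self.permissions["protocols"]:
            raise PermissionError(f"{protocol} not allowed by policy")
        if params.get("amount", 0) > self.permissions["max_spend_usd"]:
            raise PermissionError("amount exceeds spend cap")
        print(f"[{protocol}] {method}({params})")
        return {"status": "simulated"}

async def main():
    browser = AgentBrowser(
        wallet_keystore="~/.agent/keystore",
        permissions={"max_spend_usd": 500, "protocols": ["uniswap", "aave"]},
    )
    # Parallel read-only calls: no page rendering or UI delay in the way.
    await asyncio.gather(
        browser.call("uniswap", "quote", token_in="USDC", token_out="ETH"),
        browser.call("aave", "get_rates", asset="USDC"),
    )
    # A high-level instruction mapped directly to a protocol operation: no DOM
    # parsing, no wallet pop-up, no CAPTCHA in the loop.
    await browser.call("uniswap", "swap", amount=100, token_in="USDC", token_out="ETH")

asyncio.run(main())
```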
# Case Study: Donut
Donut treats blockchain data and operations as first-class citizens. Users (or their agents) can hover to view real-time risk indicators for a token, or type natural-language commands such as "/swap 100 USDC to SOL" directly. By sidestepping Web2's adversarial friction, Donut lets agents operate at full speed in DeFi, improving liquidity, arbitrage, and market efficiency.
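To give a sense of what sits behind a command like "/swap 100 USDC to SOL", the snippet below parses it into a structured intent that an execution layer could act on. The command grammar and intent fields are invented for illustration and are not Donut's actual implementation.

```python
# Illustrative parser turning a "/swap" command into a structured intent.
# The grammar and intent fields are invented; they do not describe Donut's code.
import re

SWAP_PATTERN = re.compile(
    r"^/swap\s+(?P<amount>\d+(?:\.\d+)?)\s+(?P<token_in>[A-Za-z]+)\s+to\s+(?P<token_out>[A-Za-z]+)$"
)

def parse_command(command: str) -> dict:
    match = SWAP_PATTERN.match(command.strip())
    if not match:
        raise ValueError(f"Unrecognized command: {command!r}")
    return {
        "action": "swap",
        "amount": float(match.group("amount")),
        "token_in": match.group("token_in").upper(),
        "token_out": match.group("token_out").upper(),
    }

print(parse_command("/swap 100 USDC to SOL"))
# {'action': 'swap', 'amount': 100.0, 'token_in': 'USDC', 'token_out': 'SOL'}
```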
# Verifiable Agent Execution (TEEs and ZKPs)
Granting agents sensitive permissions carries real risk. Solutions in this category use Trusted Execution Environments (TEEs) or Zero-Knowledge Proofs (ZKPs) to cryptographically verify an agent's intended behavior before execution, so users and counterparties can validate its actions without exposing private keys or credentials.
# Case Study: Phala Network
Phala uses TEEs (such as Intel SGX) to isolate and protect the execution environment, preventing the Phala operator or attackers from eavesdropping on or tampering with agent logic and data. A TEE acts like a hardware-backed "secure enclave," ensuring confidentiality (externally invisible) and integrity (externally unmodifiable).
For a browser agent, this means it can log in, hold session tokens, or process payment information without the sensitive data ever leaving the secure enclave. Even if the user's machine, operating system, or network is compromised, that data is not exposed. This directly addresses one of the key barriers to agent adoption: how to entrust agents with sensitive credentials and operations.
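The general pattern can be sketched as follows. The Enclave class is a hypothetical stand-in for a TEE runtime such as the one Phala operates; the point is that the host only ever handles sealed ciphertext and attested results, never the raw credential.

```python
# Hypothetical sketch of the TEE pattern: credentials are unsealed and used only
# inside the enclave; the host sees ciphertext going in and an attested result
# coming out. The Enclave class stands in for a real TEE runtime (e.g., SGX-based).
from dataclasses import dataclass

@dataclass
class AttestedResult:
    output: bytes           # result of the action (e.g., a booking confirmation)
    quote: bytes            # remote-attestation quote proving which code ran

class Enclave:              # hypothetical TEE runtime interface
    def __init__(self, sealed_credentials: bytes):
        # The sealed blob can only be decrypted with the enclave's hardware-bound key.
        self._sealed = sealed_credentials

    def run_agent_task(self, task: str) -> AttestedResult:
        # Inside the enclave: unseal credentials, perform the login / payment /
        # session handling, and return only the outcome plus an attestation quote.
        output = f"completed: {task}".encode()
        return AttestedResult(output=output, quote=b"<attestation-quote>")

# Host side: never sees the plaintext credential.
enclave = Enclave(sealed_credentials=b"<ciphertext from provisioning step>")
result = enclave.run_agent_task("log in and fetch this week's invoices")

# The user (or counterparty) verifies the quote against the expected code
# measurement before trusting the result.
assert result.quote, "missing attestation"
print(result.output.decode())
```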
# Distributed Browser and Proxy Networks
Modern anti-bot systems do not just look at raw request speed or obvious automation; they combine IP reputation, browser fingerprinting, JavaScript challenge-response, and behavioral analysis (cursor movement, typing rhythm, session history). Agents coming from datacenter IPs, or from perfectly reproducible browsing environments, are easy to spot.
To tackle this, these networks either collect and serve machine-readable data directly, rather than scraping pages optimized for humans, or route traffic through real human browsing environments. This sidesteps the weak points of traditional crawlers - parsing and anti-scraping - and gives agents cleaner, more reliable inputs.
By proxying agent traffic through these real-world sessions, distributed networks enable AI agents to access web content like humans without immediately triggering blocks.
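Operationally, plugging an agent into such a network often looks like ordinary proxy configuration. The sketch below routes requests through a placeholder residential-proxy endpoint using Python's requests library; the gateway URL, credentials, and headers are assumptions, not a real provider's configuration.

```python
# Routing an agent's HTTP traffic through a residential proxy endpoint.
# The proxy URL and credentials below are placeholders, not a real provider config.
import requests

PROXY_URL = "http://username:password@residential-gateway.example.net:8000"

session = requests.Session()
session.proxies = {"http": PROXY_URL, "https": PROXY_URL}
# A plausible desktop user agent; consistency with the rest of the fingerprint
# (TLS, headers, behavior) matters more than any single header.
session.headers["User-Agent"] = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/126.0 Safari/537.36"
)

response = session.get("https://example.com/products", timeout=30)
print(response.status_code, len(response.text))
```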
# Case Studies
· Grass: Decentralized Data/DePIN Network, where users share idle residential broadband to provide a proxy-friendly, geographically diverse access channel for public web data collection and model training.
· WootzApp: An open-source mobile browser that supports cryptocurrency payments, with backend proxies and zero-knowledge identity; it gamifies AI/data tasks for consumers.
· Sixpence: A distributed browser network that routes traffic for AI agents through global contributors' browsing.
This is not a complete solution, however. Behavioral detection (mouse and scroll trajectories), account-level restrictions (KYC, account age), and fingerprint-consistency checks can still trigger blocks. Distributed networks are therefore best viewed as a foundational obfuscation layer that must be paired with human-mimicking execution strategies to be fully effective.
# Agent-Friendly Web Standards
A growing number of technology communities and organizations are now asking: what if future web users are not only humans but also automated agents, and how should websites interact with them securely and compliantly?
This has spurred discussion of emerging standards and mechanisms that would let websites explicitly declare "I allow trusted agents" and offer them a secure channel for interaction, instead of blocking every agent as a "bot attack" by default, as most do today.
· "Agent Allowed" Tag: Similar to the robots.txt followed by search engines, future web pages may include a tag in the code to tell the browser agent "this is a safe space to access." For example, if you use an agent to book a flight, the website would not present a bunch of CAPTCHA challenges but instead offer an authenticated interface directly.
· Authenticated agent API gateways: Websites could open dedicated entry points for verified agents - a "fast lane." Instead of simulating human clicks and keystrokes, the agent follows a more stable API path to complete orders, payments, or data queries.
· W3C Discussions: The World Wide Web Consortium (W3C) is already researching how to establish standardized channels for "managed automation." This suggests that in the future, we might have a globally accepted set of rules that allow trustworthy agents to be recognized and accepted by websites while maintaining security and accountability.
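To illustrate the idea - and to be clear, none of this is a ratified standard - a site might publish a robots.txt-style policy file and accept signed requests from verified agents on a dedicated endpoint. The "agents.txt" name, its directives, and the header names below are invented for this sketch.

```python
# Hypothetical illustration only: neither the "agents.txt" policy format nor the
# header names below are part of any ratified standard.
import requests

# A site might publish a robots.txt-style policy declaring what trusted agents may do:
AGENTS_TXT = """\
# https://airline.example.com/agents.txt  (hypothetical)
Agent-Allowed: yes
Scopes: search-flights, book-flights
Auth: https://airline.example.com/agent-auth
"""

# A verified agent then calls a dedicated endpoint with a signed identity token
# instead of simulating clicks on the human UI.
response = requests.post(
    "https://airline.example.com/agent-api/bookings",      # hypothetical endpoint
    headers={
        "Agent-Identity": "did:example:agent-1234",         # hypothetical header
        "Agent-Signature": "<signature over the request>",  # hypothetical header
    },
    json={"flight": "NRT-123", "passenger": "Jane Doe", "budget_usd": 800},
    timeout=30,
)
print(response.status_code)
```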
These explorations are still early, but once implemented they could significantly improve the relationship between humans, agents, and websites. Imagine agents that no longer have to painstakingly mimic human mouse movements to slip past risk controls, but instead complete tasks openly through an officially sanctioned channel.
Crypto-native infrastructure may take the lead on this trajectory. Because blockchain-based applications rely on open APIs and smart contracts, they are inherently automation-friendly. Traditional Web2 platforms, by contrast, may remain cautious, especially businesses built on advertising or anti-fraud systems. But as users and businesses come to value the efficiency gains of automation, these standardization efforts are likely to become a key catalyst pushing the entire Internet toward an "agent-first architecture."
Browser agents are evolving from initial simple conversational tools to autonomous systems capable of completing complex online workflows. This transformation reflects a broader trend: embedding automation directly into the core interface of user interaction with the Internet. While the potential for productivity enhancement is significant, the challenges are equally daunting, including how to overcome deep-rooted anti-bot mechanisms and ensure security, trust, and responsible use.
In the short term, better agent reasoning, faster execution, tighter integration with existing services, and advances in distributed networks should gradually improve reliability. In the long run, we may see "agent-friendly" standards adopted wherever automation benefits both service providers and users. The transition will not be uniform, though: adoption will be faster in automation-friendly environments such as DeFi, and slower on Web2 platforms that depend on tightly controlling user interaction.
In the future, competition among tech companies will increasingly focus on: how well their agents can navigate real-world constraints, whether they can be securely integrated into critical workflows, and their ability to consistently deliver results in diverse online environments. Whether all this will ultimately reshape the "browser wars" depends not solely on technical prowess but on establishing trust, aligning incentives, and demonstrating tangible value in everyday use.