
Tokenomics War: Corporate AI Enters the "Accounting Era"

AI Cost, ROI, and Internal Resource Allocation
Original Title: Token Budget Wars
Original Author: Jaya Gupta
Translation: Peggy


Editor's Note: Enterprise AI is transitioning from the question of "whether to adopt" to the question of "how to account for it."


Over the past two years, many companies have been driving employee adoption of AI, mostly to keep up with technological trends and competitive pressures. However, as AI inference costs transition from an experimental budget to ongoing operational expenses, CEOs and CFOs are starting to ask a more pragmatic question: How much value is AI really creating? What tangible results are obtained for each dollar of token cost?


This is the core of the "Token Budget Wars." The token budget wars are not just about enterprises trying to cut their AI bills; they are about reassessing which business areas deserve more compute, which tasks should move to cheaper models, which processes can be automated or outsourced, and which spending is simply waste.


One key point: the volume of AI usage does not equal value. In the SaaS era, usage often signaled software adoption, but in the AI era, token consumption only shows that the "meter is running." The same workflow can incur costs that differ by orders of magnitude depending on prompts, context, model choice, and retries. A high bill can mean either that AI is genuinely working or that the system is spinning its wheels.


Therefore, the next stage for enterprise AI is not just about model capabilities but about aligning token costs with business outcomes. The first stage proved that AI can get the job done; the second stage seeks to answer: are these tasks truly worth the cost?


The following is the original text:


Enterprise AI has shifted from "whether to adopt" to "how to allocate."


At the corporate level, the new "currency" is your ability to quantify the return on AI investment. Every function is being asked the same questions: What have you delivered? At what cost? Over the past two years, CEOs woke up to Jim Cramer on CNBC (#bearish) and to competitors announcing productivity gains, then demanded AI everywhere in the company. Now the real pressure comes with the follow-up: Show me the value.


Claude was launched in November 2025, by which time most enterprises' 2026 annual budgets had already been locked in. By the first quarter, actual usage well exceeded the initial plans. Inference costs are no longer just a budget line item for experiments but have become ongoing operational costs. Along with this shift came a new question: Where is AI truly creating value?


This question is hard to answer because the utility of a token has never been quantified. A bill cannot tell you whether the spend replaced manual work, generated revenue, reduced risk, sped up a process, or was just a group of engineers farming tokens to climb a leaderboard (#metamates). When spend is in the hundreds of thousands of dollars, it still looks like an experiment. Once it crosses a threshold, say seven figures, it becomes infrastructure, and technical differences start to hit the P&L: the token cost of the same workflow on the same inputs can vary 5 to 10x between two runs with nothing visibly wrong. At experimental scale that variance is already costly; at infrastructure scale it becomes a number the CFO has to explain to the CEO.


You can call this "marginal token utility": the business value created by spending one more dollar on inference. It is the number that really matters in the scaling phase, and most companies currently cannot see it.
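A minimal sketch of the idea, assuming you can attribute a dollar value to AI-assisted work at two spend levels. The helper name `marginal_token_utility` and all figures are hypothetical. It contrasts the marginal number with the average return, which is roughly what a single bill invites you to compute.

```python
def marginal_token_utility(value_before: float, value_after: float,
                           spend_before: float, spend_after: float) -> float:
    """Incremental business value per extra dollar of inference spend (hypothetical helper)."""
    extra_spend = spend_after - spend_before
    if extra_spend <= 0:
        raise ValueError("spend_after must be greater than spend_before")
    return (value_after - value_before) / extra_spend


# Hypothetical monthly figures: inference spend rises from $100k to $150k,
# and attributed business value rises from $400k to $460k.
marginal = marginal_token_utility(400_000, 460_000, 100_000, 150_000)
average = 460_000 / 150_000  # average value per dollar across all spend

print(f"marginal: {marginal:.2f}, average: {average:.2f}")  # marginal: 1.20, average: 3.07
```

The average return still looks healthy, but in this made-up case the last $50k bought only $1.20 of value per dollar. That gap is what the marginal figure exposes.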


The boardroom question is shifting from "Is AI useful?" to "Where does AI really create leverage?" The token budget war is, at bottom, a fight over how tokens are allocated.


The fight over who owns the token budget is escalating quickly because it collides with a thirty-year-old executive instinct: a large team means a big title, broad scope, and more power. In the past, the visible sign of a senior manager's success was the size of the team they ran: direct reports, skip-level reports, and headcount on the org chart.


But when intelligence becomes a scarce resource, the new sign is how much intelligence you can command.


AI spend is essentially competing with labor cost.


Most AI budget requests essentially fall into one of three claims: replacing outsourced labor, replacing internal labor, or creating new revenue.


An employee has a salary. A BPO contract has a price per ticket, claim, invoice, or audit. Humans understand these units. Inference cost is harder to pin down because the final cost of completing a task depends on how the system behaves during execution. A claim that requires three retries, manual corrections, and calls to frontier models may end up costing more than the outsourced labor it was meant to replace. So the discussion is shifting to the cost of an outcome: the cost per resolved ticket, per processed claim, per audited contract, per completed invoice, per avoided hire, per retained customer, or per dollar of revenue converted.
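The claim example can be made concrete with a small cost-per-outcome calculation. This is a sketch under stated assumptions: the `ClaimAttempt` structure, the $1.50 per-attempt frontier-model cost, the $4.00 manual correction, and the $8.00 BPO price are all hypothetical numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClaimAttempt:
    """One attempt at processing a claim (hypothetical structure)."""
    token_cost: float             # inference spend for this attempt, in dollars
    succeeded: bool               # whether this attempt produced an accepted result
    manual_fix_cost: float = 0.0  # cost of any human correction tied to this attempt


def cost_per_resolved_claim(attempts: list[ClaimAttempt]) -> float:
    """Total spend (tokens plus manual fixes) divided by the number of resolved claims."""
    total = sum(a.token_cost + a.manual_fix_cost for a in attempts)
    resolved = sum(1 for a in attempts if a.succeeded)
    if resolved == 0:
        raise ValueError("no resolved claims in this batch")
    return total / resolved


# One claim: a first try plus three retries on a frontier model,
# with the final attempt succeeding only after a manual correction.
attempts = [
    ClaimAttempt(1.50, succeeded=False),
    ClaimAttempt(1.50, succeeded=False),
    ClaimAttempt(1.50, succeeded=False),
    ClaimAttempt(1.50, succeeded=True, manual_fix_cost=4.00),
]
BPO_PRICE_PER_CLAIM = 8.00  # hypothetical outsourced price per claim

agent_cost = cost_per_resolved_claim(attempts)
print(f"agent ${agent_cost:.2f} vs BPO ${BPO_PRICE_PER_CLAIM:.2f}")  # agent $10.00 vs BPO $8.00
```

Every attempt is counted, not just the successful one, because the denominator is outcomes rather than model calls.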


Executives have recognized that BPO is the easiest place to establish benchmarks, as these jobs are already priced on a "per completion" basis. In contrast, comparing internal employees to AI is much more difficult, as employees do many things throughout the day, including browsing TikTok during lunch breaks; productivity gains often manifest as avoiding hiring or unlocking capacity in a dispersed manner; and managers may resist reducing team size based solely on partial automation. BPO provides a quantifiable baseline for business teams.


This is different from the logic of SaaS. SaaS once trained enterprises to view usage as a proxy metric for value.


But AI breaks this. How much inference the same workflow consumes can vary greatly depending on prompts, retrieved context, the selected model, tool calls, retries, and whether the agent gets stuck. The unit on the bill, the token, is stable, but the work it represents is not.


To be more precise: signal and noise use the same unit of measurement. A rise in token billing may mean real work is being done; but it could also mean computational power is being wasted on poor prompts, irrelevant context, unnecessary tool calls, redundant reasoning, and overkill models. Two companies may have the exact same token bill, but the underlying operations are vastly different: one is transforming reasoning into results, while the other is footing the bill for futile activity, yet these scenarios look identical on the invoice.


SaaS usage tells you: the software has been adopted. AI usage can only tell you: the meter is running. It cannot tell you whether the company is actually up and running.


Why Is Marginal Token Utility Hard to See?


There are three main reasons.


The first is the long tail of retries. If the probability that an agent completes a workflow correctly on the first try is p, and each attempt costs roughly T tokens, then the expected token consumption per resolved workflow is roughly T/p. If the completion rate drops from 90% to 70%, the effective cost per resolved problem rises by about 28.6% (0.9/0.7 ≈ 1.286), not the 20 points the drop might suggest, because cost scales with 1/p and failures compound. In enterprise workflows, inputs are often messy and edge cases are crucial, so failures do not just reduce accuracy; they change the economics.
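The T/p relationship follows from treating each attempt as an independent trial that succeeds with probability p, so the number of attempts until success is geometric with mean 1/p. That independence is a simplifying assumption (real retries are often correlated). The sketch below checks the 90%-to-70% arithmetic and, as a sanity check, estimates the same quantity by simulation; the function names are illustrative.

```python
import random


def expected_cost_per_resolution(base_cost: float, p: float) -> float:
    """Expected spend per resolved workflow: base_cost / p under independent attempts."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return base_cost / p


def simulated_cost_per_resolution(base_cost: float, p: float,
                                  trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate: retry each workflow until it succeeds, paying base_cost per attempt."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        attempts = 1
        while rng.random() >= p:  # this attempt failed (probability 1 - p)
            attempts += 1
        total += attempts * base_cost
    return total / trials


T = 1.0  # base cost of a single attempt, in arbitrary units
increase = expected_cost_per_resolution(T, 0.7) / expected_cost_per_resolution(T, 0.9) - 1
print(f"{increase:.1%}")  # 28.6%
```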


The second is context explosion. For attention-heavy operations, inference compute grows roughly as O(n²) in context length n, so doubling the context roughly quadruples that cost. Everyone wants the model to have enough information, so systems tend to over-supply it: five documents would suffice, but retrieval pulls in fifty; connectors pour in entire email threads; agents drag along stale conversation histories.
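A back-of-the-envelope sketch of the quadratic claim, assuming attention compute dominates and scales as n² in context length n. It ignores linear per-token terms and any provider pricing tiers, and the 2,000-tokens-per-document figure is hypothetical.

```python
def relative_attention_cost(context_tokens: int, baseline_tokens: int) -> float:
    """Attention compute relative to a baseline, assuming cost grows as n**2."""
    return (context_tokens / baseline_tokens) ** 2


TOKENS_PER_DOC = 2_000  # hypothetical average size of one retrieved document
five_docs = 5 * TOKENS_PER_DOC
fifty_docs = 50 * TOKENS_PER_DOC

print(relative_attention_cost(2 * five_docs, five_docs))  # 4.0   (context doubled)
print(relative_attention_cost(fifty_docs, five_docs))     # 100.0 (5 docs -> 50 docs)
```

Under this model, over-retrieving from five documents to fifty multiplies the attention term by a hundred, not ten.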


Third is routing. When a team doesn’t know which model is “good enough,” the default is the most powerful one. A basic classification task can end up running on the same model meant for complex reasoning. At millions of calls, the choice between sending simple tasks to a small model and sending everything to a frontier model is often the difference between a manageable bill and a board-level issue.
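One common mitigation is a router that sends each task to the cheapest model expected to be good enough. In this sketch the two tiers, their per-call prices, and their accuracy estimates are hypothetical placeholders, not real vendor figures; in practice the accuracy numbers would come from your own evaluations.

```python
from dataclasses import dataclass


@dataclass
class ModelTier:
    name: str
    cost_per_call: float        # dollars per call (hypothetical)
    accuracy: dict[str, float]  # estimated accuracy per task type (hypothetical)


TIERS = [
    ModelTier("small", 0.001, {"classification": 0.97, "complex_reasoning": 0.60}),
    ModelTier("frontier", 0.03, {"classification": 0.98, "complex_reasoning": 0.92}),
]


def route(task_type: str, min_accuracy: float) -> ModelTier:
    """Return the cheapest tier whose estimated accuracy on task_type meets min_accuracy."""
    eligible = [t for t in TIERS if t.accuracy.get(task_type, 0.0) >= min_accuracy]
    if not eligible:
        raise ValueError(f"no tier reaches {min_accuracy:.0%} on {task_type!r}")
    return min(eligible, key=lambda t: t.cost_per_call)


CALLS = 1_000_000  # hypothetical monthly volume of a simple classification task
chosen = route("classification", min_accuracy=0.95)
# Default behavior: treat the priciest tier as the strongest and always use it.
default_bill = CALLS * max(t.cost_per_call for t in TIERS)
routed_bill = CALLS * chosen.cost_per_call

print(chosen.name, f"${default_bill:,.0f}", f"${routed_bill:,.0f}")  # small $30,000 $1,000
```

The design choice is to make "good enough" an explicit per-task accuracy bar. Without that bar the router has nothing to trade cost against, which is exactly what produces the default-to-the-strongest-model behavior.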


Non-software industries feel this as a transformation problem.


Software companies were the first to see the issue because the work being optimized is already heavily instrumented. Engineering teams have metrics such as PRs, commits, deployments, incidents, cycle time, and mean time to repair, all tied to the product. The measurement is imperfect, but this kind of work is easier to measure.


Non-software enterprises feel the problem more acutely because their work is operational: claims processing, underwriting, customer service tickets, compliance reviews, supply chain exceptions, payment disputes. Companies with real-world assets face the same issue. These workflows were traditionally measured by human touches, cycle time, SLA attainment, and error rates, and the stakes are often higher: results must stand up to an audit, not just be right on average. The units of work and the units of cost don’t speak the same language and don’t live in the same part of the organization. The tech team sees token consumption, the business team sees workflow changes, and bridging the two requires multiple teams to agree on “what exactly is being measured.”


I believe software companies experience the token budget war as a productivity-measurement problem, echoing many of the earlier “AI layoffs,” while non-software enterprises experience it as a transformation problem.


The missing layer is attribution from token to outcome. Enterprises need a translation layer that connects inference spend to completed work and the resulting business outcomes. This layer must answer three questions: What is the true cost of this workflow, including retries and fixes? Which parts of the agent’s execution path actually matter, and which are wasted motion? Has this work changed the operating model, for example fewer tickets per customer service agent, shorter claims cycles, smaller BPO budgets, or deferred hiring? The next layer is outcome attribution in business terms. It is not enough to say “this workflow cost $2.13”; the useful statement is: this type of claim is cheaper for an agent to handle than for a BPO, but when the policy requires additional exception documents, the long tail of retries breaks the economics.
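The business-terms comparison in the claims example can be sketched as a per-segment cost-per-outcome check against the BPO price. Everything here is hypothetical: the segment names, the spend and volume figures, and the $6.00 BPO rate. It only illustrates the shape of the comparison, not a real attribution system.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    """Aggregated results for one claim segment over a period (hypothetical figures)."""
    token_spend: float  # inference spend, including every retry
    fix_spend: float    # cost of manual corrections and exception handling
    resolved: int       # claims that reached an accepted outcome


def cost_per_outcome(stats: SegmentStats) -> float:
    """All-in spend divided by resolved claims for the segment."""
    if stats.resolved == 0:
        raise ValueError("segment has no resolved claims")
    return (stats.token_spend + stats.fix_spend) / stats.resolved


BPO_PRICE = 6.00  # hypothetical outsourced price per claim

segments = {
    "standard": SegmentStats(token_spend=1_200.0, fix_spend=0.0, resolved=1_000),
    "exception_heavy": SegmentStats(token_spend=2_400.0, fix_spend=1_800.0, resolved=500),
}

verdicts = {}
for name, stats in segments.items():
    cost = cost_per_outcome(stats)
    verdicts[name] = "agent" if cost < BPO_PRICE else "bpo"
    print(f"{name}: ${cost:.2f} per resolved claim vs BPO ${BPO_PRICE:.2f} -> {verdicts[name]}")
```

With these invented numbers the standard segment comes out at $1.20 per claim and the exception-heavy segment at $8.40, so the same agent is a win on one slice of the work and a loss on another.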


Measurement becomes memory. To connect a token to an outcome, enterprises must capture everything that happened in between: what the agent saw, what it fetched, which tools it called, what it ignored, where retries happened, when manual overrides occurred, which exception rules applied, which precedents held, and why one path succeeded while another failed. The measurement layer has to capture decision paths, something enterprises have almost never truly had. Systems of record capture what happened, but rarely why. A CRM, for example, can tell you a deal slipped, but not the unrecorded judgment calls behind the sales forecast.


The reasoning behind a decision is one of a company's most fragile and perishable assets. It lives in Slack threads, email chains, escalation meetings, and people's heads, and people leave and processes change.


AI changes this because agents leave a trail. Every retrieval, tool call, retry, escalation, manual correction, and final decision becomes part of a path from context to action to outcome. At first, companies capture these trails to justify the spend. But once captured, the trails become more valuable than the cost report itself, because they form a persistent record of how the organization actually makes decisions. (Ahem, context graph, although I've grown really tired of that term lately.)
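A minimal sketch of such a trail, assuming an append-only list of events per workflow. The event kinds, field names, the `DecisionTrail` class, and the example claim with its costs are all hypothetical. The point is that each entry records what happened and, where known, why, so one log can answer both cost questions and decision questions.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TraceEvent:
    workflow_id: str
    kind: str          # e.g. "retrieval", "tool_call", "retry", "escalation", "manual_override", "decision"
    detail: str        # what was fetched, called, or decided
    reason: str = ""   # why, when known (the part systems of record rarely keep)
    cost: float = 0.0  # dollars attributed to this step


class DecisionTrail:
    """Append-only log of agent steps, from context to action to outcome."""

    def __init__(self) -> None:
        self._events: list[TraceEvent] = []

    def record(self, event: TraceEvent) -> None:
        self._events.append(event)

    def path(self, workflow_id: str) -> list[TraceEvent]:
        return [e for e in self._events if e.workflow_id == workflow_id]

    def cost(self, workflow_id: str) -> float:
        return sum(e.cost for e in self.path(workflow_id))

    def to_jsonl(self) -> str:
        """Serialize as JSON Lines so the trail can outlive the session that produced it."""
        return "\n".join(json.dumps(asdict(e)) for e in self._events)


trail = DecisionTrail()
trail.record(TraceEvent("claim-001", "retrieval", "policy document", cost=0.02))
trail.record(TraceEvent("claim-001", "retry", "extraction failed schema check", cost=0.40))
trail.record(TraceEvent("claim-001", "manual_override", "adjuster approved exception",
                        reason="precedent from a similar prior claim", cost=3.00))
trail.record(TraceEvent("claim-001", "decision", "claim approved", cost=0.40))

print(len(trail.path("claim-001")), round(trail.cost("claim-001"), 2))  # 4 3.82
```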


The allocation layer is the real prize. If inference becomes a metered, pay-as-you-go input to how the business operates, every dollar must prove its worth. Which vendors can explain when a token turned into an outcome, when it did not, and why?


Enterprises will not figure this out entirely on their own; they will buy it as a transformation. Fortune 500 companies have run this playbook before: buckle up, hire McKinsey, bring in every ex-Palantir employee on the market, and have the CEO drive change from the top down. Token-to-outcome attribution will arrive the way ERP, BI, and digital transformation did: as an executive-sponsored "initiative," backed by a layer of infrastructure, that eventually becomes a new source of truth. Founders who can pull this off will look different from the traditional startup archetype and will assemble a different kind of founding team.


Whoever masters the attribution of a token to an outcome will make allocation decisions: which workflows deserve more compute, which should be capped, which should switch to a cheaper model, which should remain human-performed, and which can be outsourced to BPOs. And once you can make these decisions, you control the flow of internal AI spend within the enterprise and gain the trust needed to allocate this resource.


The first phase of enterprise AI has proved: models can get the job done. The next phase will determine: how much those jobs are really worth paying for. As Charlie Munger said: Show me the incentive, and I'll show you the outcome.


[Original Article Link]


