
Coinbase has cut its AI spending nearly in half and is testing open-weight models such as GLM 5.2 and Kimi 2.7 as the default option.

BlockBeats News, June 27 - Coinbase CEO Brian Armstrong said in a post that the key to keeping AI spending stable while token usage grows exponentially is not to add usage friction or spending reminders. Instead, the company focuses on improving default models, routing, and caching. Coinbase is currently experimenting with making open-weight models such as GLM 5.2 and Kimi 2.7 the default through its LLM gateway, while still encouraging engineers to pick the model that suits each task. He noted that 91% of employees have never reached their usage limit. For that reason, rather than lowering limits and adding reminders, the company has shifted to lower-cost default models.

On model routing, Coinbase preprocesses prompts in a custom pipeline and routes each task to the most suitable model based on cache hit rate and model pricing. For example, a frontier model may be needed in the planning phase, but using one in the execution phase may be overkill. Armstrong believes that in the future humans should not have to choose models, because AI will be able to handle this on its own.
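
A minimal Python sketch of routing by cache hit rate and price is shown below. It is not Coinbase's pipeline, which the post does not describe. The `ModelOption` record, the per-phase minimum capability tiers, the model names, and all prices and hit rates are hypothetical. Among the models capable enough for a phase, the router picks the one with the lowest expected input cost, pricing cached tokens at their discounted rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                  # hypothetical model name
    tier: int                  # hypothetical capability tier: higher = more capable
    input_price: float         # hypothetical USD per million uncached input tokens
    cached_input_price: float  # hypothetical USD per million cached input tokens
    cache_hit_rate: float      # assumed fraction (0..1) of input tokens served from cache

    def expected_input_cost(self, tokens: int) -> float:
        """Expected input cost in USD, blending cached and uncached token prices."""
        blended = ((1 - self.cache_hit_rate) * self.input_price
                   + self.cache_hit_rate * self.cached_input_price)
        return blended * tokens / 1_000_000


# Hypothetical minimum tier per task phase: planning needs a frontier model,
# execution can use a cheaper one.
PHASE_MIN_TIER = {"planning": 3, "execution": 2}


def route(phase: str, tokens: int, models: list[ModelOption]) -> ModelOption:
    """Return the cheapest model (by expected input cost) that meets the phase's tier."""
    min_tier = PHASE_MIN_TIER[phase]
    eligible = [m for m in models if m.tier >= min_tier]
    if not eligible:
        raise ValueError(f"no model meets tier {min_tier} for phase {phase!r}")
    return min(eligible, key=lambda m: m.expected_input_cost(tokens))


# Illustrative catalogue; every number here is made up.
MODELS = [
    ModelOption("frontier-model", 3, 3.00, 0.30, 0.40),
    ModelOption("warm-mid-model", 2, 1.00, 0.10, 0.90),   # hot cache
    ModelOption("cold-cheap-model", 2, 0.50, 0.05, 0.00),  # cold cache
]

for phase in ("planning", "execution"):
    print(phase, "->", route(phase, 50_000, MODELS).name)
```

In this toy setup the warm mid-tier model wins the execution phase, even though its list price is twice that of the cold model. The reason is the assumption that 90% of its input tokens hit the cache.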

Armstrong also said that cache misses are the easiest way to drive up costs. All of Coinbase's requests are cache-aware, so hot caches are reused as much as possible. For example, after caching was properly implemented, LibreChat's cache hit rate rose from 5% to 60%. Coinbase also asks engineers to keep context lean by starting a new session when switching tasks, narrowing the scope of file context, and disconnecting unused tools. The goal is not to suppress AI usage but to build infrastructure that can support exponential growth. Through these practices, Coinbase has cut AI spending nearly in half while token usage continues to grow.
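
To make the 5% to 60% hit-rate change more concrete, the short sketch below computes a blended input price per million tokens at both rates. The prices ($3.00 per million uncached input tokens, $0.30 per million cached tokens) are hypothetical assumptions for illustration. They are not Coinbase's or any specific provider's rates.

```python
# Hypothetical prices in USD per million input tokens (illustrative only).
UNCACHED_PRICE = 3.00
CACHED_PRICE = 0.30


def blended_input_price(hit_rate: float,
                        uncached: float = UNCACHED_PRICE,
                        cached: float = CACHED_PRICE) -> float:
    """Average price per million input tokens when `hit_rate` of them hit the cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1 - hit_rate) * uncached + hit_rate * cached


before = blended_input_price(0.05)  # hit rate before the caching fix
after = blended_input_price(0.60)   # hit rate after the caching fix
print(f"before: ${before:.3f}/M  after: ${after:.3f}/M  ratio: {after / before:.2f}")
```

Under these assumed prices, the blended input price falls from about $2.87 to $1.38 per million tokens. Actual savings depend on each provider's cache pricing and on how much of every prompt is a reusable prefix.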
