According to monitoring by Watchful AI, AI researcher Aran Komatsuzaki translated Rich Sutton's well-known essay "The Bitter Lesson" into 9 languages and ran the texts through the tokenizers of the OpenAI, Gemini, Qwen, DeepSeek, Kimi, and Claude models. Taking the token count of the English original under the OpenAI tokenizer as a baseline of 1x, the study measured how many multiples of that baseline each language consumed on each model. The results showed that the same content in Chinese cost 1.65 times the baseline on Claude's tokenizer, versus only 1.15 times on OpenAI's. Hindi was even more dramatic on Claude, at more than 3 times the baseline. Among the 6 models evaluated, Anthropic's tokenizer was the least efficient.
Translation can change text length, so the multipliers relative to English are not perfectly precise. A more convincing comparison is the same Chinese text across different tokenizers (still against the same baseline): Kimi needed only 0.81 times the tokens (fewer than the English original), Qwen needed 0.85 times, but on Claude the figure jumped to 1.65 times. With identical text, the difference comes down purely to tokenizer efficiency. Chinese-developed models tokenize Chinese even more compactly than the English baseline, which indicates the problem is not the Chinese language itself but whether the tokenizer has been optimized for it.
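The comparison is straightforward to reproduce in rough form. The sketch below is an illustration under assumptions, not the study's actual code: it uses tiktoken for the OpenAI baseline and publicly released Hugging Face tokenizers for the open-weight models, and the repository names and file paths are examples only (Claude and Gemini counts would have to come from their providers' token-counting APIs instead).

```python
# Minimal sketch of the ratio comparison (illustrative; not the study's code).
import tiktoken
from transformers import AutoTokenizer

english_text = open("bitter_lesson_en.txt").read()  # English original (hypothetical file)
chinese_text = open("bitter_lesson_zh.txt").read()  # Chinese translation (hypothetical file)

# Baseline 1x: token count of the English original under an OpenAI tokenizer.
openai_enc = tiktoken.get_encoding("o200k_base")
baseline = len(openai_enc.encode(english_text))

# Open-weight tokenizers to compare (example Hugging Face repositories).
candidates = {
    "Qwen": "Qwen/Qwen2.5-7B-Instruct",
    "DeepSeek": "deepseek-ai/DeepSeek-V3",
    "Kimi": "moonshotai/Kimi-K2-Instruct",
}

# Token cost of the same Chinese text, expressed as a multiple of the baseline.
print(f"OpenAI (zh): {len(openai_enc.encode(chinese_text)) / baseline:.2f}x")
for name, repo in candidates.items():
    tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
    n_tokens = len(tok.encode(chinese_text, add_special_tokens=False))
    print(f"{name} (zh): {n_tokens / baseline:.2f}x")
```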
For users, more tokens mean higher API costs, longer waits for model responses, and a context window that fills up faster. Tokenizer efficiency depends on each language's share of the training data: with plenty of English data, English words are compressed efficiently; with less non-English data, text in those languages gets split into more, smaller fragments. Komatsuzaki's conclusion: the larger a language's market share, the more tokens it saves.
