According to monitoring by Dynamic Insight, Andrej Karpathy, a founding member of OpenAI and the person who coined the term "vibe coding," today published a post supporting the Claude Code team's call to use HTML instead of Markdown. He strongly endorses the change and also outlines a roadmap for how AI interaction interfaces will evolve. After several more rounds of iteration, he predicts, the ultimate output format of large models will be "interactive neural videos."
Karpathy describes AI output formats as evolving in stages: from early plain text (1), which was hard to read, to today's Markdown (2), and now toward HTML (3), whose highly flexible formatting is gradually becoming the new standard. He expects several intermediate forms (4, 5, 6, and so on) to follow before the endgame (n): interactive neural videos generated directly by diffusion models. To illustrate what this form might look like, he pointed to Flipbook, a no-code, pixel-rendering prototype recently released by a former OpenAI researcher.
The logic behind this trend is the physical bandwidth of the human brain. Karpathy points out that about one-third of the human brain is dedicated to processing visual signals, making vision the "ten-lane highway" for getting information into the brain. This, he argues, determines the optimal design for human-AI interaction: the most efficient way for humans to give instructions to AI (input) is voice, which is highly expressive, while the best way for AI to return results to humans (output) is high-bandwidth visuals such as images, animations, or videos.
He also notes that the input side still has significant pain points. Voice or text alone is not enough; there is an urgent need for better spatial pointing, like two people sitting side by side at a computer and "pointing to a specific area of the screen." As a shortcut to improve today's experience, he recommends that users simply append "structure the response into HTML" to the end of their prompts.
