Abandoning the Data Center, These Startups Are Building the Next AI Models

2025-05-02 11:26
Source: Wired
Author: Will Knight


Researchers have trained a new kind of large language model (LLM) using GPUs scattered around the world and a mix of private and public data, a breakthrough that could upend the dominant way AI models are built.


Two startups taking an unconventional path, Flower AI and Vana, collaborated to create the new model, named Collective-1. Flower AI developed a technique that spreads training across hundreds of computers connected over the internet, an approach already used by several companies to train AI models without pooling compute or data in one place. Vana supplied data from a range of sources, including private messages from X, Reddit, and Telegram.


By modern standards Collective-1 is small: its 7 billion parameters (the values that collectively determine a model's abilities) are a fraction of the hundreds of billions behind today's state-of-the-art systems such as ChatGPT, Claude, and Gemini.


Nic Lane, a computer scientist at the University of Cambridge and co-founder of Flower AI, says the distributed approach can scale well beyond Collective-1. He says Flower AI is currently training a 300-billion-parameter model using conventional data and plans to train a trillion-parameter model later this year, approaching the scale offered by industry leaders. "This could fundamentally change how people think about AI, and we are pushing hard," Lane said. The startup also plans to fold images and audio into training to create multimodal models.


Distributed model-building may also reshape the power dynamics of the AI industry.


AI companies currently build their models on two pillars: enormous amounts of training data and enormous amounts of compute concentrated in data centers packed with advanced GPUs networked together over ultra-fast fiber-optic links. They also rely heavily on datasets created by scraping publicly accessible (though sometimes copyrighted) material, including websites and books.


This paradigm means that only the wealthiest companies, and nations with access to large quantities of the most powerful chips, can develop the most valuable and capable models. Even open-source models like Meta's Llama and DeepSeek's R1 come from companies with access to large data centers. The distributed approach could let smaller companies and universities build advanced AI by pooling dispersed resources, or let countries that lack conventional infrastructure network several data centers together to build a more powerful model.


Lane believes the AI industry will increasingly favor new approaches that transcend the limitations of a single data center. "Compared to the data center model, distributed solutions can elegantly scale computing power," he explained.


Helen Toner, an AI governance expert at the Center for Security and Emerging Technology, praised Flower AI's approach as having "potentially significant implications for AI competition and governance." "While it may still lag behind the cutting edge, it holds value as a fast-follower strategy," she noted.


Divide and Conquer


Distributed AI training starts with rethinking how computation is divided up. Building an LLM involves feeding the system vast amounts of text and adjusting its parameters until it produces useful responses. Inside a data center, the training workload is split across many GPUs, and their partial results are periodically consolidated into a single master model.


New techniques now allow that same work, which would normally require a large data center, to be spread across hardware located miles apart and connected by nothing faster than an ordinary internet link.
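
To make the idea concrete, here is a minimal, purely illustrative sketch of the "train locally, consolidate periodically" pattern, in the spirit of local SGD / federated averaging. It is not Flower AI's Photon implementation; the worker count, sync interval, and toy linear-regression model are assumptions chosen only to keep the example small and runnable.

```python
# Illustrative sketch only: a toy "local updates + periodic averaging" loop.
# NOT Flower AI's Photon implementation; worker count, sync interval, and the
# linear model are assumptions chosen to keep the example small.
import numpy as np

rng = np.random.default_rng(0)

# Each "worker" stands in for a GPU node reachable only over an ordinary
# network connection, holding its own data shard and its own parameter copy.
NUM_WORKERS, SYNC_EVERY, LOCAL_STEPS, LR = 4, 10, 100, 0.05
true_w = np.array([2.0, -1.0, 0.5])

def make_shard(n=256):
    X = rng.normal(size=(n, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

shards = [make_shard() for _ in range(NUM_WORKERS)]
params = [np.zeros(3) for _ in range(NUM_WORKERS)]  # per-worker copies

for step in range(1, LOCAL_STEPS + 1):
    # Cheap local work: each worker updates its own copy independently.
    for w, (X, y) in enumerate(shards):
        grad = 2 * X.T @ (X @ params[w] - y) / len(y)
        params[w] -= LR * grad

    # Expensive network step: only every SYNC_EVERY steps do workers exchange
    # and average parameters, keeping traffic low enough for slow links.
    if step % SYNC_EVERY == 0:
        merged = np.mean(params, axis=0)
        params = [merged.copy() for _ in range(NUM_WORKERS)]

print("consolidated parameters:", np.round(np.mean(params, axis=0), 3))
print("ground truth:           ", true_w)
```

The point of the pattern is that the costly communication step runs only once every SYNC_EVERY local steps; most of the time each worker computes independently, which is what makes ordinary network links workable in place of data-center interconnects.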


Industry giants are also exploring distributed learning. Last year, Google researchers introduced DiPaCo (Distributed Path Composition), a framework that improves the efficiency of distributed training. To build Collective-1 and other models, Lane worked with academics in the UK and China on a new tool called Photon, which uses a more efficient way of representing data and of sharing and consolidating training progress. Lane acknowledged that the process is slower than conventional training, but said it is more flexible, allowing new hardware to be added at any time.


Photon, developed in collaboration with researchers from Beijing University of Posts and Telecommunications and Zhejiang University, was open-sourced last month. Vana, Flower AI's partner, is building new ways for users to share personal data with AI developers: its software lets people contribute private data from platforms such as X and Reddit, set restrictions on how that data may be used, and even receive financial rewards.
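
As a purely hypothetical illustration of the kind of contribution record such software might manage (none of these names or fields come from Vana's actual product), the sketch below models a user contribution carrying explicit usage restrictions and an optional reward destination.

```python
# Hypothetical illustration only: Vana's real software and APIs are not shown
# here. This merely models the idea of a contribution with usage restrictions
# and an optional reward destination. All names and fields are assumptions.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DataContribution:
    contributor_id: str                   # pseudonymous user identifier
    source_platform: str                  # e.g. "X", "Reddit", "Telegram"
    records: List[str]                    # the contributed messages/posts
    allowed_uses: List[str] = field(default_factory=lambda: ["base_model_training"])
    prohibited_uses: List[str] = field(default_factory=lambda: ["resale", "ad_targeting"])
    reward_address: Optional[str] = None  # where any financial reward would go

def is_permitted(contribution: DataContribution, purpose: str) -> bool:
    """Check a requested purpose against the contributor's stated restrictions."""
    return purpose in contribution.allowed_uses and purpose not in contribution.prohibited_uses

# Usage: a trainer would only ingest records whose owners allowed this purpose.
example = DataContribution("user-123", "Reddit", ["example post"])
print(is_permitted(example, "base_model_training"))  # True
print(is_permitted(example, "ad_targeting"))         # False
```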


Vana co-founder Anna Kazlauskas said the initiative aims to unlock untapped data while giving users more control. "This non-public data, which traditionally could never make it into AI models, is being used to train a foundation model for the first time, and users can hold ownership in the models created with their data," she emphasized.


UCL computer scientist Mirco Musolesi pointed out that the key value of distributed training lies in unlocking novel data: "Applying it to cutting-edge models allows the AI industry to leverage decentralized sensitive data from sectors like healthcare and finance for training while mitigating risks of data centralization."


