
Twitter Co-founder Dorsey Endorses mesh-llm: Pooling Idle GPUs into a Decentralized Inference Network

According to 1M AI News monitoring, Twitter and Block co-founder Jack Dorsey endorsed mesh-llm, a tool that turns idle GPUs into a peer-to-peer network for collaboratively running open large models. The project was developed by Michael Neale, Chief Engineer of Block's AI Application Team, as part of the ecosystem around Goose, Block's open-source AI agent platform. It is MIT-licensed and written in Rust.

The core logic of mesh-llm: if a model fits on one machine, run it there at full speed; if it doesn't, distribute it automatically. Dense models are pipeline-parallelized by layer slicing, while MoE models (such as Qwen3, GLM, and DeepSeek) are sliced by expert, with each node running inference independently, so there is zero traffic between nodes. The real-world numbers are candid: single-machine GLM-4.7-Flash (17 GB) reaches 68 tok/s, 2-node slicing over WiFi drops to 21 tok/s, 3 nodes drop to 12-13 tok/s, and a cross-city link (about 20 ms latency) yields 10-25 tok/s. The slowdown is real, but the target user is someone who wants to run a 142 GB Qwen3-235B or a 138 GB MiniMax M2.5 and only has a single 24 GB GPU. For them, the choice is not between "fast" and "slow" but between "can run" and "cannot run at all."
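The layer-slicing idea for dense models can be sketched in a few lines. This is a hypothetical illustration, not mesh-llm's actual partitioner: assign each node a contiguous range of transformer layers in proportion to its free GPU memory, so a small and a large GPU can jointly hold a model neither could hold alone.

```python
def slice_layers(num_layers, node_mem_gb):
    """Assign contiguous layer ranges to nodes in proportion to free memory.

    Hypothetical sketch of pipeline-parallel layer slicing; not the actual
    mesh-llm algorithm. Returns a list of (start, end) half-open ranges.
    """
    total = sum(node_mem_gb)
    assignments = []
    start = 0
    for i, mem in enumerate(node_mem_gb):
        if i == len(node_mem_gb) - 1:
            # Last node takes the remainder to avoid rounding gaps.
            end = num_layers
        else:
            end = start + round(num_layers * mem / total)
        assignments.append((start, end))
        start = end
    return assignments

# Example: a 60-layer model split across a 24 GB and a 12 GB GPU.
print(slice_layers(60, [24, 12]))  # → [(0, 40), (40, 60)]
```

During decoding, each token's activations flow through node 0's layers 0-39, then hop once over the network to node 1 for layers 40-59, which is why each extra node adds one network hop per token.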

This approach is technically viable mainly because inference and training have starkly different communication patterns. Distributed training must synchronize all gradients at every step, which creates massive communication volume and demands data-center-grade bandwidth and latency. Inference only needs to pass activation values between nodes, a far smaller volume, and network latency mostly affects time-to-first-token rather than per-token throughput. This is why the idea of training frontier models on globally pooled idle GPUs has not succeeded so far, while mesh-llm's inference-only design can work.
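The gap can be made concrete with back-of-envelope arithmetic. The figures below are illustrative assumptions (a 7B-parameter dense model with hidden size 4096 in fp16), not measurements of mesh-llm:

```python
# Illustrative assumptions: 7B-parameter dense model, hidden size 4096,
# fp16 weights and activations (2 bytes each).
PARAMS = 7_000_000_000
HIDDEN = 4096
BYTES = 2  # fp16

# Training: every step synchronizes a gradient the size of the model.
grad_sync_bytes = PARAMS * BYTES   # 14 GB per step

# Inference: a pipeline stage forwards one activation vector per token.
activation_bytes = HIDDEN * BYTES  # 8 KB per token per stage boundary

ratio = grad_sync_bytes / activation_bytes
print(f"gradient sync:      {grad_sync_bytes / 1e9:.1f} GB per step")
print(f"activation handoff: {activation_bytes / 1e3:.1f} KB per token")
print(f"ratio: roughly {ratio:.0e}x")
```

On these assumptions, a training step moves on the order of a million times more data than an inference token handoff, which is why consumer WiFi is survivable for distributed inference but hopeless for distributed training.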
