According to a report from Dynamic Beating, DeepSeek has open-sourced TileKernels under the MIT license, a set of low-level GPU computation code for large-scale model training and inference, parts of which have been used in its internal production environment. GPU kernels are the compute programs that run directly on the graphics card and set the upper bound on training and inference speed. All TileKernels are written in Python and rely on TileLang, a domain-specific language for GPU kernels, to handle low-level optimization automatically, with no hand-written CUDA C++ required. DeepSeek claims that most of the kernels approach the hardware's performance limits.
The library includes production-grade kernels for two architectural components not mentioned in the DeepSeek-V3 and R1 papers. Engram, a conditional memory module DeepSeek proposed in a paper this January, retrieves static knowledge (such as entities and fixed phrases) in O(1) time through a hash table, complementing MoE's conditional computation and offloading memorization from the model backbone. Manifold Hyper-Connection (mHC) improves on the Hyper-Connections proposed by ByteDance's Seed team in 2024, constraining the connection matrices to be doubly stochastic to address signal divergence during large-scale training. Until now, these two components existed only in papers and demo code; TileKernels provides high-performance implementations ready for training, suggesting DeepSeek has the engineering in place to integrate them into its next-generation model.
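To make the two ideas concrete, here is a minimal pure-Python sketch, not the TileKernels or TileLang API. It assumes Engram's O(1) retrieval amounts to a hash-map lookup keyed on an n-gram (the real module presumably uses learned hash buckets), and it assumes the constraint on mHC's connection matrices is the doubly stochastic property (rows and columns each summing to 1), illustrated here with classic Sinkhorn normalization. All function and variable names are illustrative inventions.

```python
import hashlib

def engram_lookup(table, ngram):
    """O(1) retrieval of a static-knowledge embedding by hashing an n-gram.
    `table` maps a digest to an embedding vector. Assumption: the real
    Engram uses learned hash functions, not hashlib; this only shows the
    constant-time lookup pattern."""
    key = hashlib.md5(" ".join(ngram).encode()).hexdigest()
    return table.get(key)  # None if the phrase is not in static memory

def sinkhorn_doubly_stochastic(m, iters=50):
    """Project a positive square matrix toward doubly stochastic form by
    alternately normalizing rows and columns (Sinkhorn-Knopp). A doubly
    stochastic connection matrix neither amplifies nor attenuates the
    total signal, which is the stability property attributed to mHC."""
    n = len(m)
    a = [row[:] for row in m]
    for _ in range(iters):
        for i in range(n):  # normalize each row to sum to 1
            s = sum(a[i])
            a[i] = [x / s for x in a[i]]
        for j in range(n):  # normalize each column to sum to 1
            s = sum(a[i][j] for i in range(n))
            for i in range(n):
                a[i][j] /= s
    return a
```

After a few iterations on any positive matrix, every row and column sum is close to 1, so repeated application of the matrix keeps activation magnitudes bounded; that is the intuition behind the constraint, independent of how TileKernels implements it on the GPU.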
The library also covers routine stages such as MoE routing and gating, low-precision quantization (FP8, FP4, etc.), and batched transposition. The code can be installed with `pip install tile-kernels`; running it requires an H100/H200 or a Blackwell-series GPU.
