According to Sentinel Beating monitoring, a message has recently been circulating on social media claiming that entering special markers such as think or <|begin_of_sentence|> into DeepSeek causes it to leak conversations belonging to other users.
The actual behavior has nothing to do with multi-tenancy isolation. Entering markers such as think or <|begin_of_sentence|> tricks the model into continuing in its training-data format: drawing on its own memorized knowledge and the current system prompt (which includes today's date), it generates a conversation that merely looks real. The content is produced by the model itself; it is not fetched in real time from other users' conversations.
In the research literature this phenomenon is known as training data extraction, an issue common to all large models, not unique to DeepSeek. Google DeepMind published a dedicated study as early as 2023 demonstrating that specially crafted inputs can extract training data from mainstream models such as GPT and PaLM. The Magpie paper accepted at ICLR 2025 even turned this mechanism into a tool: by feeding only the chat-template tokens to an aligned model, it systematically generates training data.
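The Magpie-style mechanism can be sketched in a few lines. This is an illustrative mock-up, not DeepSeek's actual template: the token names and system-prompt wording below are assumptions, and the prompt would be sent to a raw completion endpoint (not shown) where the model invents the rest.

```python
from datetime import date

# Placeholder special tokens -- assumed names, not DeepSeek's real vocabulary.
BOS = "<|begin_of_sentence|>"   # sequence-start marker
USER = "<|User|>"               # user-turn marker

def build_prequery_prompt(system_prompt: str) -> str:
    """Build a prompt that stops exactly where a user query would begin.

    Given to a completion endpoint, an aligned model continues this text by
    inventing a plausible user query and answer -- synthetic data drawn from
    its training distribution, not a replay of any real user's conversation.
    """
    return f"{BOS}{system_prompt}{USER}"

# The session system prompt carries today's date, so anything the model
# generates can naturally mention that date.
system = f"Today is {date.today().isoformat()}."
prompt = build_prequery_prompt(system)
print(prompt)
```

Everything the model emits after the user-turn marker is hallucinated continuation, which is exactly why the "leaked" conversations look realistic while belonging to no one.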
Some have argued that because the "leaked content includes today's date," it cannot come from training data. However, every DeepSeek session's system prompt includes the current date, so generated content naturally contains that date; this does not prove the content comes from another real user. Establishing a multi-tenancy isolation failure would require confirming that the leaked information actually belongs to a specific, real, existing user, and currently there is no evidence supporting this.
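The date argument collapses once you see how the date enters the prompt. A minimal sketch, assuming a hypothetical system-prompt template (DeepSeek's real wording is not public):

```python
from datetime import date

# Hypothetical template; only illustrates that the date is injected per
# session, so its presence in output proves nothing about the output's origin.
SYSTEM_TEMPLATE = "You are a helpful assistant. Today's date is {today}."

def session_system_prompt() -> str:
    # A fresh date is formatted into the prompt at the start of every session,
    # so any text the model generates -- including a fabricated "leaked"
    # conversation -- can naturally quote today's date.
    return SYSTEM_TEMPLATE.format(today=date.today().isoformat())

print(session_system_prompt())
```

Since the date is visible to the model in every session, its appearance in generated text is expected behavior, not evidence of a cross-user leak.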
