According to Data Beating monitoring, Anthropic has announced a change to the security strategy for its new model, Claude Fable 5, removing a mechanism that silently degraded the model's performance. Critics had called the mechanism "covert sabotage," and it drew a strong backlash from the artificial intelligence research community.
Anthropic's terms of service prohibit users from using Claude to train competing models. Anthropic had planned to reduce Claude Fable 5's performance for accounts suspected of training such models, without notifying the affected users. AI researchers warned that silent performance degradation would disrupt the work of third-party security assessment firms and hinder collaboration on AI security within the open-source community.
In response to community concerns, Anthropic issued a public apology, acknowledging that it had made a mistake in its security strategy trade-offs and saying it will rework its security measures so that they give explicit warnings. If the system detects a user attempting to build high-capability AI, the request will be explicitly denied or the user will be redirected to a lower-capability model. Anthropic cautioned that because publicly visible protection mechanisms are easier to deliberately bypass, it will broaden the scope of security interception in the future, which may result in some benign requests being blocked by mistake.
