According to Dynamic Yield monitoring, Altman discussed synthetic data on a podcast with Nicholas Thompson, CEO of The Atlantic. Thompson noted that AI-generated content is now ubiquitous online, and that even humans are learning to write in the style of AI. Thompson believes that future models will inevitably be exposed to AI-generated data. He said, "GPT-4 is the last model that hasn't heavily relied on AI-generated data," and Altman nodded in agreement.
Thompson then asked directly: has anyone trained a model entirely on synthetic data, that is, using one AI's output to train the next generation of AI? Altman paused for a moment and said, "I'm not sure if I should say it," a remark that can be read as a tacit admission. He went on to stress that the core of a model is learning to reason, a task that can be achieved entirely with synthetic data. Using mathematics as an analogy, he asked: can a model that has never seen human data outperform humans at calculation? "I think it can." However, he doubted that a model never exposed to human culture could understand human values: "That's probably not possible."
Synthetic data has long been likened to "mad cow disease": AI feeding continually on its own output, which can degrade the information it learns from. In Altman's view, teaching AI mathematics does not require human input, but teaching AI to understand humans does.
