We built #AI to be confident. Turns out confidence and accuracy parted ways.
New research from OpenAI shows that training models to say “I don’t know” cuts hallucinations sharply. The problem was never capability; it was incentives.
AI models ‘learned’ that guessing gets rewarded more than abstaining, because most benchmarks grade on accuracy alone: a wrong answer and an honest “I don’t know” both score zero.
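To see the incentive in numbers, here’s a tiny sketch (the penalty values are my illustration, not from the paper). Under accuracy-only grading, guessing always beats abstaining in expectation; once wrong answers carry any penalty, an unsure model does better by abstaining:

```python
# Expected score of guessing vs. abstaining (abstaining scores 0 either way).
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score of guessing, correct with probability p_correct."""
    return p_correct * 1.0 + (1.0 - p_correct) * wrong_penalty

p = 0.3  # model is only 30% sure of the answer

# Accuracy-only grading: wrong answers cost nothing, so guessing wins.
print(expected_score(p, wrong_penalty=0.0))   # 0.3 > 0.0 -> always guess

# Grading that penalizes confident errors: abstaining now wins.
print(expected_score(p, wrong_penalty=-1.0))  # -0.4 < 0.0 -> say "I don't know"
```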
Meanwhile, Thinking Machines Lab documented how to make AI boringly predictable. Not just setting temperature to zero, but controlling every random seed, every batch, every library call. Same input, same output, always.
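For the determinism side, a minimal sketch of the kind of pinning involved, assuming a PyTorch stack (illustrative only; the Thinking Machines Lab write-up goes further, down to batch-invariant GPU kernels):

```python
import os
import random

import numpy as np
import torch

def make_deterministic(seed: int = 0) -> None:
    """Pin every source of randomness a typical PyTorch run touches."""
    random.seed(seed)        # Python's own RNG
    np.random.seed(seed)     # NumPy RNG
    torch.manual_seed(seed)  # CPU and all CUDA RNGs
    # Required by cuBLAS for deterministic matmuls on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Error out if any op falls back to a nondeterministic implementation.
    torch.use_deterministic_algorithms(True)
    # Disable cuDNN autotuning, which can pick different kernels run to run.
    torch.backends.cudnn.benchmark = False
```

Even with all of this pinned, identical outputs across serving batches still require batch-invariant kernels; that is the part the lab’s post actually digs into.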
Both reach the same conclusion from different angles: hallucinations aren’t a bug in the technology. They’re a feature of how we trained it.
The fix appears to be embarrassingly human: make “I don’t know” an acceptable output.