Artificial Intelligence 2 min June 26, 2026

Leaked OpenAI o3 Benchmark Scores Spark Debate Over AI Performance Claims


A fresh wave of buzz is circling the AI world after alleged internal benchmark scores for OpenAI’s unreleased o3 model surfaced on X. The leaked screenshots reportedly show striking results, including a claimed 92% on MMLU and major gains in reasoning tasks. If accurate, the numbers would suggest another meaningful step forward in large language model capability.

But as with many viral AI leaks, the response has been split between excitement and skepticism. Researchers and observers are already scrutinizing the images, looking for signs of manipulation, missing context, or benchmark cherry-picking. In the current AI landscape, benchmark results can be impressive without telling the full story about real-world reliability, robustness, or safety.

That tension is part of why this leak has gained so much traction. OpenAI has not publicly confirmed the numbers, and without official documentation, it remains unclear how the scores were produced, what evaluation settings were used, or whether they reflect the model’s broader performance. In other words, the screenshots may hint at strong progress, but they do not yet prove a breakthrough.

For now, the bigger story is how quickly the AI community reacts to benchmark leaks. Every new score is treated as a clue in the race toward more capable models, but the smartest takeaway is to wait for verified results. If o3 is as strong as the leaked numbers suggest, OpenAI will likely want to reveal that story on its own terms.

#OpenAI #AGI #AI
