Why Doesn't the LLM Get It Right the First Time?

01-18-2026 - Hamburg, Germany
Originally written 98% by a human, then translated from German with AI.
This note is also available in German.

LLMs almost always find bugs in their own code when asked for a code review. Sounds paradoxical. Why did they make the mistake in the first place? The answer lies in a fundamental principle: verification is easier than generation.

A recent ICLR study¹ gave the phenomenon a name: the “generation-verification gap.” Spotting errors is easier than producing error-free output in the first place, and the study reports that this gap grows with model scale. By the way, this isn’t just true for LLMs. It’s true for all of us. That’s why editors exist, why we do code reviews, why we seek second opinions. Producing and checking are different cognitive tasks, and checking is usually the less demanding one.

The practical hack: have a second agent review the output, ideally a different model. Low cost, systematically better results.
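
A minimal sketch of that review loop, under some loud assumptions: a "model" here is just any Python function that takes a prompt string and returns a response string, the names generate_with_review, stub_generator and stub_reviewer are made up for illustration, and the "LGTM" convention is one arbitrary way to signal approval. In practice you would replace the two stubs with calls to two different real models.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a "model" is any function from prompt text to response text.
Model = Callable[[str], str]


@dataclass
class ReviewResult:
    code: str       # the last version produced by the generator
    rounds: int     # how many review rounds were run
    approved: bool  # True only if the reviewer signed off on `code`


def generate_with_review(task: str, generator: Model, reviewer: Model,
                         max_rounds: int = 3) -> ReviewResult:
    """Generate code, then alternate review and revision until approval."""
    code = generator(f"Write code for this task:\n{task}")
    for round_no in range(1, max_rounds + 1):
        feedback = reviewer(
            "Review this code for bugs. Reply with exactly 'LGTM' "
            f"if you find none.\n\nTask:\n{task}\n\nCode:\n{code}"
        )
        if feedback.strip() == "LGTM":
            return ReviewResult(code, round_no, True)
        # Hand the reviewer's findings back to the generator for a revision.
        code = generator(
            f"Task:\n{task}\n\nYour previous code:\n{code}\n\n"
            f"Reviewer feedback:\n{feedback}\n\nReturn a corrected version."
        )
    # Round budget used up; the final revision was never re-reviewed.
    return ReviewResult(code, max_rounds, False)


# Hypothetical stand-ins for two different models, so the sketch runs offline.
def stub_generator(prompt: str) -> str:
    if "Reviewer feedback" in prompt:
        return "def last(xs): return xs[-1]"
    return "def last(xs): return xs[len(xs)]"  # first draft: off-by-one bug


def stub_reviewer(prompt: str) -> str:
    if "xs[len(xs)]" in prompt:
        return "Index len(xs) is out of range; use xs[-1]."
    return "LGTM"


if __name__ == "__main__":
    result = generate_with_review(
        "Return the last element of a list.", stub_generator, stub_reviewer
    )
    print(result)
```

Capping the number of rounds keeps the cost bounded, and returning approved=False makes it visible when the reviewer never signed off, so unreviewed code is not mistaken for checked code.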

¹ Song et al., Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models, ICLR 2025