Well Engineered Tech - Blog

Why Doesn't the LLM Get It Right the First Time?

- Hamburg, Germany

This note is also available in German.

LLMs almost always find bugs in their own code when asked for a code review. Sounds paradoxical. Why did they make the mistake in the first place? The answer lies in a fundamental principle: verification is easier than generation.

A recent ICLR study [1] gave the phenomenon a name: “Generation-Verification Gap.” Spotting errors is fundamentally easier than writing error-free code, and this gap grows with model size. By the way, this isn’t just true for LLMs. It’s true for all of us. That’s why editors exist, why we do code reviews, why we seek second opinions. Producing and checking are different cognitive tasks, and the latter is less demanding.
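To make the asymmetry concrete, here is a toy illustration from classic computer science. It is an analogy, not something taken from the study: for the subset-sum problem, checking a proposed answer takes one pass over it, while finding an answer by brute force may have to try every combination. The function names are made up for this sketch.

```python
from itertools import combinations


def verify(numbers, subset, target):
    """Check a proposed answer: cheap, a single pass over the subset."""
    remaining = list(numbers)
    for x in subset:
        if x not in remaining:
            return False  # subset uses a number that is not available
        remaining.remove(x)
    return sum(subset) == target


def generate(numbers, target):
    """Find an answer by brute force: may try up to 2^n combinations."""
    for size in range(1, len(numbers) + 1):
        for combo in combinations(numbers, size):
            if sum(combo) == target:
                return list(combo)
    return None


numbers = [3, 34, 4, 12, 5, 2]
candidate = generate(numbers, 9)        # expensive search
is_valid = verify(numbers, candidate, 9)  # cheap check
```

The gap in LLMs is an empirical observation rather than a complexity-theoretic one, but the intuition is similar: the checker only has to judge one concrete candidate, not search the whole space of possible answers.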

The practical hack: have a second agent review the output, ideally a different model. Low cost, systematically better results.
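A minimal sketch of what such a review loop could look like. Everything here is an assumption for illustration: `generator` and `reviewer` are hypothetical callables that take a prompt and return text (wrap whichever model clients you use), and the "LGTM" convention and the prompt wording are made up, not a recommended standard.

```python
from typing import Callable

# Hypothetical interface: any function that maps a prompt to a text reply.
Model = Callable[[str], str]


def generate_with_review(
    task: str,
    generator: Model,
    reviewer: Model,
    max_rounds: int = 2,
) -> str:
    """Let one model write and a second model review, for a few rounds."""
    draft = generator(task)
    for _ in range(max_rounds):
        review = reviewer(
            "Review this code for bugs. "
            "Reply with just LGTM if you find none.\n\n"
            f"Task: {task}\n\nCode:\n{draft}"
        )
        if review.strip() == "LGTM":
            break  # reviewer found nothing, keep the current draft
        # Feed the reviewer's findings back to the generator.
        draft = generator(
            f"Task: {task}\n\nPrevious attempt:\n{draft}\n\n"
            f"Reviewer feedback:\n{review}\n\n"
            "Rewrite the code to address the feedback."
        )
    return draft
```

Passing two different models as `generator` and `reviewer` gives the "different model" variant. The cap on rounds keeps the cost bounded if the reviewer never signs off.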