the day claude was bad

Tweeted in February: claude so bad this week, actually gotta code.

It was meant as a joke. The truth underneath is more useful.

A library I depend on shipped a new major version with subtle behavioural changes. Claude kept generating code that compiled and ran and was wrong. Not visibly wrong. Off-by-one wrong. The kind of bug where the tests pass because the tests were also generated to match the bug.

I spent two days untangling it. Would have taken thirty minutes a year ago when I was still typing the code line by line, because I'd have noticed the API change on the first call.

LLMs are great at writing code that looks like the rest of the code. They're not great at noticing when the rest of the code is suddenly the wrong reference. The pattern they've seen is the pattern they reproduce. If the pattern is now wrong, they reproduce wrong with confidence.

The skills you still need haven't changed. Read the changelog. Read the diff before accepting it. Write the test you'd write if no AI existed, then let AI fill in the implementation. Know your own code well enough to spot when the agent's version is subtly different. Be suspicious of confident output you didn't ask for.

The bad days reveal which skills the good days were hiding.

Any tool that makes most work faster will, at some predictable rate, make a small subset of work much slower. The total productivity gain is still real. The subset is also real, and the moment of slow work is the moment you most need the underlying skill the tool was supposed to replace. That's not an argument against the tool. It's an argument for keeping the underlying skill.

The mistake is treating AI like a calculator. A calculator never confidently returns the wrong answer. An LLM does, daily. Calibrating your trust to its tone instead of its track record is how the two-day bug happens.

The rule I landed on. If the agent's output is something I could have written in five minutes by hand, I trust it and move on. If it's something that would have taken me an hour, I read it the way I'd read a junior engineer's PR. Carefully. Asking why. Looking for the line that's almost right.

The savings are still real, just on the five-minute work, not the one-hour work. The one-hour work still takes me an hour. Some of that hour is now reading instead of typing, but the cognitive cost is the same.

Tool changes, fundamentals don't.

— Simon