
Your job is to deliver code you have proven to work

LLMs can generate code, but that's not the job anymore. The job is delivering code with proof it works: manual testing you can show in the PR, automated tests that would fail without your change, and taking full accountability for what ships.

Here's the pattern Simon Willison keeps seeing that drives him nuts: junior engineers dump massive, untested AI-generated PRs on their coworkers or open source maintainers and call it "code review." This is a waste of everyone's time and a dereliction of duty. The shift from "I write code" to "I deliver code that works with proof" is the actual job now that LLMs can crank out implementations. Your responsibility isn't to prompt Claude into generating a thousand lines. It's to include evidence that what you're submitting actually does what you claim.

Proving code works has two non-negotiable steps. First is manual testing. If you haven't watched your change do the right thing yourself, you're hoping for luck. Get the system into a known state, exercise the change, confirm it worked. Then show that proof: paste terminal commands and output into the PR comment, record a screen capture if it's visual, demonstrate the edge cases you tried. Senior engineers find the things that break. Junior engineers hope nothing breaks and make it someone else's problem.

The second step is automated testing. Write tests that would fail if someone reverted your implementation. This proves your change works and keeps working. The patterns here mirror manual testing: set up initial state, exercise the change, assert it worked. With LLMs making test writing trivial, there's zero excuse for skipping this.
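
As a minimal sketch of that set-up/exercise/assert shape in pytest, imagine the PR fixes an invented slugify helper (the module, function, and expected behavior here are all hypothetical, not from the source):

```python
# test_slugify.py -- hypothetical example; names and behavior are illustrative.
# Same shape as manual testing: set up a known state, exercise the change,
# assert it worked. If someone reverted the fix that strips accents before
# slugifying, this test would fail.
from myproject.text import slugify  # hypothetical module under test


def test_slugify_handles_accented_characters():
    # Set up: an input that only works with the new change
    title = "Café résumé"

    # Exercise: run the code path the PR modifies
    result = slugify(title)

    # Assert: confirm the behavior the PR claims
    assert result == "cafe-resume"
```

The point is the revert check: delete the fix and this test goes red, which is the proof the PR carries forward.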

The explosion of coding agents like Claude Code and Codex CLI in 2025 changed the game because these tools can execute code and iterate on problems. But you still need to teach them to prove their work. Have them manually test as they build, take screenshots for CSS changes, write automated tests that follow your project's existing patterns. Good agents already extend test suites without being asked if you have tests in place. This is why keeping test code well-organized matters: agents reuse your patterns, so clean test code produces clean test code.
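
To illustrate that last point, a shared fixture is exactly the kind of pattern an agent will pick up and reuse. A minimal sketch, assuming a pytest suite (the schema and fixture are invented for illustration):

```python
# conftest.py -- hypothetical example of keeping test setup well-organized.
# Agents extending this suite tend to copy the existing fixture instead of
# inventing ad-hoc setup, so clean patterns propagate.
import sqlite3

import pytest


@pytest.fixture
def db():
    # Known state: an in-memory database with the schema applied
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    yield conn
    conn.close()
```

A new test then starts from db and adds only the exercise-and-assert steps, which is the shape you want an agent to imitate.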

The accountability sits with you, the human. A computer can never be held accountable. Anyone can generate a massive patch with an LLM. What's valuable is proven working code. Before you hit submit on that PR, make sure you've included your evidence. Otherwise you're just shifting the actual engineering work onto whoever has to review your mess.

Source: simonwillison.net
#programming #careers #engineering-management #code-quality #testing #ai #llm #technical-leadership

Problems this helps solve:

Process inefficiencies, technical debt, onboarding, knowledge sharing
