Claude Just Gave You a Dry Promotion, and Will Keep Promoting You Until…

A designer sketches an irrigation plan at a field table while machines dig the channels and a shovel sits unused

So you have stopped writing most of your code. You still ship it. You still own it. You just don’t type it anymore.

You write a spec, point a few agents at it, and review what comes back. You add acceptance criteria and verification mechanisms to the spec, so the agents know to continue iterating until they get it right. You just keep an eye on the direction and avoid drift. You get from idea to working product in minutes now.

That’s actually a promotion. Nobody gave you a new title or a raise; your job just changed from writing, refactoring, and running the code to directing a group of agents that now do it for you. It’s a dry promotion with no extra pay, and you never applied to it.

Kent Beck says the watchmaker era is gone, the careful by-hand changes to code are mostly the agent’s job now. The industry already has a name for the new role, the agent manager. In February it was a prediction in HBR, by mid-2026 people are calling it the career skill of the year.

Taste

Researchers in a field lab analyze soil cores, water samples, and plant leaves to judge what healthy looks like

Strip away the typing and what’s left is your taste, your read on what good looks like, which technical decisions hold up under real load and which ones quietly turn into a 3am page six months later. You built it over years, making those calls and living with the consequences.

Taste was never really in making the call. Think about the decisions you used to agonize over by hand, which pattern to reach for, how to structure the module, which library to pull in. Those feel like the craft, but they’re educated guesses. Very educated, built on years of scars and a shelf of good books and every system you’ve watched scale or fall over, but still guesses. There’s rarely one right answer, there are a few reasonable approaches, and you don’t find out which one was right until the thing runs in production, under real load, during a real incident.

Kent Beck put it bluntly, most of his coding skills are worth zero now, the “how to build it” part, while his taste, knowing what’s worth building and whether it’s any good, gained leverage. You found out whether your architecture was any good when it got paged at scale, never the moment you typed it. So when an agent picks the pattern instead of you, your opinion at the pattern level matters less than it feels like it should. What counts is knowing what good looks like, and being able to tell whether you got it.

You Can’t Decline It

A lone worker leans on a shovel by one hand-dug furrow, facing a vast plowed field he can't dig by hand

This is the Peter Principle, you get promoted until you land in a job you’re not good at, and AI just promoted you past the work that built your taste. Maybe you’d rather not manage, maybe you got into this to write code, not review it, the math doesn’t give you the choice. Anthropic’s data shows engineers can fully hand off only 0 to 20% of their tasks, the rest they supervise, and better models make that worse because faster agents produce more output and more output is more to review. People call it the oversight gap. Overseeing isn’t optional, and it keeps growing.

The catch is how you oversee. It used to mean reading the code, that’s where you applied your taste, line by line. At the volume agents produce you can’t read it all. Read every PR they open and you’re back to the speed you had before the promotion, which kills the point. Skip the reading and trust the output, and your judgment slowly rots, Anthropic ran a study where junior engineers who leaned on AI to get unstuck scored 17 points lower on the same skills later, worst of all in debugging, the exact muscle you need to tell good output from output that only looks good.

Both ways lose, that’s the bind.

Put Your Taste in the Evals

A designer drafts detailed blueprints of irrigation control gates, the checkpoints water must pass through

So you write it down. You take your read on what good looks like and bake it into evals, the checks that tell you whether the agent’s work is any good without you reading every line. Test coverage, behavior under load, how cleanly it’s split into parts, all the way up to whether it does what the spec actually asked for. You define what good means for this system, you build something that measures it, and you let it run against everything the agents produce.

OpenAI and Thrive published a tax-prep agent in May. When a human tax preparer corrects the agent’s output, the correction gets captured as structured data, and the corrections that keep recurring turn into evals, concrete checks the system has to pass. Codex then works against those evals to fix itself. Production becomes the signal that drives the improvement, and the engineers stay on the hook for architecture, product decisions, and shipping. The humans moved their judgment up into the evals, and the system got measurably better from one tax season to the next, drafting returns at up to 97% accuracy and lifting throughput by about half.

Evals are hills for the agent to climb. You give it a clear, measurable target and it climbs. That target now encodes your taste, and once it’s written down it runs against more code than you could ever review by hand. Anthropic’s more recent data points the same way, the people getting the most out of AI are the ones staying engaged, augmenting instead of fully automating, and they score higher for it.

I’m not certain this works. Designing evals is my best guess at how you keep your taste while the volume explodes, and it’s an early guess in genuinely new territory. It’s the move I’m betting on, and so far it’s holding.

What Stays Human

This won’t stop here. You went from writing code to directing agents, and the abstraction keeps rising, each jump hands another layer to the agents and pushes you up a rung. (Who becomes a senior when the bottom rung is all agents is a separate problem.)

Two things don’t move, no matter how high this goes. The first is intent, deciding what’s worth building and why. That comes from being a person in the world with things you actually want and need, the same reason AI can’t have shower thoughts, it doesn’t want anything. The second is accountability. When the agents answer the pager and patch production while you sleep, you’ll still be the one who answers for whether there were too many incidents and whether they got fixed fast enough. The pager can move to the agent, the responsibility can’t.

And I’m sorry, dear reader, I don’t have the answer for when the promotions will stop. Nobody does yet, and anyone who claims they do is guessing.

What I’m hopeful about is where the ceiling sits. I think the promotions stop climbing when they hit the two things that stay ours, our ingenuity, the part that decides what’s worth doing, and our ownership, the part that answers for it. Machines can take the rung below you forever, they can’t take the wanting or the responsibility. So we keep getting promoted, rung after rung, until we’re standing on the floor that’s ours. I don’t know how many rungs are left before we get there. I’m just hopeful that floor holds.