How to hold agentic coding tools

Over the past year, the conversation around AI-assisted coding has shifted from leveraged coding (Cursor Tab) to agentic building. The popular coding CLIs, unlike the previous generation of VS Code forks, come primarily from the big labs.

Coding agents work very differently

While Cursor Tab is pretty intuitive to use, coding CLIs are the polar opposite. They need to be held right to work effectively, and I don’t think there are any good docs out there on how to do that.

Choose one, try many

Just pick one, and if you can’t decide, choose the one whose logo you like most. Coding agents all behave differently, sometimes in subtle ways. Just as you can’t judge from a photo which colleague you’ll prefer working with, you can’t tell which agent suits you until you’ve actually worked with it.

So just pick one to start, and try others.

Your colleague might tell you that Codex is the best agent there is, and that could be true for the way they think or the work they do. It might not be true for you, though, so stay open-minded.

Start vanilla and find a good workflow

Coding agents are pretty powerful out of the box and cover a large surface area, so you’ll get value without even setting up skills, MCPs, git worktrees, … All of these have their benefits, but they come with extra complexity, and might mask the “personality” of your coding agent.

Don’t start parallelizing agents until you’ve found a workflow that gets good results out of a single one. If one agent gets you bad results, 8 agents will get you 8 times as many bad results.

Write an AGENTS.md

Think of your AGENTS.md as a map for your agent. You’re sketching out the rough area you’re operating in and perhaps highlighting a few points of interest. Describe how your codebase is structured, which patterns it uses, which package manager to use, and anything else the agent has gotten wrong before.

At the end of a session, just ask your agent to update the file with anything it has learned. It works well.

Use agent zoom, and start zoomed in

Here’s what I mean: you can ask your agent to “Build a todo app”, or you can ask it to “create a React component Todo.tsx, in which we use Mantine for UI and localStorage for persistence, …”. You choose how many constraints to give your agent, and with that you widen or narrow both the scope of the task and the number of judgment calls it has to make.

I highly recommend starting zoomed in and being very explicit about what you expect. That gives you the highest chance of success and keeps the agent from misinterpreting you. Over time, try zooming out and see where things go wrong. In my experience, this differs quite significantly between agentic coding tools, and this is where you might want to try others to see whether you prefer their intuitive take.

Large codebases are hard

In my experience, these tools often struggle in large codebases with reusing existing functions and with keeping the right “mental model” of the application’s architecture in mind. That’s no surprise. Over time, we humans build up a mental map of a repo, with fuzzy memories like “didn’t I see this util before?”. Agents start from scratch, and that’s not solved right now.

This will be one of the most interesting areas of improvement.

Feedback loops

This one is somewhat obvious, but not that easy to do. If your agent can validate its work once it’s done, it can fix its own mistakes. The problem, of course, is defining what’s “correct”. A simple first step is running the TypeScript compiler, but that obviously doesn’t mean the feature is implemented correctly. A better proxy for what you consider correct might be the Playwright MCP, which lets the agent open a browser and check the result itself. That often ends up a bit more fiddly than it sounds, but it can work well.

It’s definitely worth the effort, because with a good feedback loop, agents become magical.
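As a rough sketch of the cheapest kind of feedback loop, here is a small TypeScript script an agent could be told (for example, in AGENTS.md) to run after every change. It assumes Node.js; the file name `check.ts` and the helpers `runCheck` and `runChecks` are hypothetical, and `npx tsc --noEmit` is simply the usual way to type-check a project without writing output files. Like the compiler step above, it only tells the agent whether the code type-checks, not whether the feature is right.

```typescript
// check.ts (hypothetical name): run validation commands and report results
// in a form an agent can read and act on. Assumes Node.js.
import { spawnSync } from "node:child_process";

// Outcome of one validation step.
interface CheckResult {
  name: string;
  ok: boolean;
  output: string;
}

// Run one command synchronously, capturing exit status and combined output.
function runCheck(name: string, command: string, args: string[]): CheckResult {
  const result = spawnSync(command, args, { encoding: "utf8" });
  // stdout/stderr can be null if the process failed to start,
  // in which case result.error explains why.
  const output =
    `${result.stdout ?? ""}${result.stderr ?? ""}` +
    (result.error ? result.error.message : "");
  // status is null if the process failed to start or was killed by a signal.
  return { name, ok: result.status === 0, output };
}

// Run checks in order and stop at the first failure, so the agent
// gets one concrete error to fix instead of a wall of output.
function runChecks(checks: Array<[string, string, string[]]>): CheckResult[] {
  const results: CheckResult[] = [];
  for (const [name, command, args] of checks) {
    const r = runCheck(name, command, args);
    results.push(r);
    if (!r.ok) break;
  }
  return results;
}
```

A hypothetical entry point could call `runChecks([["typecheck", "npx", ["tsc", "--noEmit"]]])`, print the output of any failing result, and exit with a non-zero code. Further steps, such as your test runner or a browser check, can be appended to the list as the feedback loop gets richer.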

CLIs are better

This is more personal opinion than the rest of my takes, but CLIs have a major advantage over graphical interfaces: they run in a shell.

Shells are extremely good agent environments. They are:

  • composable: pipe the output of one command into another
  • self-describing: add --help and get up-to-date docs for the tool you’re using
  • full of free tools: your agent can grep stuff, sed stuff, write Python scripts, …

All of this is technically possible in a graphical interface, but it’s just not as predictable. In a terminal, I know which environment variables are available, which shell the agent is using, and that it’s the same shell I’m using, …

What do you want to build?

This one is critical. You can have 20 git worktrees running Claude Code in parallel, but that won’t help if you don’t know what you want to build.

For us humans, code serves several purposes at once, and we seem to be forgetting one of them: it’s also a way to structure and test our thoughts, for the same reason people like to take notes.

So in the end, we need to know what to build, because otherwise we might produce a whole lot of zero value.

This is by far my biggest bottleneck right now. That doesn’t mean it’s a bad thing: I can try a lot more ideas today and build riskier things, simply because implementation and opportunity costs have gone down.

But it’s the part of the process that I’m thinking the most about at the moment.

Summary

So if you’ve tried agent CLIs before and were disappointed, try them again, and in particular, try different ones. All of this will likely be outdated within a few months, but I think we live in incredibly exciting times that let us build things we couldn’t even dream of 5 years ago. To the future.

Written by Jonas Otten
