Posts

2026.FEB.27

Two Beliefs About Coding Agents

Drew Breunig:

I'm lucky enough to talk to a range of developers and teams, spanning a variety of company sizes and a broad array of skill sets. From these conversations, two beliefs have emerged and solidified about coding agents and their (current) impact on coding.

Drew makes two very astute observations, both of which I endorse. The first one in particular is under-appreciated:

Most talented developers do not appreciate the impact of the intuitive knowledge they bring to their coding agent.

Coding agents are amplifiers of the skills of the engineers who wield them; they are not magic beans that'll let an amateur cook up a compiler.

The second observation should be obvious to anyone who has built software products, but somehow the current mania is making people ignore it:

Most of the work people are sharing is incredible personal tools, but they are not capital-P products.

2026.FEB.18

Codex CLI vs Claude Code on Autonomy

nilenso:

I spent some time studying the system prompts of coding agent harnesses like Codex CLI and Claude Code. These prompts reveal the priorities, values, and scars of their products. They're only a few pages each and worth reading in full, especially if you use them every day. This approach to understanding such products is more grounded than the vibe-based takes you often see in feeds.

While there are many similarities and differences between them, one of the most commonly perceived differences between Claude Code and Codex CLI is autonomy, and in this post I'll share what I observed. We tend to perceive autonomous behaviour as long-running, independent, or requiring less supervision and guidance. Reading the system prompts, it becomes apparent that the products make very different, and very intentional choices.

A very interesting comparison. But I don't believe the difference in behaviour is primarily, or even largely, driven by the system prompts. The difference is far more ingrained; it was most likely RL'd in during post-training.

Why do I say this? I've been using both models in the Pi coding agent with its default system prompt[1], which is both very small and the same for all models. Even in Pi, this difference in behaviour comes across clearly.[2]

Footnotes

  1. Pi allows us to replace the entire system prompt by placing a markdown file at ~/.pi/agent/SYSTEM.md.

  2. I feel that both models behave better in Pi than in their respective canonical harnesses, but this is a very subjective opinion.
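The replacement mechanism from footnote 1 can be sketched in a few lines. The path comes from the footnote; the prompt text here is just a placeholder, not Pi's actual default:

```python
# Replace Pi's entire system prompt by writing ~/.pi/agent/SYSTEM.md
# (path from footnote 1; the prompt text below is only a placeholder).
from pathlib import Path

prompt_path = Path.home() / ".pi" / "agent" / "SYSTEM.md"
prompt_path.parent.mkdir(parents=True, exist_ok=True)
prompt_path.write_text("You are a coding agent. Prefer small, verifiable steps.\n")
```

Because the file fully replaces the prompt rather than appending to it, whatever you write there is all the model sees as its system prompt.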

2026.FEB.16

SaaS Isn't Dead. It's Worse Than That.

Michael Bloch:

I'm more bullish on AI than I've ever been. And that's exactly why I'm bearish on most software companies. Not because their customers will leave, but because their next thirty competitors just got a lot easier to build.

I've seen and heard a number of people quip exactly this; this is one of the crispest articulations. It rings ominous to me.

2026.FEB.14

The Final Bottleneck

Armin Ronacher:

I too am the bottleneck now. But you know what? Two years ago, I too was the bottleneck. I was the bottleneck all along. The machine did not really change that. And for as long as I carry responsibilities and am accountable, this will remain true. If we manage to push accountability upwards, it might change, but so far, how that would happen is not clear.

I too am the bottleneck. And I'm glad I am. When I stop being the bottleneck, I'm no longer involved at all. And if I'm not involved, it doesn't matter to me.

A very good and thought-provoking read.

2026.FEB.11

Showboat and Rodney — Agents Demoing Their Work

Simon Willison:

A key challenge working with coding agents is having them both test what they've built and demonstrate that software to you, their overseer. This goes beyond automated tests—we need artifacts that show their progress and help us see exactly what the agent-produced software is able to do.

Simon's response to this challenge is a pair of CLIs: Showboat and Rodney.

Showboat:

It's a CLI tool (a Go binary, optionally wrapped in Python to make it easier to install) that helps an agent construct a Markdown document demonstrating exactly what their newly developed code can do.

This might be a very useful artefact to include in PRs (assuming they're meant to be reviewed by humans, of course!)

Rodney:

Rodney is a CLI tool for browser automation designed to work with Showboat. It can navigate to URLs, take screenshots, click on elements and fill in forms.

Rodney is quite interesting too. There are several such CLIs/skills out there for agents to control browsers during testing: Vercel's agent-browser seems very popular, and there are others on skills.sh as well.

I'm currently using the web-browser skill from mitsuhiko on GitHub, a set of TypeScript scripts that control a Chrome browser using CDP (similar to Rodney); it has no npm dependencies save for one websockets lib. This works well, but I'm going to give Rodney a try: being able to run it via uvx means it should work in environments like Codex for web (which has uv and Chrome) without additional setup.
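For context on what "control a Chrome browser using CDP" means: the Chrome DevTools Protocol is JSON over a WebSocket, where each command is a frame with an id, a method, and params. A minimal sketch of building such frames (actually sending them requires Chrome started with a remote-debugging port, which is not shown here):

```python
import json

def cdp_command(msg_id: int, method: str, **params) -> str:
    """Serialize one Chrome DevTools Protocol command frame."""
    return json.dumps({"id": msg_id, "method": method, "params": params})

# Real CDP methods: navigate the page, then capture a screenshot.
navigate = cdp_command(1, "Page.navigate", url="https://example.com")
screenshot = cdp_command(2, "Page.captureScreenshot", format="png")
```

Tools like mitsuhiko's scripts or Rodney wrap this request/response loop (plus event handling) behind a friendlier CLI surface.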

2026.FEB.09

Feedback Loopable

New term added to my vocab that I'm going to use a lot: "feedback loopable".

Lewis Metcalf:

Agents are most powerful when they can validate their work against reality. When they have feedback loops. The problem is, some work is hard to organize in a way that an agent can easily get feedback. The software we build and the tools we use are built for humans. Humans with eyeballs and hands and fingers.

This article is about how to make those problems easier for agents. It's a way to create an environment for your agents so that they can solve problems on their own, and so that you (the human) can intuitively guide them without getting in the way.

This process of building something for humans using methods built for agents is what I call: making it feedback loopable.

2026.FEB.09

Mitchell Hashimoto's AI Adoption Journey

This is a good post to share with anyone who is still sceptical about agentic engineering. Mitchell Hashimoto (creator of Vagrant, co-founder of HashiCorp, now building Ghostty) goes through his journey from "this isn't really helpful at all" to "it consistently adds value."

Instead of giving up, I forced myself to reproduce all my manual commits with agentic ones. I literally did the work twice. I'd do the work manually, and then I'd fight an agent to produce identical results in terms of quality and function (without it being able to see my manual solution, of course).

This was excruciating, because it got in the way of simply getting things done. But I've been around the block with non-AI tools enough to know that friction is natural, and I can't come to a firm, defensible conclusion without exhausting my efforts.

What's noteworthy is that it didn't happen naturally for Mitchell: he had to put explicit effort into making it work, and he didn't give up when it wasn't going all that well. That may be the big difference between those who are excited about agentic engineering and those who are not.

2026.JAN.07

You can make up HTML tags

Browsers handle unrecognized tags by treating them as a generic element, with no effect beyond what’s specified in the CSS. This isn’t just a weird quirk, but is standardized behavior. If you include hyphens in the name, you can guarantee that your tag won’t appear in any future versions of HTML.

This is so cool, and I had never heard of this before. I wonder why this is not more popular — really semantic HTML!
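A minimal illustration (the tag names here are invented for the example): unrecognized elements get no default styling or behaviour, so everything they do comes from your CSS.

```html
<!-- Made-up, hyphenated tag names: guaranteed never to clash with future HTML. -->
<style>
  book-title { font-style: italic; }
  side-note  { display: block; color: gray; }
</style>

<p>Re-reading <book-title>The Mythical Man-Month</book-title> this week.</p>
<side-note>Without the CSS above, these would render as plain inline text.</side-note>
```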

2025.AUG.02

The Bitter Lesson versus The Garbage Can

A thought-provoking article that, on the surface, explores which modality of AI agent deployment is more likely to succeed in a large organisation — agents carefully designed around organisational processes, or general-purpose agents trained to seek successful outcomes (RL, for example).

But dig a little deeper, and it raises a more fundamental question: what shape will successful AI-powered products take?

Ethan Mollick:

For many people, this may not be a surprise. One thing you learn studying (or working in) organizations is that they are all actually a bit of a mess. In fact, one classic organizational theory is actually called the Garbage Can Model. This views organizations as chaotic “garbage cans” where problems, solutions, and decision-makers are dumped in together, and decisions often happen when these elements collide randomly, rather than through a fully rational process.

Computer scientist Richard Sutton introduced the concept of the Bitter Lesson in an influential 2019 essay where he pointed out a pattern in AI research. Time and again, AI researchers trying to solve a difficult problem, like beating humans in chess, turned to elegant solutions, studying opening moves, positional evaluations, tactical patterns, and endgame databases. Programmers encoded centuries of chess wisdom in hand-crafted software: control the center, develop pieces early, king safety matters, passed pawns are valuable, and so on. Deep Blue, the first chess computer to beat the world’s best human, used some chess knowledge, but combined that with the brute force of being able to search 200 million positions a second. In 2017, Google released AlphaZero, which could beat humans not just in chess but also in shogi and go, and it did it with no prior knowledge of these games at all. Instead, the AI model trained against itself, playing the games until it learned them. All of the elegant knowledge of chess was irrelevant, pure brute force computing combined with generalized approaches to machine learning, was enough to beat them. And that is the Bitter Lesson — encoding human understanding into an AI tends to be worse than just letting the AI figure out how to solve the problem, and adding enough computing power until it can do it better than any human.

ai