Updates

Lessons and insights from building software with AI agents

I design and operate a 21-agent AI development pipeline that ships real software across five projects. These posts are that pipeline's engineering journal — what worked, what broke, and what I’d do differently. If you manage teams or build systems, most of these lessons apply whether your team is human or not.

Start Here

Why I Coordinate 21 AI Agents Instead of Using One

Pipeline

Most people use AI as a single assistant. I split the work across 21 specialized agents — each with defined responsibilities, quality gates, and a communication relay. Here’s why, and what it actually looks like in practice.

Outgrowing a Paid API in One Afternoon

MITM

I replaced a paid mapping API with a free open-source alternative in a single session — 21 new tests, zero regressions. The decision framework that made it obvious was more valuable than the migration itself.

Teaching a Phone to Read Your Grandmother’s Recipe Card

StockPot

StockPot’s OCR pipeline went from “works on printed text” to “reads handwritten recipe cards in bad lighting.” The journey involved dual OCR engines, OpenCV preprocessing, and the realization that the hardest part of recipe capture isn’t text recognition — it’s knowing what to do with the text once you have it.

270 Pull Requests and What I’d Build Differently

Pipeline

After 270+ merged PRs across five projects, the pipeline works. But if I started over tomorrow, I’d change three things on day one — and none of them are about the AI models.

Why I Stopped Giving My AI Engineer a Quality Checklist

Pipeline

I added 20 quality rules to the engineer’s instructions. Quality got worse. Replacing the checklist with five specific test descriptions from the architect cut per-PR cost by 31% and produced cleaner code. Specificity beats volume every time.

From Speed to Scale: How Specialized Audits Changed Everything

Pipeline

I ran my first code audit the obvious way: one AI agent, one giant codebase, a generic prompt. The findings were useless. “Consider adding error handling.” Thanks. I already knew that.

When Your AI Agents Forget How to Be Themselves

Pipeline

Sessions 1 and 2 worked perfectly. Session 3, with no code changes, fell apart. Tracing the root cause cost 200,000 tokens — and changed how I think about AI memory.

When Your AI Writes Code That Can’t Actually Run

QuantBot

QuantBot had 668 passing tests across a complete quantitative trading system — and it couldn’t connect to a broker. An architect assessment revealed three critical modules were entirely stubbed with NotImplementedError. Tests validate logic, not integration.
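The failure mode here is general: unit tests that mock a dependency will pass even when the real dependency is an unimplemented stub. A minimal Python sketch of the pattern (all names are hypothetical, not from QuantBot's actual codebase):

```python
from unittest.mock import MagicMock

class BrokerClient:
    """A 'complete-looking' module whose guts were never written."""
    def submit(self, symbol: str, qty: int) -> str:
        raise NotImplementedError("broker integration pending")

def place_order(broker, symbol: str, qty: int) -> str:
    # Pure logic under test: validate, then delegate to the broker.
    if qty <= 0:
        raise ValueError("qty must be positive")
    return broker.submit(symbol, qty)

# Unit test passes: the mock hides the unimplemented broker entirely.
mock_broker = MagicMock()
mock_broker.submit.return_value = "order-1"
assert place_order(mock_broker, "AAPL", 10) == "order-1"

# Integration fails the moment the real client is exercised.
try:
    place_order(BrokerClient(), "AAPL", 10)
except NotImplementedError:
    print("stubbed")
```

The logic tests are green and honest; they just never touch the stub. Only an end-to-end or smoke test that instantiates the real `BrokerClient` exposes the gap.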

The Debugging Tax: What Breaks When 21 AI Agents Share a Codebase

Pipeline

A relative path bug took two agents and three test runs to find. A prematurely shut-down reviewer left a PR unreviewed for an entire session. And GitHub won’t let you approve your own PR, even via API. Welcome to multi-agent coordination.

What 10,000+ Automated Tests Taught Me About AI Code Quality

Insights

AI writes tests fast. After 10,000+ tests across five projects, I can tell you exactly what they’re missing — and it’s always the same thing.

Building a $0/Month Production App with Cloudflare

MITM

Meet in the Middle — a collaborative meeting point finder built with Svelte — runs entirely on Cloudflare’s free tier. Pages for hosting, Workers for API proxying, D1 for caching, R2 for map tiles. Total monthly cost: zero dollars.

The Pipeline That Ships Features While I Sleep

Pipeline

I want to be honest: the pipeline doesn’t run while I sleep. It runs while I make decisions, which is harder than it sounds and more valuable than writing code.