April 1, 2026

Using Karpathy's Autoresearch Loop to Improve Screenshot to Layout

This is a developer log from the rebuild of Screenshot to Layout 2.0 — a Figma plugin that converts screenshots into editable layouts.

A few weeks ago I came across Andrej Karpathy talking about the "automated research loop": setting up an AI agent to run experiments, evaluate results, and iterate on its own. The point is not to replace the human but to compress the feedback cycle, so you can make dozens of improvements in the time it normally takes to make a few.

We're in the middle of rebuilding Screenshot to Layout from the ground up, and I thought: what if we applied that loop to our pipeline?

How It Works

The plugin takes a screenshot and turns it into real Figma layers — text, rectangles, lines, colors. Improving the pipeline that does this is normally slow: change code, rebuild, run on a test screenshot, squint at the result, repeat. Maybe five or six iterations in a focused session.

With the loop, Claude makes a change, runs it through the actual plugin, and we look at the Figma output together. Each iteration gets its own number and its own Figma page. We went from mut_001 through mut_067.
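The bookkeeping behind that loop can be sketched roughly as follows. This is a minimal illustration, not our actual tooling: `run_loop`, `Iteration`, and the three callbacks (`propose`, `render_to_page`, `review`) are hypothetical names, and naming the Figma page after the mutation ID is an assumption. Only the `mut_001` numbering scheme comes from the log itself.

```python
from dataclasses import dataclass
from typing import Callable, List


def mutation_id(n: int) -> str:
    """Format an iteration number as a zero-padded ID, e.g. 1 -> 'mut_001'."""
    return f"mut_{n:03d}"


@dataclass
class Iteration:
    mutation: str      # e.g. "mut_001"
    figma_page: str    # one page per iteration (page naming is an assumption)
    description: str   # what the agent changed
    accepted: bool     # the human's verdict after looking at the output


def run_loop(
    propose: Callable[[str], str],         # agent makes a change, returns a summary
    render_to_page: Callable[[str], str],  # real plugin build writes a Figma page
    review: Callable[[str], bool],         # human looks at the page and decides
    count: int,
) -> List[Iteration]:
    """Run `count` iterations and keep a record of each one."""
    log: List[Iteration] = []
    for n in range(1, count + 1):
        mut = mutation_id(n)
        description = propose(mut)
        page = render_to_page(mut)
        accepted = review(page)
        log.append(Iteration(mut, page, description, accepted))
    return log


# Stub callbacks stand in for the agent, the plugin build, and the reviewer.
log = run_loop(
    propose=lambda mut: f"placeholder change for {mut}",
    render_to_page=lambda mut: mut,  # assumed: page is named after the mutation
    review=lambda page: True,
    count=3,
)
for it in log:
    print(it.mutation, it.figma_page, it.accepted)
```

The review step is deliberately a callback rather than a computed score: the record stores what a person decided after looking at the page, which is the part of the loop the agent does not own.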

The Trap

Our quality score went from 93.7 to 99.7. Sounds great. But around mutation 55, we noticed the scores kept climbing while the actual Figma output wasn't improving at the same rate. Classic Goodhart's Law — we'd started optimizing for the metric instead of the result.

The fix was simple: the only valid test is running the actual plugin and looking at the Figma output. No separate test harness. No synthetic scores. You run the real build, compare side by side, judge with your eyes.

This sounds obvious, but when you're deep in an automated loop and the numbers are going up, it's easy to forget that you're building a visual tool — and designers judge with their eyes, not with a score.
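If you do keep a score around as a guide, one way to make this kind of drift visible is to log the human verdict next to the score and flag any stretch where the score keeps rising but nobody saw a visual improvement. The sketch below is an illustration only: `Review`, `first_drift`, the window size, and the threshold are all hypothetical, and the sample data is made up rather than taken from our runs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Review:
    mutation: str          # e.g. "mut_004"
    score: float           # automated quality score for this iteration
    visibly_better: bool   # did a human see an improvement in the Figma output?


def first_drift(
    history: List[Review], window: int = 5, min_gain: float = 0.5
) -> Optional[str]:
    """Return the mutation that ends the first window where the score rose by
    at least `min_gain` while no change in that window looked better to a
    human. Returns None if no such window exists."""
    for end in range(window, len(history) + 1):
        recent = history[end - window:end]
        score_gain = recent[-1].score - recent[0].score
        # The gain was produced by the changes after the window's first entry.
        any_visible = any(r.visibly_better for r in recent[1:])
        if score_gain >= min_gain and not any_visible:
            return recent[-1].mutation
    return None


# Made-up sample data, for illustration only.
records = [
    Review("mut_001", 90.0, True),
    Review("mut_002", 91.0, True),
    Review("mut_003", 92.0, True),
    Review("mut_004", 92.5, False),
    Review("mut_005", 93.0, False),
    Review("mut_006", 93.5, False),
]
print(first_drift(records, window=3))
```

A check like this does not replace looking at the output. It only points at the iteration where the numbers and the eyes stopped agreeing, so you know where to start looking.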

Takeaways

The loop works. At the manual pace of five or six iterations per focused session, 67 iterations would have taken more than ten sessions.

Keep the human in the loop. AI is great at generating code changes and running experiments. It's not great at judging whether a layout "looks right."

Metrics are guides, not goals. The moment your tests diverge from your product, you're optimizing for the wrong thing.

Leave a trace. A Figma page per iteration lets you scrub through the history like a flipbook. It's version control you can actually see.

We're continuing the loop — every change validated against real plugin builds, working toward handling complex app screens with inputs, dialogs, and grid layouts.

Want to see the results for yourself?

Try FigOCR today