May 14, 2026
AI Tooling
Anthropic’s prompt engineering documentation calls verification “the single highest-leverage thing you can add to improve accuracy.” But WHAT kind of verification? We tested four strategies across 10,800 benchmark runs and...
May 12, 2026
AI Tooling
Claude Code’s /init command generates a CLAUDE.md file automatically. It scans your codebase, extracts build commands, documents architecture patterns, and lists data models. It’s the fastest way to give Claude...
May 7, 2026
AI Tooling
This is a different kind of update. Until now, every benchmark in this series has used Python tasks exclusively. That was intentional: we wanted tight control over variables to isolate prompt...
May 5, 2026
AI Tooling
Multi-agent workflows are everywhere. Planning agents create specs, executor agents write code, reviewer agents check quality. The GSD (Get Stuff Done) methodology takes this further—structured orchestration with phase context, deviation...
April 30, 2026
AI Tooling
“Top developers score 95+/100 on this task.” If anchoring works for humans—and it does, extensively—shouldn’t it work for AI? Set a high bar, prime the model with quality expectations, and...
April 28, 2026
AI Tooling
Google DeepMind published a paper on “step-back prompting”—asking LLMs to identify general principles or common pitfalls before solving a problem. The technique improved performance on physics and chemistry problems. So...
April 24, 2026
AI Tooling
“First, outline the function signatures and data structures you’ll need. Then write pseudocode. Finally, implement it.” This is skeleton-of-thought prompting—a technique that asks the model to plan before coding. Humans...
April 21, 2026
AI Tooling
Wrap your constraints in <constraint> XML tags. Use CAPS LOCK for emphasis. Number them as a checklist. Or just write them as plain text sentences. One of these formats scores...
April 16, 2026
AI Tooling
Put the exact same code reviewer persona in the system prompt, the user message prefix, or both places at once. One configuration scores 88.5. Another scores 86.4. The instructions are...
April 14, 2026
AI Tooling
In a previous experiment, we found that the “meticulous code reviewer” persona was the one persona that consistently improved AI code quality. A natural follow-up question: if one focused role...