Using AI to write better code more slowly

Hacker News Best, May 25, 2026

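TL;DR (AI summary)

Writing high-quality code with AI is slower, but multi-model review can find and fix a large number of bugs and improve the overall health of the codebase.

Key points

AI can find a large number of bugs in code.
Multi-model review can reduce the false positive rate.
This approach is slow, but it improves code quality.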

A lot of people seem convinced that the point of AI coding is to write low-quality code as fast as possible. Spew out barely-passable slop, open massive PRs, and merge them unvetted. Ship it!

But the thing is, LLMs are very flexible. And you can use them just as effectively to write _high-quality_ code more _slowly_.

This statement seems completely obvious to me at this point, and I almost didn’t want to write this post for that reason. But there seem to be enough people convinced that LLMs are only good as slop cannons that it’s worth making the opposite case.

If Mythos taught us anything, it’s that LLM agents are _really good_ at finding bugs. Throw them at a codebase enough times, and they will find so many bugs that you’ll barely know what to do with them.

Like many others, I’ve also found this is true of non-Mythos models – some may be better than others at finding subtle bugs or avoiding false positives, but the fact is that the latest public models from Anthropic and OpenAI are good enough to find plenty of bugs in an unscrutinized codebase.

The problem is not so much _finding_ the bugs as prioritizing and validating them. For this reason I have a Claude skill I adapted from this article’s core insight: the more models you throw at a PR review, and the more they differ from one another, the less likely you are to get hallucinations or bogus bugs.

The skill says (paraphrasing):

Run a Claude sub-agent, Codex, and Cursor Bugbot to find bugs in this PR ranked by critical/high/medium/low. Once they’re all done, review their findings, do your own research to rule out false positives, and write a final report.

That’s basically it. You can add your own definition of “bug” if you want – mine has stipulations about the KISS and DRY principles, writing accessible HTML/JSX, using proper indexes for SQL queries, etc.
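
To make the "review their findings" step concrete, here is a minimal Python sketch of how reports from several reviewers could be merged and ranked before a final pass. It is not the skill itself, which is just a prompt. The `Finding` type, the `merge_findings` function, and the idea of matching findings by file location are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Lower number = more urgent. Severity names follow the post's ranking.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Finding:
    """One reported bug (hypothetical shape, not any tool's real output)."""
    reviewer: str   # e.g. "claude", "codex", "bugbot"
    location: str   # e.g. "api/users.py:42"
    severity: str   # one of the SEVERITY_ORDER keys
    summary: str


def merge_findings(findings):
    """Group findings by location, keep the worst severity reported for
    each, and record which reviewers flagged it."""
    grouped = defaultdict(list)
    for finding in findings:
        grouped[finding.location].append(finding)

    merged = []
    for location, group in grouped.items():
        worst = min(group, key=lambda f: SEVERITY_ORDER[f.severity])
        merged.append({
            "location": location,
            "severity": worst.severity,
            "summary": worst.summary,
            "reviewers": sorted({f.reviewer for f in group}),
        })

    # Most severe first; within a severity, findings that more reviewers
    # agree on come first, then location for a stable order.
    merged.sort(key=lambda m: (SEVERITY_ORDER[m["severity"]],
                               -len(m["reviewers"]),
                               m["location"]))
    return merged
```

Agreement between reviewers is only a hint here. A finding reported by a single reviewer can still be real, which is why the skill asks the agent to do its own research before writing the final report.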

In my experience, this skill always finds tons of bugs in a PR, and the false positive rate is near zero. It finds so many bugs that you’ll be bored senseless if you try to tackle them all. They’ll range from critical security or correctness bugs to the more mundane medium-level perf bugs to low-level “this comment is misleading”-type bugs.

My typical workflow is:

- Have an agent fix all the criticals and highs (with my guidance on the proper solution), then repeat until no criticals/highs
- Skip highs/mediums where the juice isn’t worth the squeeze (e.g. 100 lines of code to fix a narrow edge case)
- Abandon the PR if it has so many criticals that I realize the whole approach is misguided
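
The workflow above could be sketched as a small triage function. This is a rough illustration rather than something I actually run: the `Bug` fields, both thresholds, and the choice to skip all lows are assumptions, and the "abandon" call is a human judgment that a fixed count only approximates.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
ABANDON_CRITICAL_COUNT = 5      # this many criticals suggests a misguided approach
MAX_LINES_FOR_EDGE_CASE = 100   # fix size at which a narrow edge case is skipped


@dataclass
class Bug:
    severity: str           # "critical", "high", "medium", or "low"
    fix_cost_lines: int     # rough estimate of lines needed for the fix
    narrow_edge_case: bool  # True if the bug only bites in a rare situation


def triage(bugs):
    """Split bugs into fix/skip lists and pick the next step."""
    criticals = [b for b in bugs if b.severity == "critical"]
    if len(criticals) >= ABANDON_CRITICAL_COUNT:
        return {"decision": "abandon", "fix": [], "skip": list(bugs)}

    fix, skip = [], []
    for bug in bugs:
        not_worth_it = (bug.narrow_edge_case
                        and bug.fix_cost_lines >= MAX_LINES_FOR_EDGE_CASE)
        if bug.severity == "critical":
            fix.append(bug)   # criticals are always fixed
        elif bug.severity in ("high", "medium") and not not_worth_it:
            fix.append(bug)
        else:
            skip.append(bug)  # costly edge cases, plus all lows (assumption)

    # Re-run the review while criticals or highs are still being fixed.
    needs_rerun = any(b.severity in ("critical", "high") for b in fix)
    decision = "fix_and_rerun" if needs_rerun else "done"
    return {"decision": decision, "fix": fix, "skip": skip}
```

In practice the agent applies the fixes with guidance on the right solution, and the review runs again on the updated PR until the decision comes back as `"done"`.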

When I use this technique, I haven’t necessarily seen my velocity go up. If anything, the review process often finds _pre-existing_ bugs, so I end up on a tangential side-quest where I’m writing unit tests and fixing subtle flaws that pre-date the PR. This is the opposite of the “10x productivity” slop-cannon style of development that most people imagine when they think of vibe coding, but I find it very satisfying.

It’s a great way to improve the overall health of the codebase while also teaching you about the odd corners of it. In my experience, the happy path of a complex architecture is less interesting than its failure modes. And pre-LLMs, this was usually how I got familiar with a codebase anyway: understanding where the assumptions break down, and then getting my hands dirty to fix them.

If you’re the kind of person who is skeptical that AI coding is good for _anything_, then I doubt this post will persuade you. But if you’re the kind of developer who uses agents to write multi-hundred-line PRs that you barely understand yourself, I’d invite you to slow down a bit and try this other, slower style of “vibe coding.” Ask an agent how your PR works and how it might fail. Have it write Markdown docs with Mermaid charts if necessary. Use Matt Pocock’s `/grill-me` skill until you understand the entire PR front-to-back.

You might not be more “productive” in terms of raw lines of code. You might burn a ton of tokens just to find out that your entire plan was wrongheaded from the start. But I find this style of coding to be a more super-powered version of the kind of programming I was already trying to do before LLMs: careful, methodical, quality-obsessed, focused on making things better for the next coder.

So take a deep breath, slow down, try this technique, and see if you don’t enjoy writing better code more slowly.