Your engineering org needs an AI slop registry

AI coding tools don’t just help engineers write code faster. They help engineers make the same mistake faster, at scale, across every PR that touches a given pattern. I’m not talking about AI code that’s obviously wrong; I’m talking about code that compiles, passes basic checks, and looks plausible but is subtly wrong, bloated, or misaligned with what was actually needed.

In practice, that usually looks like AI overengineering an abstraction layer for a problem that needs 10 lines; code that ignores your repo’s patterns, naming, or architecture; calls to APIs that don’t exist; or patterns copied without understanding why, like retry logic where it’s not needed.

Errors like that are systematic, which is what makes them preventable.

You have CLAUDE.md and Skills, but…

Most teams respond to this by trying to give the AI better instructions. They document their standards in a CLAUDE.md file, configure Skills, and describe the conventions they want the model to follow. This is the right impulse, but it doesn’t always work. They’re asking the same non-deterministic agent that generated the code to also catch its own mistakes. It may follow the rules you’ve written. It may not. There’s no evidence either way, no audit trail, and you can’t know in advance which run you’ll get. A CLAUDE.md file is an input to generation. It is not a verification system.

Catching slop reliably requires something structurally separate: a system that independently checks the output, uses a different agent, and produces the same result every time it sees the same code.

The two layers of verification

The shift we’ve been working toward at Aviator is replacing code review with verified intent.
Instead of a reviewer reading a diff and asking, “Does this look right?” the team agrees on what the code is supposed to do before it’s written, and a separate verification system checks the output against that agreement.

Think of it like a building inspection. A building isn’t approved by an architect watching every nail get hammered. It’s approved by an inspector evaluating the finished structure against the blueprints. Intent-driven verification follows the same pattern: the spec is the blueprint, the agent’s implementation is the construction, the verifier pipeline produces verdicts and evidence for each criterion, and the reviewer approves based on intent fit and evidence quality.

The model has two layers, and understanding why there are two rather than one is the key to making it work.

User criteria are the acceptance criteria for a specific change, generated by the agent from the expressed intent or written by hand. They’re scoped to this PR only: the endpoint path, the response shape, the behavior under failure, and what’s explicitly out of scope. This is where the intent for a particular task lives.

Invariant criteria come from the team’s Invariants catalog and are rules that automatically apply to every matching change. Where user-supplied acceptance criteria describe what this change should do, invariants describe what every change should respect. They live in your account and update once for everyone.

Your Invariants should be specific about the rule but vague about the implementation:

    “All HTTP handlers must call an authentication middleware before any business logic.”

    “All migrations must declare a down block.”

These are defined once and checked on every run.
Developers don’t need to include them in the specs because the system automatically loads the matching set.

The test for promoting a check to an invariant is recurrence: anything that you post in a review comment multiple times should become an invariant. Aviator actually does this automatically. It auto-creates invariants based on past comments.

When verification runs, both layers are assembled into a single list of acceptance criteria and flow through the same pipeline. A spec adding a subscription status endpoint might contain these user criteria:

    # Add subscription status endpoint
    ## Acceptance Criteria
    - [ ] Endpoint: GET /api/v1/subscription/status
    - [ ] Response includes: status, renewal_date

The invariant catalog then adds its own criteria automatically, for example a rule that all HTTP handlers must use AuthMiddleware. Verification checks all of them:

    ✓ Endpoint exists at the correct path (user criterion)
    ✓ Response includes status, renewal_date (user criterion)
    ✓ Handler uses AuthMiddleware (invariant)

All must pass. The spec author didn’t need to remember the authentication requirement. It was enforced by the catalog without anyone asking for it.

Invariants as the anti-slop registry

Invariants are what we call the ‘anti-AI slop registry,’ and they are what makes this work at scale. They address the most common category of AI slop: convention blindness, deprecated APIs, module boundaries the model doesn’t know about, and security baselines that should apply everywhere. None of these are in the model’s training data for your specific codebase. They live in the heads of your senior engineers and show up as recurring review comments.

Most invariants worth writing start as a review comment that’s been left more than twice. Here is an example of turning a real review comment into an invariant:

Comment on PR #4173:
“Please don’t write to users directly; go through UserRepository.UpdateProfile.
We had a partial-write bug last quarter from a similar pattern.”

Invariant body:

    Writes to the users table must go through UserRepository. Direct INSERT,
    UPDATE, or DELETE statements against the users table are not allowed
    outside the repository package. Schema migrations under src/db/migrations
    are exempt.

Conditions: file_path_glob: src/**/*.go (skip non-Go files).

Category: functional_correctness.

You can mine historical review comments, cluster them, and generate invariant candidates for human approval. Each invariant you codify is a check that will never cost a reviewer time again.

I may have said that code review is a historical approval gate that no longer matches the shape of engineering work, or that we can stop reading the code, but that will not happen overnight. In practice, over time, we move the human judgment upstream, where it’s more valuable. Not everything has to be reviewed to the same depth. Humans should review specs, plans, constraints, and acceptance criteria, not 500-line diffs.

The other thing that sets this apart from a rules file is what happens at the time of verification. The writing agent and the verifying agent are different. They don’t share context, they don’t share blind spots, and the verifier produces a structured report per criterion (file references, reasoning, pass/fail/partial), not a gut-check opinion from the same model that wrote the code.

What we built, and what it found

At Aviator, we recently ran an experiment to test the intent-driven verification approach: what if the review happens before the code is written?

Instead of AI writing code and engineers reviewing it, the team spent time writing and reviewing scope, acceptance criteria, and edge cases before any implementation started. Then we handed it to an AI agent and let it build.

The result was about 6,000 lines of code. A second agent then verified the output against the 65 user criteria in the spec. It took six minutes. 60 passed, 4 failed, and 1 was partial.
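
To make the mechanics concrete, here is a minimal Python sketch of how user criteria and matching invariants could be assembled into one list and how per-criterion verdicts could be tallied. It is an illustration under stated assumptions, not Aviator’s implementation: the names `Invariant`, `Criterion`, `assemble_criteria`, and `tally` are hypothetical, the `security` category and the migrations glob are made up for the example, `fnmatchcase` stands in for whatever glob matching a real catalog uses, and the verdicts are written by hand rather than produced by a verifying agent.

```python
from collections import Counter
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Invariant:
    """A catalog rule plus the condition deciding where it applies (hypothetical shape)."""
    rule: str
    file_path_glob: str
    category: str


@dataclass(frozen=True)
class Criterion:
    text: str
    source: str  # "user" or "invariant"


def assemble_criteria(user_criteria, catalog, changed_files):
    """Merge spec criteria with every invariant whose glob matches a changed file."""
    criteria = [Criterion(text, "user") for text in user_criteria]
    for inv in catalog:
        # fnmatchcase only approximates repo-style globs: its "*" also matches
        # "/", so "src/**/*.go" requires at least one directory below src/.
        if any(fnmatchcase(path, inv.file_path_glob) for path in changed_files):
            criteria.append(Criterion(inv.rule, "invariant"))
    return criteria


def tally(verdicts):
    """Count verdicts; the change is accepted only if every criterion passed."""
    counts = Counter(verdicts.values())
    accepted = counts["pass"] == len(verdicts)
    return counts, accepted


# Illustrative catalog: categories and the migrations glob are assumptions.
catalog = [
    Invariant("All HTTP handlers must use AuthMiddleware.", "src/**/*.go", "security"),
    Invariant("All migrations must declare a down block.", "src/db/migrations/*",
              "functional_correctness"),
]
user_criteria = [
    "Endpoint: GET /api/v1/subscription/status",
    "Response includes: status, renewal_date",
]
changed_files = ["src/api/subscription_status.go"]  # hypothetical path

criteria = assemble_criteria(user_criteria, catalog, changed_files)

# Hand-written verdicts standing in for the verifying agent's report.
verdicts = {c.text: "pass" for c in criteria}
counts, accepted = tally(verdicts)

for c in criteria:
    print(f"{verdicts[c.text]:8}{c.text} ({c.source})")
print("accepted:", accepted)
```

Running the sketch prints one line per assembled criterion. The migrations invariant is skipped because no changed file matches its glob. Keeping the match condition on the invariant, rather than in each spec, is what lets the spec author leave the authentication requirement out.
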
Human reviewers still found things, but design-level decisions were verified before any code was generated, and org invariants were enforced automatically throughout.

Instead of leaving the same comment for the fifteenth time, you’re identifying the pattern, writing it once, and letting the system enforce it on every change that follows. You’re not building software anymore. You’re building the machine that builds software, and quality control is part of that machine.

The post Your engineering org needs an AI slop registry appeared first on The New Stack.