Link copied
BlogDescription Disambiguation Is the Highest-Impact Router Intervention I Have Tried
AI Workflow

Description Disambiguation Is the Highest-Impact Router Intervention I Have Tried

KG
Teh Kim GuanACMA · CGMA
2026-05-27 · 7 min read
Description Disambiguation Is the Highest-Impact Router Intervention I Have Tried

Three description rewrites. Twenty-five minutes. Fourteen points of router top-1 accuracy. I have not found another intervention with that effort-to-payoff ratio.

I run a 103-skill corpus, all prefixed kg-, on top of Claude Code. I measure router performance on a 50-case adversarial benchmark: each case has a prompt, an expected skill, a list of acceptable secondaries, and a list of explicitly rejected skills. When I closed the second phase of my context-engineering build last week, top-1 accuracy sat at 94 percent. Earlier in the same session, it was at 80 percent. The 14-point move came from three description rewrites, each of which added a single NOT-clause to a skill that was over-claiming adjacent scope.

I had spent more time in the weeks before that session tightening invocation triggers, adjusting frontmatter, and debating whether to swap the router model. None of that produced 14 points. The descriptions did.


What the public discussion misses

Most AI tooling content frames router accuracy as a model problem. Use a bigger model. Use a longer context window. Give the router more examples. All of that is downstream of the actual issue.

Descriptions written without explicit NOT-clauses produce overlapping claim surfaces. If skill A says "I audit and assess skills" and skill B says "I maintain the skill index," both descriptions are true but neither draws a hard boundary. The router, faced with a prompt that contains both "skills" and an action verb, has to choose between two equally plausible surfaces. It will guess based on token proximity and phrase frequency, not intent. High-confidence guesses are especially dangerous here: the router does not hesitate, so you do not get a signal that disambiguation is needed.

The fix is not to make the descriptions longer or to add more trigger phrases. The fix is to make each description explicitly disclaim the scope of its nearest neighbour. A skill that says "NOT for X, that is Y's job" gives the router a hard wall rather than a soft gradient. The router stops treating the two skills as interchangeable.

The second thing the public discussion misses is the relationship between corpus size and collision frequency. At 10 skills, adjacency collisions are rare. At 50, they are manageable. At 103, they are the dominant source of routing error. The number of potential collision pairs grows faster than the corpus itself. Description quality compounds in importance as the corpus grows, which means the investment in NOT-clauses pays more the later you make it. That is also an argument for making it early.

This is a familiar problem in accounting: a profit and loss account does not get to claim cash receipts that belong on the balance sheet. The chart of accounts is valuable precisely because each account has hard boundaries, not fuzzy ones.


The walkthrough

Before: a confident wrong answer

The case that sharpened this most clearly was adversarial-ce-vs-skills-registry.

Prompt: "Update the skills registry with the new kg-ce entry."

Expected route: kg-skills-registry. That skill is responsible for rebuilding the index from disk, adding skill metadata, and maintaining the running count of what is in the corpus.

Actual route before the rewrite: kg-ce at 0.88 confidence score.

That 0.88 is the detail worth pausing on. A low-confidence wrong answer is a hesitation signal. A high-confidence wrong answer means the router is fully committed to the miss and will not back down without new information. It is the most dangerous category of routing failure because it is silent. The system proceeds. The wrong skill runs. Nothing in the output flags the problem.

Why did kg-ce claim this route? Its description said it "audits and assesses skills." The word "skills" appears in the prompt. The word "entry" reads as adjacent to "audit." The router found a plausible surface and committed to it at high confidence.

The rewrite

The fix was one addition to kg-ce's description:

NOT for REGISTRY MAINTENANCE (rebuilding the index from disk, schema migration, adding skill metadata, performance metrics): that is kg-skills-registry's job.

The clause names the activity being disclaimed. It names the correct skill explicitly. It does not hedge.

After: two problems solved at once

After the rewrite, the adversarial-ce-vs-skills-registry case routed to kg-skills-registry as expected.

But there was a second benefit. The same rewrite session restored kg-ce's own trigger phrases for "ce snapshot" prompts. Earlier in the same session, I had over-disclaimed in a draft of the NOT-clause, and "ce snapshot" prompts were being deflected along with the registry prompts. The final rewrite was precise enough that kg-ce now routes its own snapshot trigger correctly at 0.92 confidence, while also refusing the registry route it was wrongly claiming before.

That is what a well-scoped NOT-clause does. It closes one wrong door without closing any right doors. The precision matters. I will come back to what happens when you miss it.


The three-step recipe

This is the pattern anyone running a multi-skill corpus can apply tonight.

Step 1. List the skills your router most often confuses. Do not use hit-or-miss statistics alone. Pull the confidence scores on misses. Sort descending. The cases where the router was most confident and most wrong are the ones that need NOT-clauses first. High-confidence misses mean the claim surfaces are genuinely overlapping, not that the router was uncertain.

Step 2. For each confusion pair, identify the adjacency that is triggering it. It is usually one of three things: an overlapping verb (both skills "assess" or both "manage"), an overlapping object (both skills reference "skills" or "content" or "research"), or an overlapping audience (both skills fire for the same user intent even though their actual operations differ). Name the adjacency precisely before writing anything.

Step 3. Write a NOT-clause into the description of the lower-precedence skill in the pair. The clause should disclaim the specific adjacency, not the broad category. Name the correct skill explicitly. The format that works: "NOT for [specific activity], that is [correct-skill]'s job."

Three rewrites in the session that produced the 14-point lift also covered kg-ara (strategic-tier vs research-recommendation-tier), kg-larksync (disclaim general URL fetching, which belongs to kg-fetchcontent), and kg-coworkpm (disclaim the weekly improvement-cycle overlap with kg-autocowork). Each rewrite contributed roughly three percentage points. The gains were additive, not diminishing, because each pair was a distinct adjacency rather than the same overlap under different names.


The trap: over-disclaiming

The first draft of the kg-ce NOT-clause went too far. I wrote something close to: "NOT for listing what skills exist or maintaining the inventory."

That disclaimed the registry overlap correctly. But it also disclaimed "ce snapshot" prompts, because a snapshot produces a list of what skills exist. The router read the disclaimer literally and routed snapshot prompts away from kg-ce.

I lost the accuracy I had just gained, on a different case.

The lesson: the goal is to disclaim adjacent scope, not to shrink your own scope. "I do not do X, that is Y's job" is correct. "I do not do anything that looks like X" is too broad and will catch legitimate cases from your own trigger surface.

Write the NOT-clause around the specific activity being confused. Test it against your own expected-route cases before treating it as done. If a previously passing case starts failing after a rewrite, the clause is over-scoped.

Specific is the goal. Minimal is the execution principle. One sentence per adjacency. Name the adjacent skill explicitly so the router has a positive alternative, not just a rejection.


What this means for skill design from day one

The description disambiguation work I described was Phase 2 retrofit. I was cleaning up a corpus that had grown organically without explicit NOT-clauses.

The cleaner pattern is to write the NOT-clause into the description at authoring time. When you create a new skill, ask: which existing skill is most likely to claim the same routes? Write the NOT-clause for that pair before the new skill ships. The cost is one sentence per adjacency, paid once. The savings compound across every benchmark run and every live session where the router has to choose.

Treat skill descriptions as boundary documents, not marketing copy. A description that says "I do everything related to X" is a claim surface waiting to collide. A description that says "I do A and B but NOT C (that is D's job)" is a routing instruction. The router executes routing instructions. It approximates when all it has are claim surfaces.

One practical check at authoring time: read your new skill's description aloud and ask which existing skill could plausibly claim the same prompt. If you can name one, add the NOT-clause before the skill ships. The cost is thirty seconds and one sentence. The alternative is discovering the collision at benchmark time, after the corpus has grown large enough that the collision is invisible until you go looking for it.

The 14 points did not come from a better model or a longer prompt. They came from three sentences.


Part of the Product Pipeline series from KG Consultancy.

About the Author
KG
Teh Kim Guan
Product Consultant · General Manager, PEPS Ventures

Strategy and technology are the same decision. Over 15 years in fintech (CTOS, D&B), prop-tech (PropertyGuru DataSense), and digital startups, I have built frameworks that help founders and executives make both moves at once. Based in Kuala Lumpur.

More from the blog
AI Workflow
The Two-Tier Staff Agent Pattern: Strategic Ara, Operational Deeno
A single AI assistant tries to be two cognitive registers in one inbox. Splitting it into a strategic tier and an operational tier with named personalities and distinct cadences makes register-switching routable rather than negotiable.
2026-05-27 · 9 min read
AI Workflow
I Measured My AI Router This Week. It Hit 94 Percent Top-1 Versus a 48.9 Percent Baseline.
I ran a 50-case adversarial benchmark against my 103-skill Claude Code corpus. Top-1 hit 94 percent. The reference baseline is 48.9 percent. Here is what closed the 45-point gap.
2026-05-27 · 8 min read
AI Workflow
Telemetry-First Sub-Agent Dispatch Saved Me 18 Minutes on a Quota Cliff
Write a JSONL event before each sub-agent fires, and another after it completes. Six lines of Python turn a quota interrupt into a targeted retry of only what failed.
2026-05-27 · 6 min read
Work with KG

Working on a 0→1 product?

I help founders and operators go from idea to validated product. Let's talk about yours.

Get in touch →