type: rule
status: active
timestamp: 2026-06-28
tags: [sub-agents, token-reduction, delegation, agent-behavior, hard-rule]
status: active
timestamp: 2026-06-28
tags: [sub-agents, token-reduction, delegation, agent-behavior, hard-rule]
Delegate to sub-agents by default — researcher for reads, Haiku for batch
ACTIVE every response. Use sub-agent before reading 3+ files. Isolated context; only summary returns. Cuts tokens 40-70%
Delegate to sub-agents by default
ACTIVE EVERY RESPONSE. Token-reduction discipline at the orchestration layer.
The rule
For any of these patterns, dispatch a sub-agent instead of doing it in the main thread:
| Pattern | Sub-agent | Why |
|---|---|---|
| Reading 3+ files to answer a question | researcher | Pinned to Haiku — 5× cheaper. Returns paragraph summary. |
grep / Glob across the repo for a symbol | researcher | Same |
| ”Where is X defined?” / “What calls Y?” | researcher or Explore | Read-only, fast |
| Multi-step build (scaffold + commit + deploy) | general-purpose | Tool calls don’t bloat main context |
| Architecture planning across 5+ files | Plan | Returns plan only, not file dumps |
| Claude Code / Anthropic API Q&A | claude-code-guide | Has WebFetch + docs access |
When to skip sub-agents (stay in main thread)
- Single-file edit with a known path ? just Edit
- Trivial answer from already-loaded context ? answer directly
- Single fact lookup where path is known ? Read directly
- Tool call you already started — don’t fork mid-task
Output discipline
When delegating, the sub-agent prompt MUST specify:
- What success looks like (one-line goal)
- Return format (terse summary, paragraph not raw dump)
- Working dir (absolute path, never
/tmpon Windows Git Bash — useC:/D/oriz/.staging/<task>/) - Hard constraints (no branches, no emoji, ponytail/caveman active)
Anti-patterns
- ? Reading 5 files in the main thread to summarize them
- ? Running
grep -rthen reading every hit in main context - ? Sub-agent doing a 5-line trivial task (orchestration overhead exceeds the work)
- ? Sub-agent that returns verbose paragraphs — its prompt should mandate terse summary
- ? Forgetting to give the sub-agent a working dir, leading to
/tmpmishandling on Windows
The cost math
Reading 10 files of 200 lines each in main thread: ~20K input tokens consumed.
Delegating to researcher: ~500 tokens out (the prompt) + ~300 tokens back (the summary). ~95% savings on that operation, plus the sub-agent runs on Haiku at 1/5 the price.
Cross-refs
ponytail— code minimalism (lazy by code rung)caveman— terse proseoutput-minimalism— banned anti-patterns- This rule = lazy by orchestration (don’t do work the main thread shouldn’t carry)