
Repo code-size ceiling grounded in 2026 web practice

decision repo-size dagger ci web-research

Repo code-size ceiling grounded 2026-07-03

Locked

Sources (12 cited)

A deep-research subagent aggregated 2026-dated sources. The full JSON is in the prior task output. Key claims:

| Source | Date | Credibility | Claim |
| --- | --- | --- | --- |
| Anthropic Claude Code docs | 2026-06-26 | high | 200K default; compaction is lossy |
| Anthropic 1M context blog | 2026-04-15 | high | 1M GA March 2026; "context rot" degrades before fill |
| GitHub Copilot docs | 2026-05-13 | high | No hard limit; large-repo ~60s indexing |
| Hookflow (Cursor practice) | 2026-05-21 | medium | 10K file cap; ~20K active-context tokens |
| AfterBuild Labs | 2026-04-15 | medium | Practical threshold 10K-20K LOC |
| AI Rankings | 2026-04-15 | medium | 1M ≈ 75K LOC; 15% fewer compactions post-rollout |
| Developers Digest | 2026-05-26 | medium | Retrieval > raw window; 200K-line dumps miss structure |
| Zencoder | 2026-05-18 | medium | 64K window + repo-graph beat larger window: 71%→84% accuracy, 5× cost cut |
| Karpathy autoresearch | 2026-03-07 | high | 630-LOC repo intentionally fits single context, "readable in an afternoon" |
| Jin (Medium) | 2026-03-25 | medium | Softmax attention degrades past 150K tokens; start context <20% of window |
| Verdent (dissent) | 2026-04-01 | low | 1M = mid-monorepo + docs fits, big-is-fine |
| AIForCode (dissent) | 2026-01-25 | low | 10K-file repo = 50M tokens; 200K window holds 3% of medium enterprise |

Consensus range: 64K–500K tokens. Median: 200K.
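To compare this token range with the LOC figures in the sources, it can be converted using the AI Rankings ratio (1M tokens ≈ 75K LOC). This is a rough sketch only: the ratio comes from a single medium-credibility source, and the helper name `tokensToLoc` is illustrative, not part of any audit tooling.

```typescript
// Ratio from the AI Rankings row: 1M tokens ≈ 75K LOC (about 13.3 tokens per line).
// Assumption: the ratio holds across the whole range; real repos will vary.
const LOC_PER_MILLION_TOKENS = 75_000;

// Hypothetical helper: convert a token count to an approximate line count.
function tokensToLoc(tokens: number): number {
  return Math.round((tokens * LOC_PER_MILLION_TOKENS) / 1_000_000);
}

// Consensus figures quoted above.
const consensus: Array<[string, number]> = [
  ["low", 64_000],
  ["median", 200_000],
  ["high", 500_000],
];

for (const [label, tokens] of consensus) {
  console.log(`${label}: ${tokens} tokens ≈ ${tokensToLoc(tokens)} LOC`);
}
// low: 64000 tokens ≈ 4800 LOC
// median: 200000 tokens ≈ 15000 LOC
// high: 500000 tokens ≈ 37500 LOC
```

Under this ratio the 200K median works out to about 15K LOC, which falls inside the 10K-20K LOC practical threshold reported by AfterBuild Labs.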

Why two tiers not one

A single threshold forces rushed splits. Two tiers give lead time: a WARN tier flags a repo well before it reaches the hard ceiling.
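The two-tier idea can be sketched as a small classifier. This is a minimal illustration, not the actual audit code: the type and function names are hypothetical, the 2M hard ceiling is the figure named in this decision (read here as tokens), and the WARN value is a required parameter because this page does not state it.

```typescript
type Tier = "ok" | "warn" | "fail";

interface Thresholds {
  warnTokens: number; // assumption: supplied by the caller; not stated on this page
  hardTokens: number; // hard ceiling; this decision names 2M
}

// Hypothetical classifier. Treating "at the threshold" as crossing it is an assumption.
function classify(tokens: number, t: Thresholds): Tier {
  if (t.warnTokens >= t.hardTokens) {
    throw new Error("warnTokens must be below hardTokens");
  }
  if (tokens >= t.hardTokens) return "fail";
  if (tokens >= t.warnTokens) return "warn";
  return "ok";
}

// Placeholder WARN value for illustration only (the 200K consensus median).
const example: Thresholds = { warnTokens: 200_000, hardTokens: 2_000_000 };

console.log(classify(150_000, example));   // "ok"
console.log(classify(363_000, example));   // "warn"
console.log(classify(2_500_000, example)); // "fail"
```

A repo that crosses WARN still passes, which is where the lead time comes from: the split can be planned before the hard ceiling blocks anything.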

Why 2M as hard ceiling

bookmark-mind-bs-ext decision

At 363K tokens, this repo is above the WARN tier. The user picked "make umbrella repo no split", so it is treated as an intentional umbrella. It gets the umbrella-repo tag to signal that it is grandfathered. Extension, tests, and docs stay in one repo.
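One way to picture the grandfathering is a tag check layered on top of the tier result. The sketch below is an assumption about behavior, not the actual enforcement logic: it supposes the umbrella-repo tag turns a WARN into an informational status and does not exempt a repo from the hard ceiling, which this page does not confirm. All names other than the umbrella-repo tag are hypothetical.

```typescript
type Tier = "ok" | "warn" | "fail";
type Status = Tier | "grandfathered";

const UMBRELLA_TAG = "umbrella-repo";

// Hypothetical helper: downgrade WARN for tagged umbrella repos.
// Assumption: the tag never overrides a hard-ceiling failure.
function applyUmbrella(tier: Tier, tags: readonly string[]): Status {
  const isUmbrella = tags.some((tag) => tag === UMBRELLA_TAG);
  if (tier === "warn" && isUmbrella) return "grandfathered";
  return tier;
}

console.log(applyUmbrella("warn", ["umbrella-repo"])); // "grandfathered"
console.log(applyUmbrella("warn", []));                // "warn"
console.log(applyUmbrella("fail", ["umbrella-repo"])); // "fail"
```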

Dagger not Node

Per pipeline-stack-2026-07-01, Dagger TS is locked. The Dagger module at dagger/src/index.ts is the canonical enforcement path. The Node script (scripts/audit-repo-code-tokens.mjs) is kept as a no-daemon fallback.
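Neither enforcement path is shown on this page. As a rough sketch of the kind of counting core both could share, the function below estimates tokens from file contents with a characters-divided-by-four heuristic. That heuristic, the extension list, and every name here are assumptions for illustration; none of it is taken from dagger/src/index.ts or scripts/audit-repo-code-tokens.mjs.

```typescript
// Assumption: ~4 characters per token, a rough heuristic rather than a real tokenizer.
const CHARS_PER_TOKEN = 4;

// Assumption: which extensions count as "code" is hypothetical.
const CODE_EXTENSIONS = [".ts", ".tsx", ".js", ".mjs"];

interface SourceFile {
  path: string;
  content: string;
}

// True when the path ends with one of the code extensions.
function isCode(path: string): boolean {
  return CODE_EXTENSIONS.some((ext) => path.slice(-ext.length) === ext);
}

// Sum the characters of code files only, then convert to an estimated token count.
function estimateRepoTokens(files: SourceFile[]): number {
  let chars = 0;
  for (const file of files) {
    if (isCode(file.path)) chars += file.content.length;
  }
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

const sample: SourceFile[] = [
  { path: "src/a.ts", content: "abcdefgh" },       // 8 chars, counted
  { path: "README.md", content: "not counted" },   // ignored: not a code extension
];
console.log(estimateRepoTokens(sample)); // 2
```

Keeping the counting logic a pure function over file contents would let a daemon-backed path and a plain script feed it the same inputs, though whether the real module and script are structured that way is not stated here.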

Cross-refs