Monolithic Test File¶
| Slug | Severity | Detection Scope | Protects |
|---|---|---|---|
| `monolithic-test-file` | Medium | per-file | Understandable, Maintainable |
Summary¶
A single test file mixes multiple behavior domains — parser tests next to integration tests next to regression cases next to contract tests — usually because every new feature accreted into the file of least resistance. Hard to navigate, hard to diff, hard to assign ownership.
Aliases¶
- "monolithic"
- "god test file"
- "test file too large"
- "mixed domains in one file"
- "test file needs splitting"
Description¶
The suite-level analogue of a single test that does too much — a file, not a test, that has outgrown its subject. Often co-occurs with and amplifies semantic-redundancy (authors don't see what's already there), wrong-level (levels mix), and deliverable-fossils (each checklist item got its own describe block in the same file).
The semantic judgment is to cluster the file's tests by behavior domain (describe-before-edit plus embedding clusters) and then decide which clusters deserve their own file. This requires naming the behavior domains in product terms rather than implementation terms, which is the same capability the deliverable-fossils fix relies on.
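The grouping step can be sketched as follows. This is a minimal illustration, assuming each test already carries a domain label produced by the describe-before-edit pass (by a reviewer or an embedding clusterer). The function names and the `min_tests` cutoff are hypothetical and do not come from any tool.

```python
from collections import defaultdict


def cluster_by_domain(described_tests):
    """Group (test_name, domain_label) pairs into {domain: [test_name, ...]}."""
    clusters = defaultdict(list)
    for test_name, domain in described_tests:
        clusters[domain].append(test_name)
    return dict(clusters)


def split_candidates(clusters, min_tests=3):
    """Return the domains whose cluster is large enough to justify its own file.

    min_tests is an illustrative cutoff, not a recommended value.
    """
    return sorted(d for d, tests in clusters.items() if len(tests) >= min_tests)
```

Clusters below the cutoff are left for a human to merge into a neighbouring domain or keep in place; the sketch only lists the candidates.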
Signals¶
- Test file > 1000 lines or > 50 `it`/`test` blocks.
- File imports from > N unrelated modules (> 5 is a rough threshold).
- Multiple top-level `describe` blocks naming clearly different subjects.
- Mix of mocks plus real clients plus subprocess calls within one file.
- `//`, `#`, or block-comment section headers dividing the file (`// ===== SYNC =====`, `# --- FIXTURES ---`): an author's tell.
- High duplication score within the file (jscpd, `flay`).
False-positive guards¶
File-shape signals over-trigger when applied as if size alone were the violation:
- Line/test-count thresholds are signals, not verdicts. A 1500-line file of small parameterized cases all targeting one parser is not monolithic: it has a single behavior domain. The smell fires on mixed domains, not size. Apply the size signals (>1000 lines, >50 `it` blocks) as a triage filter for human review, not as a pass/fail verdict; the semantic judgment is "are these tests about the same product capability?", not "is this file too big?".
- Co-location-by-convention ecosystems. Go's `package_test.go` adjacent to `package.go` and Rust's `#[cfg(test)] mod tests` place all coverage for a unit in one file by deliberate convention; splitting fights the build system's discovery and the ecosystem's idioms. Flag only when in-file content mixes behaviors the ecosystem would not idiomatically colocate (e.g. a Go file mixing pure-function units with subprocess-driving integration tests deserves an `_integration_test.go` split, as described under `wrong-level`, but a file of focused unit tests for one package's exported functions is not a violation regardless of size).
Prescribed Fix¶
- Describe-before-edit over every test in the file.
- Cluster by behavior domain; propose a per-domain target file with a behavior-shaped name.
- Emit a split plan: `original.test.ts` → `{ a.test.ts, b.test.ts, c.test.ts }`, with each destination's test list and the rationale for the grouping.
- Execute the split via codemod; imports and shared helpers follow.
- If shared setup warrants it, extract a small `test-support/` module rather than duplicating.
- Gate: preservation of regression-detection power plus same total test count plus CI green on each new file.
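The test-count part of the gate is easy to check before any file is touched. The following sketch assumes the split plan is a plain mapping from destination filename to test names; `check_split_plan` and its message format are hypothetical. It says nothing about regression-detection power or CI status, which still need their own checks.

```python
from collections import Counter


def check_split_plan(original_tests, plan):
    """Verify that a split plan moves every original test to exactly one file.

    original_tests: list of test names in the monolithic file.
    plan: dict mapping destination filename -> list of test names.
    Returns a list of problems; an empty list means the counts line up.
    """
    placed = Counter(name for tests in plan.values() for name in tests)
    original = set(original_tests)
    problems = []

    # Tests that exist today but appear in no destination file.
    for name in sorted(original - set(placed)):
        problems.append(f"dropped: {name}")

    # Tests the plan mentions that were never in the original file.
    for name in sorted(set(placed) - original):
        problems.append(f"unknown: {name}")

    # Tests assigned to more than one destination.
    for name, count in sorted(placed.items()):
        if count > 1:
            problems.append(f"duplicated x{count}: {name}")

    return problems
```

Running the same check after the codemod, against test names collected from the new files, confirms that the executed split matches the plan.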
This move pairs with deliverable-fossils: run the rename pass first so clusters form around product capabilities rather than checklist items.
Example¶
Before¶
tests/test_sync.py (1367 lines)
├── class TestSessionParsing (12 tests)
├── class TestMessageExtraction (18 tests)
├── class TestTrackingDB (9 tests)
├── class TestEmbedding (6 tests)
└── class TestFullSyncEnd2End (4 tests)
After¶
tests/
test_session_parsing.py (session-JSONL parsing only)
test_message_extraction.py (message extraction + token counting)
test_tracking_db.py (tracking DB schema + migrations)
test_embedding.py (embedding integration)
test_sync_e2e.py (full sync end-to-end)
_support/fixtures.py (extracted shared fixtures)
One file was mixing five behavior domains. Each is now a behavior-shaped file; shared fixtures moved into _support/ rather than duplicated. The full-sync e2e is isolated so CI can run it in a separate slow tier.
Related modes¶
- `semantic-redundancy`: monolithic files hide redundancy; run dedup after the split so clusters are per-domain.
- `wrong-level`: the e2e extraction in the example is a wrong-level move; often they compose.
- `deliverable-fossils`: run the rename first so splitting uses product vocabulary.
Polyglot notes¶
The split is universal; the codemod is ecosystem-specific (LibCST, jscodeshift, OpenRewrite, ast-grep). Preserving imports is the only real complication, and every codemod ecosystem has affordances for it.