test-review

// Use this skill for test suite evaluation and quality assessment: auditing test suites, analyzing coverage gaps, improving test quality, and evaluating TDD/BDD compliance. Do not use when writing new tests - use parseltongue:python-testing. Do not use when updating existing tests - use sanctum:test-updates.

SKILL.md Frontmatter
name: test-review
description: Evaluate test suites for coverage gaps and TDD/BDD compliance.
alwaysApply: false
category: testing
tags: testing, tdd, bdd, coverage, quality, fixtures
tools:
usage_patterns: test-audit, coverage-analysis, quality-improvement, gap-remediation
complexity: intermediate
model_hint: standard
estimated_tokens: 200
progressive_loading: true
dependencies: pensive:shared, imbue:proof-of-work
modules: modules/framework-detection.md, modules/coverage-analysis.md, modules/scenario-quality.md, modules/remediation-planning.md, modules/content-assertion-quality.md

Test Review Workflow

Evaluate and improve test suites with TDD/BDD rigor.

Quick Start

/test-review

Verification: Run pytest -v to verify tests pass.

When To Use

  • Reviewing test suite quality
  • Analyzing coverage gaps
  • Before major releases
  • After test failures
  • Planning test improvements

When NOT To Use

  • Writing new tests - use parseltongue:python-testing
  • Updating existing tests - use sanctum:test-updates

Required TodoWrite Items

  1. test-review:languages-detected
  2. test-review:coverage-inventoried
  3. test-review:scenario-quality
  4. test-review:invariant-preservation
  5. test-review:gap-remediation
  6. test-review:evidence-logged

Progressive Loading

Load modules as needed based on review depth:

  • Basic review: Core workflow (this file)
  • Framework detection: Load modules/framework-detection.md
  • Coverage analysis: Load modules/coverage-analysis.md
  • Quality assessment: Load modules/scenario-quality.md
  • Remediation planning: Load modules/remediation-planning.md

Workflow

Step 1: Detect Languages (test-review:languages-detected)

Identify testing frameworks and version constraints. → See: modules/framework-detection.md

Quick check:

find . -maxdepth 2 -name "Cargo.toml" -o -name "pyproject.toml" -o -name "package.json" -o -name "go.mod"

Verification: Run the command with the --help flag to verify availability.
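
A minimal sketch of what this detection can look like in code, mirroring the find quick check above. The manifest-to-framework mapping is an assumption for illustration, not an exhaustive list.

# Hypothetical sketch: map manifest files (the same ones as the find quick check)
# to their likely test frameworks. The mapping is an assumption, not exhaustive.
from pathlib import Path

MANIFEST_HINTS = {
    "Cargo.toml": "cargo test (Rust)",
    "pyproject.toml": "pytest (Python)",
    "package.json": "jest/vitest (JavaScript/TypeScript)",
    "go.mod": "go test (Go)",
}

def detect_frameworks(root: str = ".") -> dict:
    """Return {manifest path: likely framework} for manifests within two levels."""
    found = {}
    root_path = Path(root)
    for name, framework in MANIFEST_HINTS.items():
        for manifest in root_path.glob(f"**/{name}"):
            if len(manifest.relative_to(root_path).parts) <= 2:  # mirror -maxdepth 2
                found[str(manifest)] = framework
    return found

if __name__ == "__main__":
    for manifest, framework in sorted(detect_frameworks().items()):
        print(f"{manifest}: {framework}")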

Step 2: Inventory Coverage (test-review:coverage-inventoried)

Run coverage tools and identify gaps. → See: modules/coverage-analysis.md

Quick check:

git diff --name-only | rg 'tests|spec|feature'

Verification: Run pytest -v to verify tests pass.
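
A minimal sketch of gap inventory, assuming coverage has already been generated (for example with pytest --cov --cov-report=xml, which writes a Cobertura-style coverage.xml). It surfaces the least-covered files so they can be checked against critical paths.

# Sketch only: reads a coverage.py/Cobertura coverage.xml (assumed to exist,
# e.g. from `pytest --cov --cov-report=xml`) and lists the least-covered files.
import xml.etree.ElementTree as ET

def least_covered(xml_path: str = "coverage.xml", limit: int = 10):
    """Return (filename, line-rate) pairs, least covered first."""
    root = ET.parse(xml_path).getroot()
    rates = [
        (cls.get("filename"), float(cls.get("line-rate", 0.0)))
        for cls in root.iter("class")
    ]
    return sorted(rates, key=lambda item: item[1])[:limit]

if __name__ == "__main__":
    for filename, rate in least_covered():
        print(f"{rate:6.1%}  {filename}")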

Step 3: Assess Scenario Quality (test-review:scenario-quality)

Evaluate test quality using BDD patterns and assertion checks; a contrasting example follows the focus list. → See: modules/scenario-quality.md

Focus on:

  • Given/When/Then clarity
  • Assertion specificity
  • Anti-patterns (dead waits, mocking internals, repeated boilerplate)
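
To make these points concrete, here is a contrasting sketch; the Order/apply_discount code is a hypothetical stand-in included only so both tests run. The first test has a vague name and a truthy assertion; the second follows Given/When/Then and asserts a specific value with failure context.

# Hypothetical Order/apply_discount, included only so the tests are runnable.
from dataclasses import dataclass, replace
import pytest

@dataclass
class Order:
    total: float
    customer_tier: str = "standard"

def apply_discount(order: Order) -> Order:
    rate = 0.10 if order.customer_tier == "vip" else 0.0
    return replace(order, total=order.total * (1 - rate))

def test_discount():  # weak: vague scenario, truthy assertion, no context on failure
    assert apply_discount(Order(total=100.0))

def test_vip_customer_gets_ten_percent_discount():
    # Given a VIP customer with a 100.00 order
    order = Order(total=100.0, customer_tier="vip")
    # When the discount policy is applied
    discounted = apply_discount(order)
    # Then exactly 10% is deducted
    assert discounted.total == pytest.approx(90.0), (
        f"expected 10% VIP discount, got total={discounted.total}"
    )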

Step 4: Plan Remediation (test-review:gap-remediation)

Create concrete improvement plan with owners and dates. → See: modules/remediation-planning.md

Step 5: Log Evidence (test-review:evidence-logged)

Record executed commands, outputs, and recommendations. → See: imbue:proof-of-work

Test Quality Checklist (Condensed)

  • Clear test structure (Arrange-Act-Assert)
  • Critical paths covered (auth, validation, errors)
  • Specific assertions with context
  • No flaky tests (dead waits, order dependencies)
  • Reusable fixtures/factories (see the sketch after this checklist)
  • Invariant-encoding tests intact (see below)
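
A small sketch of the fixture/factory and Arrange-Act-Assert items above; the User model is hypothetical and exists only to make the example self-contained.

# Hypothetical User model; the point is the factory fixture and AAA structure.
from dataclasses import dataclass
import pytest

@dataclass
class User:
    name: str
    active: bool = True

@pytest.fixture
def make_user():
    """Factory fixture: each test builds exactly the user it needs, no copied setup."""
    def _make(name: str = "alice", **overrides) -> User:
        return User(name=name, **overrides)
    return _make

def test_inactive_user_reports_inactive_status(make_user):
    # Arrange
    user = make_user(active=False)
    # Act
    status = "active" if user.active else "inactive"
    # Assert
    assert status == "inactive"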

Invariant-Encoding Tests

Tests do not just verify behavior — they encode design invariants. A test that asserts "module A never imports from module B" encodes a layer boundary. A test that asserts "this function is pure" encodes a concurrency model. These tests are load-bearing in ways that coverage metrics cannot capture.
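
As a sketch of what an invariant-encoding test can look like, here is a layer-boundary test in the spirit of the "module A never imports from module B" example; the package names and paths (myapp.web, src/myapp/domain) are placeholders.

# Placeholder package names/paths; the invariant is "domain never imports web".
import ast
from pathlib import Path

FORBIDDEN_PREFIX = "myapp.web"            # layer the domain code must not touch
DOMAIN_DIR = Path("src/myapp/domain")     # placeholder location of the lower layer

def test_domain_layer_never_imports_web_layer():
    offenders = []
    for source in DOMAIN_DIR.rglob("*.py"):
        for node in ast.walk(ast.parse(source.read_text())):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                targets = [node.module or ""]
            else:
                continue
            if any(target.startswith(FORBIDDEN_PREFIX) for target in targets):
                offenders.append(f"{source}:{node.lineno}")
    assert not offenders, f"domain layer imports web layer: {offenders}"

Deleting or skipping a test like this silences the boundary it enforces, which is exactly the erosion the checks below are meant to catch.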

During review, check:

  1. Were invariant-encoding tests removed or weakened? A test that enforced an architectural boundary, data structure constraint, or API contract should not be deleted without naming the invariant being abandoned and escalating to human judgment.

  2. Were test expectations changed to match a broken implementation? If an assertion value changed, ask: did the requirement change, or did the agent change the test to make its code pass? The latter is the single most dangerous form of test tampering.

  3. Are new invariants encoded as tests? When a design decision is made (choice of data structure, module boundary, error strategy), there should be at least one test whose failure would signal that the invariant was violated.

Red flag patterns:

  • @pytest.mark.skip added to a passing test → Risk: invariant being silently dropped
  • Assertion changed from specific to broad → Risk: constraint being relaxed (example below)
  • Test renamed to describe new behavior → Risk: old invariant erased from history
  • Test deleted "because it tested old code" → Risk: invariant removed without replacement

When invariant erosion is detected:

Do NOT approve. Flag as a BLOCKING quality issue and present the three options to the human:

  1. Preserve: Revert the test change, fix the implementation to satisfy the invariant
  2. Layer: Keep the invariant test, add the new behavior alongside it (accepting inelegance)
  3. Revise: The invariant is genuinely wrong — remove the old test AND write a new test encoding the replacement invariant

This is a judgment call that models get wrong far too often. Default to option 1 (preserve) when no human is available.

Output Format

## Summary
[Brief assessment]

## Framework Detection
- Languages: [list] | Frameworks: [list] | Versions: [constraints]

## Coverage Analysis
- Overall: X% | Critical: X% | Gaps: [list]

## Quality Issues
[Q1] [Issue] - Location - Fix

## Remediation Plan
1. [Action] - Owner - Date

## Recommendation
Approve / Approve with actions / Block

Verification: Run the command with the --help flag to verify availability.

Integration Notes

  • Use imbue:proof-of-work for reproducible evidence capture
  • Reference imbue:diff-analysis for risk assessment
  • Format output using imbue:structured-output patterns

Exit Criteria

  • Frameworks detected and documented
  • Coverage analyzed and gaps identified
  • Scenario quality assessed
  • Remediation plan created with owners and dates
  • Evidence logged with citations

Troubleshooting

Common Issues

Tests not discovered: Ensure test files match the pattern test_*.py or *_test.py. Run pytest --collect-only to verify.

Import errors: Check that the module being tested is on PYTHONPATH, or install it with pip install -e .

Async tests failing: Install pytest-asyncio and decorate test functions with @pytest.mark.asyncio (see the sketch below).
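
A minimal sketch of that setup, assuming pytest-asyncio is installed; fetch_answer stands in for the real async code under test.

# Assumes `pip install pytest-asyncio`; fetch_answer is a stand-in coroutine.
import asyncio
import pytest

async def fetch_answer() -> int:
    await asyncio.sleep(0)   # pretend to await real I/O
    return 42

@pytest.mark.asyncio
async def test_fetch_answer_returns_42():
    assert await fetch_answer() == 42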