Testing & QA
scanipy's test suite is a layered set of fast, hermetic pytest modules plus a handful of cross-cutting enforcement suites that encode the project's load-bearing guarantees: deterministic output (P3), one-sided sanitizer soundness (P5), local-and-private operation (P1), and honest, graceful handling of bad inputs (P7). This document describes how the suite is organized, how to run it, and how to maintain the golden snapshots and the coverage gate.
Running the tests
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
ruff check . && ruff format --check . # lint + format
mypy src # strict types (package source)
pytest --cov=scanipy --cov-fail-under=90 # tests + coverage gate
Tests are tagged with two markers (registered in pyproject.toml):
| Marker | Meaning | Select with |
|---|---|---|
unit |
fast, isolated component tests (no filesystem walks beyond tmp_path) |
pytest -m unit |
integration |
the wired pipeline (registry → scanner → frontend → engine → reporter, and the CLI via CliRunner) over real fixtures |
pytest -m integration |
Layout
tests/
conftest.py shared fixtures (the CliRunner `runner`)
_support/ test-only helper package (SPDX-headed, stdlib + scanipy)
normalize.py version/path/fingerprint-tolerant report normalizers
corpus.py the shared scan corpus + run_corpus_scan entry point
unit/ one module per component (parser, frontend/IR, matcher,
engine transfer functions, summaries, lattice, config,
scanner, reporters, registry, CLI)
integration/
test_detectors.py the P5 catalog matrix (auto-parametrized per detector)
test_engine_end_to_end.py exact-finding end-to-end checks
test_cli.py CLI scan/rules via CliRunner
test_determinism.py P3: scan-twice byte-identical + input-order invariance
test_golden_reports.py committed, normalized JSON + SARIF snapshots
test_resilience.py SyntaxError + non-UTF-8 files → skipped, not fatal
test_performance_smoke.py ~50 synthetic files (deep chains + recursion), bounded
test_hermeticity.py no-network socket guard + no-subprocess meta-test
fixtures/ analysis DATA (intentionally-vulnerable code), lint-excluded
golden/ committed normalized snapshots (scan-corpus.json,
scan-corpus.sarif.json)
tests/_support/ is intentionally tiny and dependency-free so every cross-cutting
suite reuses one definition of "the corpus" and one set of normalizers (DRY,
determinism).
The P5 catalog matrix
tests/integration/test_detectors.py is auto-parametrized over the bundled
detector catalog: for every detector it scans the paired true-positive fixture
(tests/fixtures/python/vulnerable/<stem>.py, which it must flag) and the paired
true-negative fixture (tests/fixtures/python/safe/<stem>.py, which it must not).
Adding a detector YAML plus its TP/TN fixtures extends the matrix with zero
engine or test edits (P4). The safe fixtures route the same tainted value into
the safe form (bound parameters, shlex.quote, ast.literal_eval,
yaml.safe_load, defusedxml, …) so the test discriminates sink/sanitizer
behavior, not mere absence of taint — preserving the one-sidedness of P5 (a
missing sanitizer is noise, never a silently-suppressed real vulnerability).
Determinism (P3)
tests/integration/test_determinism.py asserts the user-visible guarantees:
- scanning the corpus twice yields byte-identical JSON, and byte-identical SARIF; and
- the order in which input files are fed to the engine does not change the final sorted finding set (the total order is real, not an artifact of discovery's internal sort).
The engine-internal determinism (repeated single-file analysis, spec-order shuffling, the witness-fingerprint tie-break, golden fingerprint values) is covered by the engine unit suites.
Golden snapshots
tests/integration/test_golden_reports.py scans the fixed corpus (the
vulnerable/ + safe/ fixture trees), renders JSON and SARIF, normalizes
the output, and compares it to the snapshots committed under tests/golden/.
Normalization (tests/_support/normalize.py) makes the snapshots stable across
machines and version bumps by rewriting three machine/version-dependent things:
| Normalized | Why |
|---|---|
the embedded tool version → "<VERSION>" |
both reporters embed scanipy.__version__, and the release bumps it |
absolute file paths → repo-relative POSIX (corpus/...) |
scans run under a per-test tmp_path |
the witness fingerprint → "<FINGERPRINT>" |
the fingerprint is a sha256 of the witness (role, file, line, col) tuples — deterministic for a given path (asserted directly in the determinism suite) but path-dependent by construction, so it cannot appear verbatim in a machine-independent golden |
Because the goldens are normalized on disk, they never contain a machine path or the version string, so a version bump or a different CI runner does not break them — while a real change to the findings still does.
Regenerating goldens
After an intentional change to the catalog, engine, or report shape:
SCANIPY_UPDATE_GOLDEN=1 pytest tests/integration/test_golden_reports.py
This rewrites tests/golden/scan-corpus.json and scan-corpus.sarif.json from
the current (normalized) output and skips the comparison for that run. Review the
diff and commit it deliberately — an unreviewed golden update defeats the purpose.
The SARIF golden is stored as
scan-corpus.sarif.json(SARIF is JSON) so the repo-wide*.sarifgitignore — which keeps transient scan output out of the tree — does not accidentally exclude this committed test artifact.
Resilience (P7)
tests/integration/test_resilience.py scans a tree mixing valid code with a
Python SyntaxError file and a non-UTF-8 / binary .py file. The scan must
complete, still scan and flag the valid files, and record the bad files as
skipped diagnostics (ScanResult.parse_errors) rather than raising. The CLI
returns OK/FINDINGS — not ERROR — for unparsable inputs; a genuinely missing
path is the one case that is a usage ERROR (exit 2).
Performance smoke
tests/integration/test_performance_smoke.py generates ~50 synthetic files that
exercise the analysis features most prone to blowup — deep attribute chains and
(mutual) recursion that drive the summary fixpoint — and asserts the full scan
finishes under a generous wall-clock budget and produces a deterministic
finding count. It is not a benchmark; its job is to catch an accidental loss of a
depth / fixpoint cap (which turns the analysis quadratic-to-exponential).
Hermeticity (P1)
tests/integration/test_hermeticity.py enforces three properties:
- No network on the scan path. A full corpus scan runs under a
socketmonkeypatch that makes any socket use raise; the scan still completes and flags the SSRF detector whose patterns name network calls. scanipy reads files from disk and analyzes in-process — it never opens a socket. - No subprocess in the suite. The CLI is driven exclusively through
click.testing.CliRunner(in-process). An AST meta-test asserts that no test module importssubprocess(fixtures are analysis DATA and are excluded). - I/O confinement. Cross-cutting scans target a copied corpus under
tmp_pathonly; they never touch the developer's working tree.
Coverage policy
The global coverage gate is 90% (line + branch over src/scanipy), enforced
in two places that must agree:
pyproject.toml→[tool.coverage.report] fail_under = 90(so local runs match CI), and.github/workflows/ci.yml→pytest … --cov-fail-under=90on every entry of the Python 3.10–3.13 matrix.
Coverage tests product behavior; source is never modified to game a metric.
When a line is uncovered, the fix is a behavior test that reaches it (or an
honest pragma: no cover for genuinely unreachable defensive code — used
sparingly and only for interpreter-version-gated branches). The correctness-
critical core — the matcher, the taint lattice, and the engine transfer
functions — is targeted at or near 100%.