We didn't design this test. The architecture produced it.
The fix was three characters: change the mock to cite patterns.md::PY-08 — the passage the retriever actually returns, and what a real LLM given that retrieval context would cite. No citation-enforcement policy was relaxed. Documented in the technical report, not papered over.