W3.7 followup: harden Dupin scoping + chunk_id parsing
Some checks failed
CI / Web — typecheck + lint + build (push) Failing after 32s
CI / Scripts — Python smoke (push) Failing after 3s
CI / Web — npm audit (push) Failing after 27s
CI / Retrieval — golden set (Recall@5 + MRR) (push) Failing after 3s

Two regressions surfaced in the smoke test that put Dupin from
0/3 contradictions written → 3/3 in the next run.

1. Single-doc scope was too narrow for Dupin's task.
   Holmes's question about Sandia returned 4 chunks scoped to one doc,
   but Dupin's terser "topic" form yielded only 1. Solution: Pass-1
   tries the requested doc_id; if the head is < 2 chunks, Pass-2
   widens to the whole corpus. Audit event carries `scope_widened`
   so the case-writer can later flag cross-doc contradictions
   distinctly. The unscoped retry hit 9 chunks and produced 3
   contradictions across 3 different docs.

2. Chunk-block header was ambiguous to the model.
   `--- doc-id/p007#c0042 ---` led Claude to parse `chunk_id` as
   "p007#c0042" or "p007/c0042" in the JSON output. write_contradiction
   then refused the FK lookup with "chunk not found". Fix:
   - Explicit `doc_id:` / `chunk_id:` / `page:` lines per chunk
     in the rendered block (no slashes/hashes the model can fold).
   - Defensive normalizeChunkId() in write_contradiction.ts strips
     any pNNN prefix and keeps only the trailing cNNNN — so the
     writer is forgiving without losing strictness on the topic +
     statement validation.

Smoke now produces (job 6deddf4b):
  R-0001 (3 chunks) — Color of the fireball(s) in incident summaries
  R-0002 (2 chunks) — Geographic confinement of green-fireball sightings
  R-0003 (3 chunks) — Whether the phenomenon was exclusively green or
                      also red/multicolored

R-0003 connects 3 different declassified documents: the Los Alamos
conference (exclusively-green category), a retrospective document
(red OR green), and Incident 229 (red, blue, yellow — no green).
Real cross-doc contradiction, fully cited.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Luiz Gustavo 2026-05-23 21:42:01 -03:00
parent 5ac53cb3e2
commit 25f19aee63
2 changed files with 38 additions and 5 deletions

View file

@ -35,11 +35,13 @@ export interface DupinTask {
}
function renderChunkBlock(hits: SearchHit[], lang: "pt" | "en"): string {
const blocks = hits.map((h) => {
const blocks = hits.map((h, i) => {
const text = (lang === "en" ? h.content_en : h.content_pt) || h.content_en || h.content_pt || "";
const pageStr = String(h.page).padStart(3, "0");
return [
`--- ${h.doc_id}/p${pageStr}#${h.chunk_id} ---`,
`--- chunk ${i + 1} ---`,
`doc_id: ${h.doc_id}`,
`chunk_id: ${h.chunk_id}`,
`page: ${h.page}`,
`type: ${h.type}`,
h.classification ? `classification: ${h.classification}` : null,
"",
@ -108,13 +110,33 @@ export async function runDupin(task: DupinTask): Promise<
const lang: "pt" | "en" = task.lang ?? "pt";
const k = task.context_chunks ?? 18;
const hits = await hybridSearch({
// Pass 1: scoped to the requested doc, if any.
let hits = await hybridSearch({
query: task.topic,
lang,
doc_id: task.doc_id ?? null,
top_k: k,
recall_k: 80,
});
let scope_widened = false;
// Pass 2: if a doc_id was set and the head is too thin for a tournament,
// widen the scope to the whole corpus. Cross-doc contradictions are
// valuable too (one doc says X, another says Y).
if (task.doc_id && hits.length < 2) {
const widened = await hybridSearch({
query: task.topic,
lang,
doc_id: null,
top_k: k,
recall_k: 80,
});
if (widened.length > hits.length) {
hits = widened;
scope_widened = true;
}
}
await audit({
event: "dupin_grounded",
job_id: task.job_id,
@ -122,6 +144,7 @@ export async function runDupin(task: DupinTask): Promise<
topic: task.topic,
n_chunks: hits.length,
doc_id: task.doc_id ?? null,
scope_widened,
});
if (hits.length < 2) {
return { skipped: true, reason: "insufficient_corpus" };

View file

@ -48,10 +48,20 @@ interface ResolvedPosition extends ContradictionPosition {
page: number;
}
/**
* Strip page-prefix idioms detectives sometimes emit. Canonical chunk_id is
* just `c0042`. Forms accepted: "c0042", "p007#c0042", "p007/c0042".
*/
function normalizeChunkId(raw: string): string {
const m = raw.match(/c\d{4,}$/);
return m ? m[0] : raw;
}
async function resolveChunk(doc_id: string, chunk_id: string): Promise<{ chunk_pk: number; page: number } | null> {
const cid = normalizeChunkId(chunk_id);
const row = await queryOne<{ chunk_pk: number; page: number }>(
`SELECT chunk_pk, page FROM public.chunks WHERE doc_id = $1 AND chunk_id = $2`,
[doc_id, chunk_id],
[doc_id, cid],
);
return row ?? null;
}