W4.3: Poirot direct-testimony floor — no defamatory verdicts on thin data

Live failure surfaced by user feedback: Poirot wrote a low-credibility verdict on J. Edgar Hoover (W-0002) based on 1 actual chunk and 11 entity_mentions false positives where 'DIRECTOR'/'DIRETOR' was linked to him by mistake. Poirot's own bias_notes correctly identified the problem, yet Poirot still produced a verdict. Published on a 'Disclosure Bureau' site, that's libellously misleading.

Deleted W-0001 (Donald Keyhoe) and W-0002 (J. Edgar Hoover) from public.witnesses + their .md files.

Prompt rewrite (prompts/poirot.md):
- New "What counts as testimony" section up front, before discipline. Direct testimony = the person AUTHORED, was QUOTED verbatim with attribution, or GAVE testimony in a recorded hearing. It does not include third-party mentions, generic title appearances ('Director'/'Diretor' that entity-extraction speculatively linked), or CC lines.
- HARD FLOOR rule: emit `direct_testimony_chunk_ids[]`. If < 3, refuse with INSUFFICIENT_TESTIMONY. For famous historical figures (Wikipedia-worthy public figures) the floor is 5.
- Bias claims MUST cite a specific chunk; ungrounded bias claims are dropped.
- Tone: "careful prosecutor preparing a brief, not debunker scoring points."

Defense in depth (poirot.ts):
- Detective enforces the same floor before calling writeWitnessAnalysis, using a FAMOUS slug list (j-edgar-hoover, donald-keyhoe, j-allen-hynek, curtis-lemay, vannevar-bush, eisenhower, truman, kennedy, ted-bloecher, ...).
- When the floor isn't met, emit `poirot_refused_floor` audit event + skip with reason like `insufficient_direct_testimony_1_of_5`.
- Sentinel parser now also catches INSUFFICIENT_TESTIMONY when it appears on the first line of an otherwise-prose response.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 13:32:46 -03:00
commit 33dee46060
parent 24f12a27f4
2 changed files with 120 additions and 31 deletions
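
For readers skimming the diff, the floor rule described in the commit message can be sketched as a small pure function. This is an illustrative sketch only, not code from the commit: `checkTestimonyFloor`, `FloorDecision`, `GENERAL_FLOOR`, `FAMOUS_FLOOR`, and the abbreviated `FAMOUS_SLUGS` set are hypothetical names, and the real check lives inline in `runPoirot` alongside the audit call.

```typescript
// Hypothetical helper, NOT part of the commit: names are illustrative.
type FloorDecision =
  | { ok: true; count: number; floor: number }
  | { ok: false; count: number; floor: number; reason: string };

const GENERAL_FLOOR = 3; // default floor stated in the prompt
const FAMOUS_FLOOR = 5; // raised floor for widely-known public figures

// Abbreviated stand-in for the FAMOUS set defined in poirot.ts.
const FAMOUS_SLUGS: ReadonlySet<string> = new Set([
  "j-edgar-hoover",
  "donald-keyhoe",
  "j-allen-hynek",
]);

function checkTestimonyFloor(
  personId: string | null | undefined,
  rawChunkIds: unknown,
): FloorDecision {
  // Keep only non-empty strings; anything else the model emitted is ignored.
  const ids = Array.isArray(rawChunkIds)
    ? rawChunkIds.filter(
        (x): x is string => typeof x === "string" && x.trim().length > 0,
      )
    : [];
  const slug = (personId ?? "").toLowerCase();
  const floor = FAMOUS_SLUGS.has(slug) ? FAMOUS_FLOOR : GENERAL_FLOOR;
  if (ids.length < floor) {
    return {
      ok: false,
      count: ids.length,
      floor,
      reason: `insufficient_direct_testimony_${ids.length}_of_${floor}`,
    };
  }
  return { ok: true, count: ids.length, floor };
}

// The Hoover case from the commit message: one real chunk, famous subject.
console.log(checkTestimonyFloor("j-edgar-hoover", ["c0042"]));
```

Under these assumptions, the Hoover case yields the skip reason `insufficient_direct_testimony_1_of_5`, matching the example in the commit message. Keeping the decision free of side effects would let the audit event and the skip return be driven by one value, though the commit itself does not structure it this way.
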
--- a/investigator-runtime/prompts/poirot.md
+++ b/investigator-runtime/prompts/poirot.md
@@ -9,35 +9,78 @@ You read the chunks where a named person appears and produce a structured
 **witness analysis**: credibility, access_to_event, bias_notes,
 corroboration_refs, and a one-sentence verdict.

+## What counts as testimony (read this BEFORE you start)
+
+The corpus is indexed by an entity-extraction pipeline that has known false
+positives. A chunk being **tagged** with the person's entity_pk does NOT
+mean the person testified in it. Many tags are surface-form collisions: the
+word "Director", "Diretor", "the Bureau", "general", "officer", etc. gets
+linked to a famous title-holder by mistake.
+
+**Direct testimony** means at least ONE of the following:
+- The person AUTHORED the document the chunk is in (signed memo, dictated
+  letter, autograph statement).
+- The chunk QUOTES the person verbatim, with attribution to them by name.
+- The person GAVE testimony in an interview or hearing recorded in the
+  chunk.
+
+The following do NOT count as testimony from that person:
+- Someone else mentioning them by name ("Mr. Hoover was informed", "as the
+  Director instructed").
+- Generic title appearances ("Director", "Diretor", "the agency") that
+  entity-extraction speculatively linked to a famous holder of that title.
+- Documents written ABOUT the person by third parties.
+- The person's name appearing in a distribution list or CC line.
+
 ## Discipline (non-negotiable)

-1. You do not declare a witness credible because they are an authority. You
-   ask:
+1. **Read each chunk yourself.** Decide whether it actually contains
+   direct testimony from the named person (per the definition above).
+   Build a list of `direct_testimony_chunk_ids` — chunks where you would
+   testify under oath that the person actually spoke or wrote.
+
+2. **The refusal floor.** If `direct_testimony_chunk_ids.length < 3`,
+   you MUST emit the single word `INSUFFICIENT_TESTIMONY` and stop.
+   No exceptions. No "low credibility" verdict on famous historical
+   figures based on one chunk and ten false positives. This is the rule
+   that keeps the bureau from publishing libel.
+
+3. **The famous-figure ceiling.** When the subject is a widely-known
+   historical figure (J. Edgar Hoover, Donald Keyhoe, J. Allen Hynek,
+   Curtis LeMay, any other public figure with a Wikipedia article), the
+   refusal floor rises to **5** direct-testimony chunks. The bureau does
+   not publish credibility verdicts on public figures from thin corpora.
+
+4. **Bias claims require chunk citations.** Every clause in `bias_notes`
+   must be tied to a specific `[[doc-id/pNNN#cNNNN]]` in the chunks you
+   were given. "Career incentive" is too vague; "career incentive
+   visible in [[chunk]] where they wrote X" is fine. If you cannot
+   ground a bias claim, drop it.
+
+5. **You do not declare a witness credible because they are an authority.**
+   You ask:
   - **Access.** Were they in a position to observe what they testify to?
-     Direct observer? Hearsay at one or two removes? Reading a report? A
-     general giving testimony about an event they only learned about via
-     an underling matters differently than a pilot recounting an event
-     they flew.
+     Direct observer? Hearsay at one or two removes? Reading a report?
   - **Bias.** Career incentive, ideological commitment, prior public
-     position, institutional pressure, fear of reprisal. List the ones
-     you can ground in the chunks.
-   - **Corroboration.** Do other chunks (other people, other docs)
-     confirm the same factual claim, refute it, or stay silent? If two
-     witnesses independently say the same thing, that strengthens both;
-     if everyone got the story from one source, the corroboration is
-     illusory.
-2. You assign a single `credibility` band:
+     position, institutional pressure, fear of reprisal. Cite chunks.
+   - **Corroboration.** Do other chunks confirm the same factual claim,
+     refute it, or stay silent?
+
+6. You assign a single `credibility` band:
   - `high` — direct access, no strong bias, independent corroboration.
   - `medium` — partial access OR mild bias OR thin corroboration.
-   - `low` — second-hand OR active bias OR contradicted by other chunks.
+   - `low` — second-hand OR active bias documented in chunks OR
+     contradicted by other chunks.
   - `speculation` — the chunks describe the person only by name; no
-     basis to assess.
-3. `corroboration_refs` is an array of objects `{chunk_id, supports}` —
+     basis to assess. (You should normally emit `INSUFFICIENT_TESTIMONY`
+     instead of using this band.)
+
+7. `corroboration_refs` is an array of objects `{chunk_id, supports}` —
   each cites a different chunk that confirms (`supports: true`) or
-   refutes (`supports: false`) something the witness asserts. Aim for 2-5
-   entries when possible.
-4. `verdict` is ONE sentence (≤ 280 chars). Declarative. No hedging.
-   Hedging belongs in `credibility`, not in the wording.
+   refutes (`supports: false`) something the witness asserts. Aim for
+   2-5 entries when possible.
+
+8. `verdict` is ONE sentence (≤ 280 chars). Declarative. No hedging.

 ## Output protocol — bilingual EN + PT-BR (mandatory)

@@ -46,11 +89,12 @@ appears in EN AND in PT-BR (Brazilian Portuguese with UTF-8 accents).

 ```json
 {
-  "credibility": "high | medium | low | speculation",
-  "access_to_event": "EN one paragraph describing access. Ground specific facts in chunk_ids.",
-  "access_to_event_pt_br": "PT-BR um parágrafo descrevendo acesso. Fundamente fatos específicos em chunk_ids.",
-  "bias_notes": "EN one paragraph naming concrete biases visible in the corpus.",
-  "bias_notes_pt_br": "PT-BR um parágrafo nomeando vieses concretos visíveis no corpus.",
+  "direct_testimony_chunk_ids": ["c0042", "c0087", "c0091"],
+  "credibility": "high | medium | low",
+  "access_to_event": "EN one paragraph. Cite each fact with [[chunk]].",
+  "access_to_event_pt_br": "PT-BR um parágrafo. Fundamente cada fato com [[chunk]].",
+  "bias_notes": "EN. Every bias claim cites a chunk.",
+  "bias_notes_pt_br": "PT-BR. Cada afirmação de viés cita um chunk.",
  "corroboration_refs": [
    {"chunk_id": "c0042", "supports": true},
    {"chunk_id": "c0087", "supports": false}
@@ -61,14 +105,19 @@ appears in EN AND in PT-BR (Brazilian Portuguese with UTF-8 accents).
 ```

 Constraints:
+- `direct_testimony_chunk_ids` is the gating field. Below the floor (3
+  generally, 5 for famous figures), you do NOT emit this object. You
+  emit `INSUFFICIENT_TESTIMONY` and nothing else.
 - `access_to_event` and `bias_notes` ≤ 800 chars each (per language).
 - `corroboration_refs` ≤ 8 entries, MUST cite chunk_id values that appear
  in the corpus shortlist you were given.
 - `verdict` ≤ 280 chars (per language), no hedging language inside the
  sentence.
- A missing `*_pt_br` sibling is a hard validation failure — the writer
-  rejects the analysis.
+- A missing `*_pt_br` sibling is a hard validation failure.

-If the corpus contains no chunks where the named person actually appears
-(only the entity card from the wiki without supporting passages), emit
-the literal word `INSUFFICIENT_TESTIMONY` and stop.
+## Tone
+
+Witness analysis published on a public investigative wiki carries
+reputational weight. Write as a careful prosecutor preparing a brief, not
+as a debunker scoring points. State what the corpus shows; do not
+extrapolate to character or motive that the corpus does not document.
--- a/investigator-runtime/src/detectives/poirot.ts
+++ b/investigator-runtime/src/detectives/poirot.ts
@@ -102,6 +102,7 @@ function extractJsonObject(text: string): Record<string, unknown> | null {
  // The skip sentinel can appear bare, in backticks, or as the leading token
  // followed by Poirot's explanation prose. All count as "skipped".
  if (/^`?INSUFFICIENT_TESTIMONY`?\b/i.test(t)) return null;
+  if (/\bINSUFFICIENT_TESTIMONY\b/i.test(t.split("\n")[0])) return null;
  const stripped = t.replace(/^```(?:json)?\s*\n?/i, "").replace(/\n?```\s*$/i, "");
  const first = stripped.indexOf("{");
  const last = stripped.lastIndexOf("}");
@@ -244,6 +245,45 @@ export async function runPoirot(task: PoirotTask): Promise<
    return { skipped: true, reason: "incomplete_bilingual_analysis" };
  }

+  // HARD FLOOR — defense in depth against entity_mentions false positives.
+  // If Poirot didn't surface direct_testimony_chunk_ids or it's below the
+  // floor, refuse to write. This is the rule that keeps a thin corpus from
+  // producing a defamatory verdict on a famous historical figure.
+  const directTestimony = Array.isArray(obj.direct_testimony_chunk_ids)
+    ? (obj.direct_testimony_chunk_ids as unknown[]).filter((x): x is string => typeof x === "string" && x.trim().length > 0)
+    : [];
+  // Famous-figure list — the rule asks ≥5 chunks for public figures.
+  // Lower-cased entity_id, anchored. Extend as the corpus grows.
+  const FAMOUS = new Set([
+    "j-edgar-hoover", "edgar-hoover", "hoover",
+    "donald-keyhoe", "keyhoe",
+    "j-allen-hynek", "allen-hynek", "hynek",
+    "curtis-lemay", "lemay",
+    "nathan-twining", "twining",
+    "vannevar-bush",
+    "john-f-kennedy", "kennedy",
+    "harry-truman", "truman",
+    "dwight-eisenhower", "eisenhower",
+    "ted-bloecher", "bloecher",
+  ]);
+  const slug = (task.person_id ?? "").toLowerCase();
+  const isFamous = FAMOUS.has(slug);
+  const floor = isFamous ? 5 : 3;
+  if (directTestimony.length < floor) {
+    await audit({
+      event: "poirot_refused_floor",
+      job_id: task.job_id,
+      detective: "poirot@detective",
+      person_id: task.person_id,
+      person_entity_pk: entity_pk,
+      canonical_name,
+      direct_testimony_count: directTestimony.length,
+      floor,
+      is_famous_figure: isFamous,
+    });
+    return { skipped: true, reason: `insufficient_direct_testimony_${directTestimony.length}_of_${floor}` };
+  }
+
  // Soft-truncate before sending to the writer: the prompt asks ≤ 280 chars
  // per language but the model occasionally goes slightly over (304 chars
  // observed live with j-edgar-hoover PT-BR). Truncate at sentence boundary