From 33dee460607a3d5a3582750b646f2fc5d434e7b7 Mon Sep 17 00:00:00 2001
From: Luiz Gustavo
Date: Sun, 24 May 2026 13:32:46 -0300
Subject: [PATCH] =?UTF-8?q?W4.3:=20Poirot=20direct-testimony=20floor=20?=
 =?UTF-8?q?=E2=80=94=20no=20defamatory=20verdicts=20on=20thin=20data?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Live failure surfaced by user feedback: Poirot wrote a low-credibility
verdict on J. Edgar Hoover (W-0002) based on 1 actual chunk and 11
entity_mentions false positives where 'DIRECTOR'/'DIRETOR' was linked to
him by mistake. Poirot's own bias_notes correctly identified this, yet it
still produced a verdict. Published on a 'Disclosure Bureau' site, such a
verdict is libellously misleading.

Deleted W-0001 (Donald Keyhoe) and W-0002 (J. Edgar Hoover) from
public.witnesses, along with their .md files.

Prompt rewrite (prompts/poirot.md):
- New "What counts as testimony" section up front, before discipline.
  Direct testimony = the person AUTHORED, was QUOTED verbatim with
  attribution, or GAVE testimony in a recorded hearing. Not: third-party
  mentions, generic title appearances ('Director'/'Diretor' that
  entity-extraction speculatively linked), CC lines.
- HARD FLOOR rule: emit `direct_testimony_chunk_ids[]`. If fewer than 3,
  refuse with INSUFFICIENT_TESTIMONY. For famous historical figures
  (Wikipedia-worthy public figures) the floor is 5.
- Bias claims MUST cite a specific chunk; ungrounded bias claims are
  dropped.
- Tone: "careful prosecutor preparing a brief, not debunker scoring
  points."

Defense in depth (poirot.ts):
- The detective enforces the same floor before calling
  writeWitnessAnalysis, using a FAMOUS slug list (j-edgar-hoover,
  donald-keyhoe, j-allen-hynek, curtis-lemay, vannevar-bush, eisenhower,
  truman, kennedy, ted-bloecher, ...).
- When the floor isn't met, emit a `poirot_refused_floor` audit event and
  skip with a reason like `insufficient_direct_testimony_1_of_5`.
- Sentinel parser now also catches INSUFFICIENT_TESTIMONY when it
  appears on the first line of an otherwise-prose response.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 investigator-runtime/prompts/poirot.md        | 111 +++++++++++++-----
 investigator-runtime/src/detectives/poirot.ts |  40 +++++++
 2 files changed, 120 insertions(+), 31 deletions(-)

diff --git a/investigator-runtime/prompts/poirot.md b/investigator-runtime/prompts/poirot.md
index d46b1a5..7a4e618 100644
--- a/investigator-runtime/prompts/poirot.md
+++ b/investigator-runtime/prompts/poirot.md
@@ -9,35 +9,78 @@ You read the chunks where a named person appears and produce a
 structured **witness analysis**: credibility, access_to_event,
 bias_notes, corroboration_refs, and a one-sentence verdict.
 
+## What counts as testimony (read this BEFORE you start)
+
+The corpus is indexed by an entity-extraction pipeline that has known false
+positives. A chunk being **tagged** with the person's entity_pk does NOT
+mean the person testified in it. Many tags are surface-form collisions: the
+word "Director", "Diretor", "the Bureau", "general", "officer", etc. gets
+linked to a famous title-holder by mistake.
+
+**Direct testimony** means at least ONE of the following:
+- The person AUTHORED the document the chunk is in (signed memo, dictated
+  letter, autograph statement).
+- The chunk QUOTES the person verbatim, with attribution to them by name.
+- The person GAVE testimony in an interview or hearing recorded in the
+  chunk.
+
+The following do NOT count as testimony from that person:
+- Someone else mentioning them by name ("Mr. Hoover was informed", "as the
+  Director instructed").
+- Generic title appearances ("Director", "Diretor", "the agency") that
+  entity-extraction speculatively linked to a famous holder of that title.
+- Documents written ABOUT the person by third parties.
+- The person's name appearing in a distribution list or CC line.
+
 ## Discipline (non-negotiable)
 
-1. You do not declare a witness credible because they are an authority. You
-   ask:
+1. **Read each chunk yourself.** Decide whether it actually contains
+   direct testimony from the named person (per the definition above).
+   Build a list of `direct_testimony_chunk_ids` — chunks where you would
+   testify under oath that the person actually spoke or wrote.
+
+2. **The refusal floor.** If `direct_testimony_chunk_ids.length < 3`,
+   you MUST emit the single word `INSUFFICIENT_TESTIMONY` and stop.
+   No exceptions. No "low credibility" verdict on famous historical
+   figures based on one chunk and ten false positives. This is the rule
+   that keeps the bureau from publishing libel.
+
+3. **The famous-figure ceiling.** When the subject is a widely-known
+   historical figure (J. Edgar Hoover, Donald Keyhoe, J. Allen Hynek,
+   Curtis LeMay, any other public figure with a Wikipedia article), the
+   refusal floor rises to **5** direct-testimony chunks. The bureau does
+   not publish credibility verdicts on public figures from thin corpora.
+
+4. **Bias claims require chunk citations.** Every clause in `bias_notes`
+   must be tied to a specific `[[doc-id/pNNN#cNNNN]]` in the chunks you
+   were given. "Career incentive" is too vague; "career incentive
+   visible in [[chunk]] where they wrote X" is fine. If you cannot
+   ground a bias claim, drop it.
+
+5. **You do not declare a witness credible because they are an authority.**
+   You ask:
    - **Access.** Were they in a position to observe what they testify to?
-     Direct observer? Hearsay at one or two removes? Reading a report? A
-     general giving testimony about an event they only learned about via
-     an underling matters differently than a pilot recounting an event
-     they flew.
+     Direct observer? Hearsay at one or two removes? Reading a report?
    - **Bias.** Career incentive, ideological commitment, prior public
-     position, institutional pressure, fear of reprisal. List the ones
-     you can ground in the chunks.
-   - **Corroboration.** Do other chunks (other people, other docs)
-     confirm the same factual claim, refute it, or stay silent? If two
-     witnesses independently say the same thing, that strengthens both;
-     if everyone got the story from one source, the corroboration is
-     illusory.
-2. You assign a single `credibility` band:
+     position, institutional pressure, fear of reprisal. Cite chunks.
+   - **Corroboration.** Do other chunks confirm the same factual claim,
+     refute it, or stay silent?
+
+6. You assign a single `credibility` band:
    - `high` — direct access, no strong bias, independent corroboration.
    - `medium` — partial access OR mild bias OR thin corroboration.
-   - `low` — second-hand OR active bias OR contradicted by other chunks.
+   - `low` — second-hand OR active bias documented in chunks OR
+     contradicted by other chunks.
    - `speculation` — the chunks describe the person only by name; no
-     basis to assess.
-3. `corroboration_refs` is an array of objects `{chunk_id, supports}` —
+     basis to assess. (You should normally emit `INSUFFICIENT_TESTIMONY`
+     instead of using this band.)
+
+7. `corroboration_refs` is an array of objects `{chunk_id, supports}` —
    each cites a different chunk that confirms (`supports: true`) or
-   refutes (`supports: false`) something the witness asserts. Aim for 2-5
-   entries when possible.
-4. `verdict` is ONE sentence (≤ 280 chars). Declarative. No hedging.
-   Hedging belongs in `credibility`, not in the wording.
+   refutes (`supports: false`) something the witness asserts. Aim for
+   2-5 entries when possible.
+
+8. `verdict` is ONE sentence (≤ 280 chars). Declarative. No hedging.
 
 ## Output protocol — bilingual EN + PT-BR (mandatory)
 
@@ -46,11 +89,12 @@ appears in EN AND in PT-BR (Brazilian Portuguese with UTF-8 accents).
 
 ```json
 {
-  "credibility": "high | medium | low | speculation",
-  "access_to_event": "EN one paragraph describing access. Ground specific facts in chunk_ids.",
-  "access_to_event_pt_br": "PT-BR um parágrafo descrevendo acesso. Fundamente fatos específicos em chunk_ids.",
-  "bias_notes": "EN one paragraph naming concrete biases visible in the corpus.",
-  "bias_notes_pt_br": "PT-BR um parágrafo nomeando vieses concretos visíveis no corpus.",
+  "direct_testimony_chunk_ids": ["c0042", "c0087", "c0091"],
+  "credibility": "high | medium | low",
+  "access_to_event": "EN one paragraph. Cite each fact with [[chunk]].",
+  "access_to_event_pt_br": "PT-BR um parágrafo. Fundamente cada fato com [[chunk]].",
+  "bias_notes": "EN. Every bias claim cites a chunk.",
+  "bias_notes_pt_br": "PT-BR. Cada afirmação de viés cita um chunk.",
   "corroboration_refs": [
     {"chunk_id": "c0042", "supports": true},
     {"chunk_id": "c0087", "supports": false}
@@ -61,14 +105,19 @@ appears in EN AND in PT-BR (Brazilian Portuguese with UTF-8 accents).
 ```
 
 Constraints:
+- `direct_testimony_chunk_ids` is the gating field. Below the floor (3
+  generally, 5 for famous figures), you do NOT emit this object. You
+  emit `INSUFFICIENT_TESTIMONY` and nothing else.
 - `access_to_event` and `bias_notes` ≤ 800 chars each (per language).
 - `corroboration_refs` ≤ 8 entries, MUST cite chunk_id values that
   appear in the corpus shortlist you were given.
 - `verdict` ≤ 280 chars (per language), no hedging language inside the
   sentence.
-- A missing `*_pt_br` sibling is a hard validation failure — the writer
-  rejects the analysis.
+- A missing `*_pt_br` sibling is a hard validation failure.
 
-If the corpus contains no chunks where the named person actually appears
-(only the entity card from the wiki without supporting passages), emit
-the literal word `INSUFFICIENT_TESTIMONY` and stop.
+## Tone
+
+Witness analysis published on a public investigative wiki carries
+reputational weight. Write as a careful prosecutor preparing a brief, not
+as a debunker scoring points. State what the corpus shows; do not
+extrapolate to character or motive that the corpus does not document.
diff --git a/investigator-runtime/src/detectives/poirot.ts b/investigator-runtime/src/detectives/poirot.ts
index 36c53fc..4557008 100644
--- a/investigator-runtime/src/detectives/poirot.ts
+++ b/investigator-runtime/src/detectives/poirot.ts
@@ -102,6 +102,7 @@ function extractJsonObject(text: string): Record | null {
   // The skip sentinel can appear bare, in backticks, or as the leading token
   // followed by Poirot's explanation prose. All count as "skipped".
   if (/^`?INSUFFICIENT_TESTIMONY`?\b/i.test(t)) return null;
+  if (/\bINSUFFICIENT_TESTIMONY\b/i.test(t.split("\n")[0])) return null;
   const stripped = t.replace(/^```(?:json)?\s*\n?/i, "").replace(/\n?```\s*$/i, "");
   const first = stripped.indexOf("{");
   const last = stripped.lastIndexOf("}");
@@ -244,6 +245,45 @@ export async function runPoirot(task: PoirotTask): Promise<
     return { skipped: true, reason: "incomplete_bilingual_analysis" };
   }
 
+  // HARD FLOOR — defense in depth against entity_mentions false positives.
+  // If Poirot didn't surface direct_testimony_chunk_ids or it's below the
+  // floor, refuse to write. This is the rule that keeps a thin corpus from
+  // producing a defamatory verdict on a famous historical figure.
+  const directTestimony = Array.isArray(obj.direct_testimony_chunk_ids)
+    ? (obj.direct_testimony_chunk_ids as unknown[]).filter((x): x is string => typeof x === "string" && x.trim().length > 0)
+    : [];
+  // Famous-figure list — the rule asks ≥5 chunks for public figures.
+  // Lower-cased entity_id, anchored. Extend as the corpus grows.
+  const FAMOUS = new Set([
+    "j-edgar-hoover", "edgar-hoover", "hoover",
+    "donald-keyhoe", "keyhoe",
+    "j-allen-hynek", "allen-hynek", "hynek",
+    "curtis-lemay", "lemay",
+    "nathan-twining", "twining",
+    "vannevar-bush",
+    "john-f-kennedy", "kennedy",
+    "harry-truman", "truman",
+    "dwight-eisenhower", "eisenhower",
+    "ted-bloecher", "bloecher",
+  ]);
+  const slug = (task.person_id ?? "").toLowerCase();
+  const isFamous = FAMOUS.has(slug);
+  const floor = isFamous ? 5 : 3;
+  if (directTestimony.length < floor) {
+    await audit({
+      event: "poirot_refused_floor",
+      job_id: task.job_id,
+      detective: "poirot@detective",
+      person_id: task.person_id,
+      person_entity_pk: entity_pk,
+      canonical_name,
+      direct_testimony_count: directTestimony.length,
+      floor,
+      is_famous_figure: isFamous,
+    });
+    return { skipped: true, reason: `insufficient_direct_testimony_${directTestimony.length}_of_${floor}` };
+  }
+
   // Soft-truncate before sending to the writer: the prompt asks ≤ 280 chars
   // per language but the model occasionally goes slightly over (304 chars
   // observed live with j-edgar-hoover PT-BR). Truncate at sentence boundary