W4.3: Poirot direct-testimony floor — no defamatory verdicts on thin data
Some checks failed
CI / Web — typecheck + lint + build (push) Failing after 34s
CI / Scripts — Python smoke (push) Failing after 4s
CI / Web — npm audit (push) Failing after 39s
CI / Retrieval — golden set (Recall@5 + MRR) (push) Failing after 4s

Live failure surfaced by user feedback: Poirot wrote a low-credibility
verdict on J. Edgar Hoover (W-0002) based on 1 actual chunk and 11
entity_mentions false positives, where 'DIRECTOR'/'DIRETOR' had been
linked to him by mistake. Poirot's own bias_notes correctly flagged this,
yet it still produced a verdict. Published on a 'Disclosure Bureau' site,
a verdict like that is libellously misleading.

Deleted W-0001 (Donald Keyhoe) and W-0002 (J. Edgar Hoover) from
public.witnesses, along with their .md files.

Prompt rewrite (prompts/poirot.md):
  - New "What counts as testimony" section up front, before the
    Discipline rules. Direct testimony = the person AUTHORED the
    document, was QUOTED verbatim with attribution, or GAVE testimony in
    a recorded interview or hearing. Not: third-party mentions, generic
    title appearances ('Director'/'Diretor' that entity extraction
    speculatively linked), distribution lists or CC lines.
  - HARD FLOOR rule: emit `direct_testimony_chunk_ids[]`. If it holds
    fewer than 3 chunks, refuse with INSUFFICIENT_TESTIMONY. For famous
    historical figures (Wikipedia-worthy public figures) the floor is 5.
  - Bias claims MUST cite a specific chunk; ungrounded bias claims are
    dropped.
  - Tone: write as "a careful prosecutor preparing a brief, not as a
    debunker scoring points."
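
The bias-citation rule can also be checked mechanically on the parsed output. What follows is a minimal sketch, not code from this commit: `uncitedBiasClauses` and `dropUngroundedBias` are hypothetical helpers, the citation regex assumes chunk ids shaped like the prompt's `[[doc-id/pNNN#cNNNN]]`, and clause splitting is deliberately naive (an initial such as "J. Edgar" would split a clause in two).

```typescript
// Hypothetical helper, not part of this commit: flags bias_notes clauses that
// carry no chunk citation. The regex assumes ids shaped like the prompt's
// [[doc-id/pNNN#cNNNN]] convention.
const CHUNK_CITATION = /\[\[[^\]\s]+\/p\d+#c\d+\]\]/;

// Naive clause splitter: breaks after ".", ";", "!" or "?" followed by whitespace.
function splitClauses(text: string): string[] {
  return text
    .replace(/([.;!?])\s+/g, "$1\n")
    .split("\n")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Clauses that make a claim without citing any chunk.
function uncitedBiasClauses(biasNotes: string): string[] {
  return splitClauses(biasNotes).filter((clause) => !CHUNK_CITATION.test(clause));
}

// Keeps only grounded clauses, in the spirit of "ungrounded bias claims are dropped".
function dropUngroundedBias(biasNotes: string): string {
  return splitClauses(biasNotes)
    .filter((clause) => CHUNK_CITATION.test(clause))
    .join(" ");
}

const notes =
  "Career incentive visible in [[fbi-memo-12/p003#c0042]] where he wrote X. " +
  "Institutional pressure from the Bureau.";
console.log(uncitedBiasClauses(notes)); // logs the one uncited clause
```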

Defense in depth (poirot.ts):
  - The detective enforces the same floor before calling
    writeWitnessAnalysis, using a FAMOUS slug list (j-edgar-hoover,
    donald-keyhoe, j-allen-hynek, curtis-lemay, vannevar-bush,
    eisenhower, truman, kennedy, ted-bloecher, ...).
  - When the floor isn't met, it emits a `poirot_refused_floor` audit
    event and skips with a reason such as
    `insufficient_direct_testimony_1_of_5`.
  - The sentinel parser now also catches INSUFFICIENT_TESTIMONY anywhere
    on the first line of an otherwise-prose response.
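
The two code-side checks read most clearly as pure functions. This sketch mirrors the sentinel regexes and the 3/5 floor from this commit's poirot.ts diff, but `isRefusal`, `checkFloor` and `FloorResult` are hypothetical names, and `FAMOUS_SLUGS` is an abbreviated stand-in for the commit's full list.

```typescript
// Sketch of the defense-in-depth checks as pure functions. The regexes and the
// 3 / 5 floor mirror this commit; names and the short FAMOUS_SLUGS set are
// hypothetical stand-ins.
const FAMOUS_SLUGS = new Set(["j-edgar-hoover", "donald-keyhoe", "j-allen-hynek", "hoover"]);

type FloorResult =
  | { ok: true; floor: number; count: number }
  | { ok: false; floor: number; count: number; reason: string };

// Refusal sentinel: bare or backticked at the start, or anywhere on line one.
function isRefusal(text: string): boolean {
  const t = text.trim();
  if (/^`?INSUFFICIENT_TESTIMONY`?\b/i.test(t)) return true;
  return /\bINSUFFICIENT_TESTIMONY\b/i.test(t.split("\n")[0]);
}

// Counts non-empty string ids and compares them with the applicable floor.
function checkFloor(ids: unknown, personId?: string): FloorResult {
  const direct = Array.isArray(ids)
    ? ids.filter((x): x is string => typeof x === "string" && x.trim().length > 0)
    : [];
  const floor = FAMOUS_SLUGS.has((personId ?? "").toLowerCase()) ? 5 : 3;
  const count = direct.length;
  if (count >= floor) return { ok: true, floor, count };
  return { ok: false, floor, count, reason: `insufficient_direct_testimony_${count}_of_${floor}` };
}

// ok: false, with reason insufficient_direct_testimony_1_of_5
console.log(checkFloor(["c0042"], "j-edgar-hoover"));
```

Because the lookup is an exact match on the lower-cased slug, `J-Edgar-Hoover` raises the floor to 5, while an unlisted slug such as `some-pilot` stays at 3.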

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Luiz Gustavo 2026-05-24 13:32:46 -03:00
parent 24f12a27f4
commit 33dee46060
2 changed files with 120 additions and 31 deletions


@@ -9,35 +9,78 @@ You read the chunks where a named person appears and produce a structured
 **witness analysis**: credibility, access_to_event, bias_notes,
 corroboration_refs, and a one-sentence verdict.
+## What counts as testimony (read this BEFORE you start)
+The corpus is indexed by an entity-extraction pipeline that has known false
+positives. A chunk being **tagged** with the person's entity_pk does NOT
+mean the person testified in it. Many tags are surface-form collisions: the
+word "Director", "Diretor", "the Bureau", "general", "officer", etc. gets
+linked to a famous title-holder by mistake.
+**Direct testimony** means at least ONE of the following:
+- The person AUTHORED the document the chunk is in (signed memo, dictated
+  letter, autograph statement).
+- The chunk QUOTES the person verbatim, with attribution to them by name.
+- The person GAVE testimony in an interview or hearing recorded in the
+  chunk.
+The following do NOT count as testimony from that person:
+- Someone else mentioning them by name ("Mr. Hoover was informed", "as the
+  Director instructed").
+- Generic title appearances ("Director", "Diretor", "the agency") that
+  entity-extraction speculatively linked to a famous holder of that title.
+- Documents written ABOUT the person by third parties.
+- The person's name appearing in a distribution list or CC line.
 ## Discipline (non-negotiable)
-1. You do not declare a witness credible because they are an authority. You
-   ask:
+1. **Read each chunk yourself.** Decide whether it actually contains
+   direct testimony from the named person (per the definition above).
+   Build a list of `direct_testimony_chunk_ids` — chunks where you would
+   testify under oath that the person actually spoke or wrote.
+2. **The refusal floor.** If `direct_testimony_chunk_ids.length < 3`,
+   you MUST emit the single word `INSUFFICIENT_TESTIMONY` and stop.
+   No exceptions. No "low credibility" verdict on famous historical
+   figures based on one chunk and ten false positives. This is the rule
+   that keeps the bureau from publishing libel.
+3. **The famous-figure ceiling.** When the subject is a widely-known
+   historical figure (J. Edgar Hoover, Donald Keyhoe, J. Allen Hynek,
+   Curtis LeMay, any other public figure with a Wikipedia article), the
+   refusal floor rises to **5** direct-testimony chunks. The bureau does
+   not publish credibility verdicts on public figures from thin corpora.
+4. **Bias claims require chunk citations.** Every clause in `bias_notes`
+   must be tied to a specific `[[doc-id/pNNN#cNNNN]]` in the chunks you
+   were given. "Career incentive" is too vague; "career incentive
+   visible in [[chunk]] where they wrote X" is fine. If you cannot
+   ground a bias claim, drop it.
+5. **You do not declare a witness credible because they are an authority.**
+   You ask:
    - **Access.** Were they in a position to observe what they testify to?
-     Direct observer? Hearsay at one or two removes? Reading a report? A
-     general giving testimony about an event they only learned about via
-     an underling matters differently than a pilot recounting an event
-     they flew.
+     Direct observer? Hearsay at one or two removes? Reading a report?
    - **Bias.** Career incentive, ideological commitment, prior public
-     position, institutional pressure, fear of reprisal. List the ones
-     you can ground in the chunks.
-   - **Corroboration.** Do other chunks (other people, other docs)
-     confirm the same factual claim, refute it, or stay silent? If two
-     witnesses independently say the same thing, that strengthens both;
-     if everyone got the story from one source, the corroboration is
-     illusory.
-2. You assign a single `credibility` band:
+     position, institutional pressure, fear of reprisal. Cite chunks.
+   - **Corroboration.** Do other chunks confirm the same factual claim,
+     refute it, or stay silent?
+6. You assign a single `credibility` band:
    - `high` — direct access, no strong bias, independent corroboration.
    - `medium` — partial access OR mild bias OR thin corroboration.
-   - `low` — second-hand OR active bias OR contradicted by other chunks.
+   - `low` — second-hand OR active bias documented in chunks OR
+     contradicted by other chunks.
    - `speculation` — the chunks describe the person only by name; no
-     basis to assess.
-3. `corroboration_refs` is an array of objects `{chunk_id, supports}`
+     basis to assess. (You should normally emit `INSUFFICIENT_TESTIMONY`
+     instead of using this band.)
+7. `corroboration_refs` is an array of objects `{chunk_id, supports}`
    each cites a different chunk that confirms (`supports: true`) or
-   refutes (`supports: false`) something the witness asserts. Aim for 2-5
-   entries when possible.
-4. `verdict` is ONE sentence (≤ 280 chars). Declarative. No hedging.
-   Hedging belongs in `credibility`, not in the wording.
+   refutes (`supports: false`) something the witness asserts. Aim for
+   2-5 entries when possible.
+8. `verdict` is ONE sentence (≤ 280 chars). Declarative. No hedging.
 ## Output protocol — bilingual EN + PT-BR (mandatory)
@@ -46,11 +89,12 @@ appears in EN AND in PT-BR (Brazilian Portuguese with UTF-8 accents).
 ```json
 {
-  "credibility": "high | medium | low | speculation",
-  "access_to_event": "EN one paragraph describing access. Ground specific facts in chunk_ids.",
-  "access_to_event_pt_br": "PT-BR um parágrafo descrevendo acesso. Fundamente fatos específicos em chunk_ids.",
-  "bias_notes": "EN one paragraph naming concrete biases visible in the corpus.",
-  "bias_notes_pt_br": "PT-BR um parágrafo nomeando vieses concretos visíveis no corpus.",
+  "direct_testimony_chunk_ids": ["c0042", "c0087", "c0091"],
+  "credibility": "high | medium | low",
+  "access_to_event": "EN one paragraph. Cite each fact with [[chunk]].",
+  "access_to_event_pt_br": "PT-BR um parágrafo. Fundamente cada fato com [[chunk]].",
+  "bias_notes": "EN. Every bias claim cites a chunk.",
+  "bias_notes_pt_br": "PT-BR. Cada afirmação de viés cita um chunk.",
   "corroboration_refs": [
     {"chunk_id": "c0042", "supports": true},
     {"chunk_id": "c0087", "supports": false}
@@ -61,14 +105,19 @@ appears in EN AND in PT-BR (Brazilian Portuguese with UTF-8 accents).
 ```
 Constraints:
+- `direct_testimony_chunk_ids` is the gating field. Below the floor (3
+  generally, 5 for famous figures), you do NOT emit this object. You
+  emit `INSUFFICIENT_TESTIMONY` and nothing else.
 - `access_to_event` and `bias_notes` ≤ 800 chars each (per language).
 - `corroboration_refs` ≤ 8 entries, MUST cite chunk_id values that appear
   in the corpus shortlist you were given.
 - `verdict` ≤ 280 chars (per language), no hedging language inside the
   sentence.
-- A missing `*_pt_br` sibling is a hard validation failure — the writer
-  rejects the analysis.
-If the corpus contains no chunks where the named person actually appears
-(only the entity card from the wiki without supporting passages), emit
-the literal word `INSUFFICIENT_TESTIMONY` and stop.
+- A missing `*_pt_br` sibling is a hard validation failure.
+## Tone
+Witness analysis published on a public investigative wiki carries
+reputational weight. Write as a careful prosecutor preparing a brief, not
+as a debunker scoring points. State what the corpus shows; do not
+extrapolate to character or motive that the corpus does not document.
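
The prompt's Constraints list maps naturally onto a post-parse validator. This is a sketch under assumptions rather than the project's actual writer: `validateAnalysis` is a hypothetical name, `verdict_pt_br` is assumed by analogy with the other `*_pt_br` siblings (the JSON example in the diff is cut off before the verdict fields), and empty strings are treated as missing.

```typescript
// Sketch of a validator for the prompt's Constraints list, run after parsing
// the model's JSON. validateAnalysis is hypothetical; verdict_pt_br is assumed
// by analogy with the other *_pt_br siblings.
const LIMITS: Array<[field: string, maxChars: number]> = [
  ["access_to_event", 800],
  ["bias_notes", 800],
  ["verdict", 280],
];
const MAX_CORROBORATION_REFS = 8;

// Returns human-readable violations; an empty array means the object passes.
function validateAnalysis(raw: Record<string, unknown>, shortlist: Set<string>): string[] {
  const errors: string[] = [];
  if (!Array.isArray(raw.direct_testimony_chunk_ids)) {
    errors.push("missing direct_testimony_chunk_ids");
  }
  // Every limited field must exist in EN and PT-BR, each within its own limit.
  // Note: .length counts UTF-16 code units, not user-perceived characters.
  for (const [field, max] of LIMITS) {
    for (const key of [field, `${field}_pt_br`]) {
      const value = raw[key];
      if (typeof value !== "string" || value.trim() === "") {
        errors.push(`missing ${key}`);
      } else if (value.length > max) {
        errors.push(`${key} is ${value.length} chars (max ${max})`);
      }
    }
  }
  const refs: unknown[] = Array.isArray(raw.corroboration_refs) ? raw.corroboration_refs : [];
  if (refs.length > MAX_CORROBORATION_REFS) {
    errors.push(`corroboration_refs has ${refs.length} entries (max ${MAX_CORROBORATION_REFS})`);
  }
  // Each ref must point at a chunk id from the shortlist the model was given.
  for (const ref of refs) {
    const id = (ref as { chunk_id?: unknown } | null)?.chunk_id;
    if (typeof id !== "string" || !shortlist.has(id)) {
      errors.push(`corroboration_refs cites unknown chunk ${String(id)}`);
    }
  }
  return errors;
}
```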


@@ -102,6 +102,7 @@ function extractJsonObject(text: string): Record<string, unknown> | null {
   // The skip sentinel can appear bare, in backticks, or as the leading token
   // followed by Poirot's explanation prose. All count as "skipped".
   if (/^`?INSUFFICIENT_TESTIMONY`?\b/i.test(t)) return null;
+  if (/\bINSUFFICIENT_TESTIMONY\b/i.test(t.split("\n")[0])) return null;
   const stripped = t.replace(/^```(?:json)?\s*\n?/i, "").replace(/\n?```\s*$/i, "");
   const first = stripped.indexOf("{");
   const last = stripped.lastIndexOf("}");
@@ -244,6 +245,45 @@ export async function runPoirot(task: PoirotTask): Promise<
     return { skipped: true, reason: "incomplete_bilingual_analysis" };
   }
+  // HARD FLOOR — defense in depth against entity_mentions false positives.
+  // If Poirot didn't surface direct_testimony_chunk_ids or it's below the
+  // floor, refuse to write. This is the rule that keeps a thin corpus from
+  // producing a defamatory verdict on a famous historical figure.
+  const directTestimony = Array.isArray(obj.direct_testimony_chunk_ids)
+    ? (obj.direct_testimony_chunk_ids as unknown[]).filter((x): x is string => typeof x === "string" && x.trim().length > 0)
+    : [];
+  // Famous-figure list — the rule asks ≥5 chunks for public figures.
+  // Lower-cased entity_id, anchored. Extend as the corpus grows.
+  const FAMOUS = new Set([
+    "j-edgar-hoover", "edgar-hoover", "hoover",
+    "donald-keyhoe", "keyhoe",
+    "j-allen-hynek", "allen-hynek", "hynek",
+    "curtis-lemay", "lemay",
+    "nathan-twining", "twining",
+    "vannevar-bush",
+    "john-f-kennedy", "kennedy",
+    "harry-truman", "truman",
+    "dwight-eisenhower", "eisenhower",
+    "ted-bloecher", "bloecher",
+  ]);
+  const slug = (task.person_id ?? "").toLowerCase();
+  const isFamous = FAMOUS.has(slug);
+  const floor = isFamous ? 5 : 3;
+  if (directTestimony.length < floor) {
+    await audit({
+      event: "poirot_refused_floor",
+      job_id: task.job_id,
+      detective: "poirot@detective",
+      person_id: task.person_id,
+      person_entity_pk: entity_pk,
+      canonical_name,
+      direct_testimony_count: directTestimony.length,
+      floor,
+      is_famous_figure: isFamous,
+    });
+    return { skipped: true, reason: `insufficient_direct_testimony_${directTestimony.length}_of_${floor}` };
+  }
   // Soft-truncate before sending to the writer: the prompt asks ≤ 280 chars
   // per language but the model occasionally goes slightly over (304 chars
   // observed live with j-edgar-hoover PT-BR). Truncate at sentence boundary