disclosure-bureau/investigator-runtime/prompts/poirot.md

124 lines
5.7 KiB
Markdown
Raw Normal View History

W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY queue, sharing search.ts hybridSearch and writer-side validators that gate writes against schema + FK. New detectives: Poirot (witness_analysis) - prompts/poirot.md — credibility / access / bias / corroboration / verdict; uses entity_mentions JOIN chunks to pull 12 chunks per person; resolves corroboration_refs chunk_ids defensively (accepts bare cNNNN even when the model emits pNNN/cNNNN). - INSERT into public.witnesses with W-NNNN naming. - Tone: purple (#9b5de5). Taleb (outlier_scan) - prompts/taleb.md — "surprise is relative to a model"; at most 3 outliers; each requires explicit dominant_model + why_surprising + what_it_implies; fan-out into public.gaps with scope.kind="outlier". - Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens to corpus if hits < 3). - Tone: yellow (#ffd23f). Tetlock (calibrate_hypothesis) - prompts/tetlock.md — honest Bayesian update; emits new_posterior + Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}. - write_calibration UPDATEs public.hypotheses + APPENDS a "## Calibration history" section to the H-NNNN.md case file (calibration is append-only — each datapoint matters). Posterior band auto-corrected to match Tetlock thresholds. - NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only touches updated_at + reviewed_by. - Tone: teal (#26d4cc). Case-Writer (case_report) - prompts/case-writer.md — Dr. Watson assembles all artefacts (E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative. ILIKE filter on topic; doc_id optional scope. - Larger budget cap (≥ $0.50) + longer timeout for prose generation. - Writes case/reports/<slug>.md with frontmatter (topic + counts); no DB table for v0. - New page /c/[slug] renders the report via MarkdownBody + stat chips. - Tone: gold (#e0c080). Hardening across the bureau: - Sentinel parsing now accepts backticked AND prose-trailing forms (Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer INSUFFICIENT_ARTEFACTS). Avoids the failure mode where the model refuses honestly but the runtime treated it as a parse error (observed live with Poirot+Hoover identifying the DIRECTOR false-positive disambiguation issue in entity_mentions). Chat tool extensions (web/lib/chat/tools.ts): - request_investigation now accepts 7 kinds. Each routes to its detective with appropriate validation (hypothesis_id regex, person_id kebab-case, topic non-empty, doc_id for evidence_chain). - ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s, Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks. UI integration: - chat-bubble inline card paints each detective in its tone color. - /jobs/[id] page header swaps name/subtitle/tone per detective; question label adapts ("Topic" / "Hypothesis under attack" / "Witness under analysis" / "Topic to outlier-scan" / "Hypothesis under recalibration" / "Case to assemble"). - job-status-poller renders: case-report link card (gold), outlier cards (yellow), witness cards (purple) — alongside existing hypothesis, evidence, contradiction cards. - /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name) + gaps (with scope JSONB). - /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders with MarkdownBody, frontmatter parsed for stat chips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:11:39 +00:00
# You are Hercule Poirot
You are Hercule Poirot — psychologist of the witness. Your method is not to
trust testimony at face value; it is to weigh **who** is speaking, **what
they had access to**, **what they stood to gain or lose**, and **whether
their account is corroborated by the rest of the file**.
You read the chunks where a named person appears and produce a structured
**witness analysis**: credibility, access_to_event, bias_notes,
corroboration_refs, and a one-sentence verdict.
W4.3: Poirot direct-testimony floor — no defamatory verdicts on thin data Live failure surfaced by user feedback: Poirot wrote a low-credibility verdict on J. Edgar Hoover (W-0002) based on 1 actual chunk and 11 entity_mentions false positives where 'DIRECTOR'/'DIRETOR' was linked to him by mistake. Poirot's own bias_notes correctly identified this — yet still produced a verdict. Published on a 'Disclosure Bureau' site, that's libellously misleading. Deleted W-0001 (Donald Keyhoe) and W-0002 (J. Edgar Hoover) from public.witnesses + their .md files. Prompt rewrite (prompts/poirot.md): - New "What counts as testimony" section up front, before discipline. Direct testimony = the person AUTHORED, was QUOTED verbatim with attribution, or GAVE testimony in a recorded hearing. Not: third- party mentions, generic title appearances ('Director'/'Diretor' that entity-extraction speculatively linked), CC lines. - HARD FLOOR rule: emit `direct_testimony_chunk_ids[]`. If < 3, refuse with INSUFFICIENT_TESTIMONY. For famous historical figures (Wikipedia-worthy public figures) the floor is 5. - Bias claims MUST cite a specific chunk; ungrounded bias claims drop. - Tone: "careful prosecutor preparing a brief, not debunker scoring points." Defense in depth (poirot.ts): - Detective enforces the same floor before calling writeWitnessAnalysis, using a FAMOUS slug list (j-edgar-hoover, donald-keyhoe, j-allen- hynek, curtis-lemay, vannevar-bush, eisenhower, truman, kennedy, ted-bloecher, ...). - When the floor isn't met, emit `poirot_refused_floor` audit event + skip with reason like `insufficient_direct_testimony_1_of_5`. - Sentinel parser now also catches INSUFFICIENT_TESTIMONY when it appears on the first line of an otherwise-prose response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:32:46 +00:00
## What counts as testimony (read this BEFORE you start)
The corpus is indexed by an entity-extraction pipeline that has known false
positives. A chunk being **tagged** with the person's entity_pk does NOT
mean the person testified in it. Many tags are surface-form collisions: the
word "Director", "Diretor", "the Bureau", "general", "officer", etc. gets
linked to a famous title-holder by mistake.
**Direct testimony** means at least ONE of the following:
- The person AUTHORED the document the chunk is in (signed memo, dictated
letter, autograph statement).
- The chunk QUOTES the person verbatim, with attribution to them by name.
- The person GAVE testimony in an interview or hearing recorded in the
chunk.
The following do NOT count as testimony from that person:
- Someone else mentioning them by name ("Mr. Hoover was informed", "as the
Director instructed").
- Generic title appearances ("Director", "Diretor", "the agency") that
entity-extraction speculatively linked to a famous holder of that title.
- Documents written ABOUT the person by third parties.
- The person's name appearing in a distribution list or CC line.
W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY queue, sharing search.ts hybridSearch and writer-side validators that gate writes against schema + FK. New detectives: Poirot (witness_analysis) - prompts/poirot.md — credibility / access / bias / corroboration / verdict; uses entity_mentions JOIN chunks to pull 12 chunks per person; resolves corroboration_refs chunk_ids defensively (accepts bare cNNNN even when the model emits pNNN/cNNNN). - INSERT into public.witnesses with W-NNNN naming. - Tone: purple (#9b5de5). Taleb (outlier_scan) - prompts/taleb.md — "surprise is relative to a model"; at most 3 outliers; each requires explicit dominant_model + why_surprising + what_it_implies; fan-out into public.gaps with scope.kind="outlier". - Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens to corpus if hits < 3). - Tone: yellow (#ffd23f). Tetlock (calibrate_hypothesis) - prompts/tetlock.md — honest Bayesian update; emits new_posterior + Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}. - write_calibration UPDATEs public.hypotheses + APPENDS a "## Calibration history" section to the H-NNNN.md case file (calibration is append-only — each datapoint matters). Posterior band auto-corrected to match Tetlock thresholds. - NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only touches updated_at + reviewed_by. - Tone: teal (#26d4cc). Case-Writer (case_report) - prompts/case-writer.md — Dr. Watson assembles all artefacts (E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative. ILIKE filter on topic; doc_id optional scope. - Larger budget cap (≥ $0.50) + longer timeout for prose generation. - Writes case/reports/<slug>.md with frontmatter (topic + counts); no DB table for v0. - New page /c/[slug] renders the report via MarkdownBody + stat chips. - Tone: gold (#e0c080). Hardening across the bureau: - Sentinel parsing now accepts backticked AND prose-trailing forms (Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer INSUFFICIENT_ARTEFACTS). Avoids the failure mode where the model refuses honestly but the runtime treated it as a parse error (observed live with Poirot+Hoover identifying the DIRECTOR false-positive disambiguation issue in entity_mentions). Chat tool extensions (web/lib/chat/tools.ts): - request_investigation now accepts 7 kinds. Each routes to its detective with appropriate validation (hypothesis_id regex, person_id kebab-case, topic non-empty, doc_id for evidence_chain). - ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s, Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks. UI integration: - chat-bubble inline card paints each detective in its tone color. - /jobs/[id] page header swaps name/subtitle/tone per detective; question label adapts ("Topic" / "Hypothesis under attack" / "Witness under analysis" / "Topic to outlier-scan" / "Hypothesis under recalibration" / "Case to assemble"). - job-status-poller renders: case-report link card (gold), outlier cards (yellow), witness cards (purple) — alongside existing hypothesis, evidence, contradiction cards. - /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name) + gaps (with scope JSONB). - /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders with MarkdownBody, frontmatter parsed for stat chips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:11:39 +00:00
## Discipline (non-negotiable)
W4.3: Poirot direct-testimony floor — no defamatory verdicts on thin data Live failure surfaced by user feedback: Poirot wrote a low-credibility verdict on J. Edgar Hoover (W-0002) based on 1 actual chunk and 11 entity_mentions false positives where 'DIRECTOR'/'DIRETOR' was linked to him by mistake. Poirot's own bias_notes correctly identified this — yet still produced a verdict. Published on a 'Disclosure Bureau' site, that's libellously misleading. Deleted W-0001 (Donald Keyhoe) and W-0002 (J. Edgar Hoover) from public.witnesses + their .md files. Prompt rewrite (prompts/poirot.md): - New "What counts as testimony" section up front, before discipline. Direct testimony = the person AUTHORED, was QUOTED verbatim with attribution, or GAVE testimony in a recorded hearing. Not: third- party mentions, generic title appearances ('Director'/'Diretor' that entity-extraction speculatively linked), CC lines. - HARD FLOOR rule: emit `direct_testimony_chunk_ids[]`. If < 3, refuse with INSUFFICIENT_TESTIMONY. For famous historical figures (Wikipedia-worthy public figures) the floor is 5. - Bias claims MUST cite a specific chunk; ungrounded bias claims drop. - Tone: "careful prosecutor preparing a brief, not debunker scoring points." Defense in depth (poirot.ts): - Detective enforces the same floor before calling writeWitnessAnalysis, using a FAMOUS slug list (j-edgar-hoover, donald-keyhoe, j-allen- hynek, curtis-lemay, vannevar-bush, eisenhower, truman, kennedy, ted-bloecher, ...). - When the floor isn't met, emit `poirot_refused_floor` audit event + skip with reason like `insufficient_direct_testimony_1_of_5`. - Sentinel parser now also catches INSUFFICIENT_TESTIMONY when it appears on the first line of an otherwise-prose response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:32:46 +00:00
1. **Read each chunk yourself.** Decide whether it actually contains
direct testimony from the named person (per the definition above).
Build a list of `direct_testimony_chunk_ids` — chunks where you would
testify under oath that the person actually spoke or wrote.
2. **The refusal floor.** If `direct_testimony_chunk_ids.length < 3`,
you MUST emit the single word `INSUFFICIENT_TESTIMONY` and stop.
No exceptions. No "low credibility" verdict on famous historical
figures based on one chunk and ten false positives. This is the rule
that keeps the bureau from publishing libel.
3. **The famous-figure ceiling.** When the subject is a widely-known
historical figure (J. Edgar Hoover, Donald Keyhoe, J. Allen Hynek,
Curtis LeMay, any other public figure with a Wikipedia article), the
refusal floor rises to **5** direct-testimony chunks. The bureau does
not publish credibility verdicts on public figures from thin corpora.
4. **Bias claims require chunk citations.** Every clause in `bias_notes`
must be tied to a specific `[[doc-id/pNNN#cNNNN]]` in the chunks you
were given. "Career incentive" is too vague; "career incentive
visible in [[chunk]] where they wrote X" is fine. If you cannot
ground a bias claim, drop it.
5. **You do not declare a witness credible because they are an authority.**
You ask:
W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY queue, sharing search.ts hybridSearch and writer-side validators that gate writes against schema + FK. New detectives: Poirot (witness_analysis) - prompts/poirot.md — credibility / access / bias / corroboration / verdict; uses entity_mentions JOIN chunks to pull 12 chunks per person; resolves corroboration_refs chunk_ids defensively (accepts bare cNNNN even when the model emits pNNN/cNNNN). - INSERT into public.witnesses with W-NNNN naming. - Tone: purple (#9b5de5). Taleb (outlier_scan) - prompts/taleb.md — "surprise is relative to a model"; at most 3 outliers; each requires explicit dominant_model + why_surprising + what_it_implies; fan-out into public.gaps with scope.kind="outlier". - Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens to corpus if hits < 3). - Tone: yellow (#ffd23f). Tetlock (calibrate_hypothesis) - prompts/tetlock.md — honest Bayesian update; emits new_posterior + Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}. - write_calibration UPDATEs public.hypotheses + APPENDS a "## Calibration history" section to the H-NNNN.md case file (calibration is append-only — each datapoint matters). Posterior band auto-corrected to match Tetlock thresholds. - NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only touches updated_at + reviewed_by. - Tone: teal (#26d4cc). Case-Writer (case_report) - prompts/case-writer.md — Dr. Watson assembles all artefacts (E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative. ILIKE filter on topic; doc_id optional scope. - Larger budget cap (≥ $0.50) + longer timeout for prose generation. - Writes case/reports/<slug>.md with frontmatter (topic + counts); no DB table for v0. - New page /c/[slug] renders the report via MarkdownBody + stat chips. - Tone: gold (#e0c080). Hardening across the bureau: - Sentinel parsing now accepts backticked AND prose-trailing forms (Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer INSUFFICIENT_ARTEFACTS). Avoids the failure mode where the model refuses honestly but the runtime treated it as a parse error (observed live with Poirot+Hoover identifying the DIRECTOR false-positive disambiguation issue in entity_mentions). Chat tool extensions (web/lib/chat/tools.ts): - request_investigation now accepts 7 kinds. Each routes to its detective with appropriate validation (hypothesis_id regex, person_id kebab-case, topic non-empty, doc_id for evidence_chain). - ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s, Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks. UI integration: - chat-bubble inline card paints each detective in its tone color. - /jobs/[id] page header swaps name/subtitle/tone per detective; question label adapts ("Topic" / "Hypothesis under attack" / "Witness under analysis" / "Topic to outlier-scan" / "Hypothesis under recalibration" / "Case to assemble"). - job-status-poller renders: case-report link card (gold), outlier cards (yellow), witness cards (purple) — alongside existing hypothesis, evidence, contradiction cards. - /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name) + gaps (with scope JSONB). - /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders with MarkdownBody, frontmatter parsed for stat chips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:11:39 +00:00
- **Access.** Were they in a position to observe what they testify to?
W4.3: Poirot direct-testimony floor — no defamatory verdicts on thin data Live failure surfaced by user feedback: Poirot wrote a low-credibility verdict on J. Edgar Hoover (W-0002) based on 1 actual chunk and 11 entity_mentions false positives where 'DIRECTOR'/'DIRETOR' was linked to him by mistake. Poirot's own bias_notes correctly identified this — yet still produced a verdict. Published on a 'Disclosure Bureau' site, that's libellously misleading. Deleted W-0001 (Donald Keyhoe) and W-0002 (J. Edgar Hoover) from public.witnesses + their .md files. Prompt rewrite (prompts/poirot.md): - New "What counts as testimony" section up front, before discipline. Direct testimony = the person AUTHORED, was QUOTED verbatim with attribution, or GAVE testimony in a recorded hearing. Not: third- party mentions, generic title appearances ('Director'/'Diretor' that entity-extraction speculatively linked), CC lines. - HARD FLOOR rule: emit `direct_testimony_chunk_ids[]`. If < 3, refuse with INSUFFICIENT_TESTIMONY. For famous historical figures (Wikipedia-worthy public figures) the floor is 5. - Bias claims MUST cite a specific chunk; ungrounded bias claims drop. - Tone: "careful prosecutor preparing a brief, not debunker scoring points." Defense in depth (poirot.ts): - Detective enforces the same floor before calling writeWitnessAnalysis, using a FAMOUS slug list (j-edgar-hoover, donald-keyhoe, j-allen- hynek, curtis-lemay, vannevar-bush, eisenhower, truman, kennedy, ted-bloecher, ...). - When the floor isn't met, emit `poirot_refused_floor` audit event + skip with reason like `insufficient_direct_testimony_1_of_5`. - Sentinel parser now also catches INSUFFICIENT_TESTIMONY when it appears on the first line of an otherwise-prose response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:32:46 +00:00
Direct observer? Hearsay at one or two removes? Reading a report?
W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY queue, sharing search.ts hybridSearch and writer-side validators that gate writes against schema + FK. New detectives: Poirot (witness_analysis) - prompts/poirot.md — credibility / access / bias / corroboration / verdict; uses entity_mentions JOIN chunks to pull 12 chunks per person; resolves corroboration_refs chunk_ids defensively (accepts bare cNNNN even when the model emits pNNN/cNNNN). - INSERT into public.witnesses with W-NNNN naming. - Tone: purple (#9b5de5). Taleb (outlier_scan) - prompts/taleb.md — "surprise is relative to a model"; at most 3 outliers; each requires explicit dominant_model + why_surprising + what_it_implies; fan-out into public.gaps with scope.kind="outlier". - Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens to corpus if hits < 3). - Tone: yellow (#ffd23f). Tetlock (calibrate_hypothesis) - prompts/tetlock.md — honest Bayesian update; emits new_posterior + Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}. - write_calibration UPDATEs public.hypotheses + APPENDS a "## Calibration history" section to the H-NNNN.md case file (calibration is append-only — each datapoint matters). Posterior band auto-corrected to match Tetlock thresholds. - NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only touches updated_at + reviewed_by. - Tone: teal (#26d4cc). Case-Writer (case_report) - prompts/case-writer.md — Dr. Watson assembles all artefacts (E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative. ILIKE filter on topic; doc_id optional scope. - Larger budget cap (≥ $0.50) + longer timeout for prose generation. - Writes case/reports/<slug>.md with frontmatter (topic + counts); no DB table for v0. - New page /c/[slug] renders the report via MarkdownBody + stat chips. - Tone: gold (#e0c080). Hardening across the bureau: - Sentinel parsing now accepts backticked AND prose-trailing forms (Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer INSUFFICIENT_ARTEFACTS). Avoids the failure mode where the model refuses honestly but the runtime treated it as a parse error (observed live with Poirot+Hoover identifying the DIRECTOR false-positive disambiguation issue in entity_mentions). Chat tool extensions (web/lib/chat/tools.ts): - request_investigation now accepts 7 kinds. Each routes to its detective with appropriate validation (hypothesis_id regex, person_id kebab-case, topic non-empty, doc_id for evidence_chain). - ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s, Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks. UI integration: - chat-bubble inline card paints each detective in its tone color. - /jobs/[id] page header swaps name/subtitle/tone per detective; question label adapts ("Topic" / "Hypothesis under attack" / "Witness under analysis" / "Topic to outlier-scan" / "Hypothesis under recalibration" / "Case to assemble"). - job-status-poller renders: case-report link card (gold), outlier cards (yellow), witness cards (purple) — alongside existing hypothesis, evidence, contradiction cards. - /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name) + gaps (with scope JSONB). - /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders with MarkdownBody, frontmatter parsed for stat chips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:11:39 +00:00
- **Bias.** Career incentive, ideological commitment, prior public
W4.3: Poirot direct-testimony floor — no defamatory verdicts on thin data Live failure surfaced by user feedback: Poirot wrote a low-credibility verdict on J. Edgar Hoover (W-0002) based on 1 actual chunk and 11 entity_mentions false positives where 'DIRECTOR'/'DIRETOR' was linked to him by mistake. Poirot's own bias_notes correctly identified this — yet still produced a verdict. Published on a 'Disclosure Bureau' site, that's libellously misleading. Deleted W-0001 (Donald Keyhoe) and W-0002 (J. Edgar Hoover) from public.witnesses + their .md files. Prompt rewrite (prompts/poirot.md): - New "What counts as testimony" section up front, before discipline. Direct testimony = the person AUTHORED, was QUOTED verbatim with attribution, or GAVE testimony in a recorded hearing. Not: third- party mentions, generic title appearances ('Director'/'Diretor' that entity-extraction speculatively linked), CC lines. - HARD FLOOR rule: emit `direct_testimony_chunk_ids[]`. If < 3, refuse with INSUFFICIENT_TESTIMONY. For famous historical figures (Wikipedia-worthy public figures) the floor is 5. - Bias claims MUST cite a specific chunk; ungrounded bias claims drop. - Tone: "careful prosecutor preparing a brief, not debunker scoring points." Defense in depth (poirot.ts): - Detective enforces the same floor before calling writeWitnessAnalysis, using a FAMOUS slug list (j-edgar-hoover, donald-keyhoe, j-allen- hynek, curtis-lemay, vannevar-bush, eisenhower, truman, kennedy, ted-bloecher, ...). - When the floor isn't met, emit `poirot_refused_floor` audit event + skip with reason like `insufficient_direct_testimony_1_of_5`. - Sentinel parser now also catches INSUFFICIENT_TESTIMONY when it appears on the first line of an otherwise-prose response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:32:46 +00:00
position, institutional pressure, fear of reprisal. Cite chunks.
- **Corroboration.** Do other chunks confirm the same factual claim,
refute it, or stay silent?
6. You assign a single `credibility` band:
W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY queue, sharing search.ts hybridSearch and writer-side validators that gate writes against schema + FK. New detectives: Poirot (witness_analysis) - prompts/poirot.md — credibility / access / bias / corroboration / verdict; uses entity_mentions JOIN chunks to pull 12 chunks per person; resolves corroboration_refs chunk_ids defensively (accepts bare cNNNN even when the model emits pNNN/cNNNN). - INSERT into public.witnesses with W-NNNN naming. - Tone: purple (#9b5de5). Taleb (outlier_scan) - prompts/taleb.md — "surprise is relative to a model"; at most 3 outliers; each requires explicit dominant_model + why_surprising + what_it_implies; fan-out into public.gaps with scope.kind="outlier". - Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens to corpus if hits < 3). - Tone: yellow (#ffd23f). Tetlock (calibrate_hypothesis) - prompts/tetlock.md — honest Bayesian update; emits new_posterior + Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}. - write_calibration UPDATEs public.hypotheses + APPENDS a "## Calibration history" section to the H-NNNN.md case file (calibration is append-only — each datapoint matters). Posterior band auto-corrected to match Tetlock thresholds. - NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only touches updated_at + reviewed_by. - Tone: teal (#26d4cc). Case-Writer (case_report) - prompts/case-writer.md — Dr. Watson assembles all artefacts (E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative. ILIKE filter on topic; doc_id optional scope. - Larger budget cap (≥ $0.50) + longer timeout for prose generation. - Writes case/reports/<slug>.md with frontmatter (topic + counts); no DB table for v0. - New page /c/[slug] renders the report via MarkdownBody + stat chips. - Tone: gold (#e0c080). Hardening across the bureau: - Sentinel parsing now accepts backticked AND prose-trailing forms (Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer INSUFFICIENT_ARTEFACTS). Avoids the failure mode where the model refuses honestly but the runtime treated it as a parse error (observed live with Poirot+Hoover identifying the DIRECTOR false-positive disambiguation issue in entity_mentions). Chat tool extensions (web/lib/chat/tools.ts): - request_investigation now accepts 7 kinds. Each routes to its detective with appropriate validation (hypothesis_id regex, person_id kebab-case, topic non-empty, doc_id for evidence_chain). - ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s, Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks. UI integration: - chat-bubble inline card paints each detective in its tone color. - /jobs/[id] page header swaps name/subtitle/tone per detective; question label adapts ("Topic" / "Hypothesis under attack" / "Witness under analysis" / "Topic to outlier-scan" / "Hypothesis under recalibration" / "Case to assemble"). - job-status-poller renders: case-report link card (gold), outlier cards (yellow), witness cards (purple) — alongside existing hypothesis, evidence, contradiction cards. - /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name) + gaps (with scope JSONB). - /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders with MarkdownBody, frontmatter parsed for stat chips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:11:39 +00:00
- `high` — direct access, no strong bias, independent corroboration.
- `medium` — partial access OR mild bias OR thin corroboration.
W4.3: Poirot direct-testimony floor — no defamatory verdicts on thin data Live failure surfaced by user feedback: Poirot wrote a low-credibility verdict on J. Edgar Hoover (W-0002) based on 1 actual chunk and 11 entity_mentions false positives where 'DIRECTOR'/'DIRETOR' was linked to him by mistake. Poirot's own bias_notes correctly identified this — yet still produced a verdict. Published on a 'Disclosure Bureau' site, that's libellously misleading. Deleted W-0001 (Donald Keyhoe) and W-0002 (J. Edgar Hoover) from public.witnesses + their .md files. Prompt rewrite (prompts/poirot.md): - New "What counts as testimony" section up front, before discipline. Direct testimony = the person AUTHORED, was QUOTED verbatim with attribution, or GAVE testimony in a recorded hearing. Not: third- party mentions, generic title appearances ('Director'/'Diretor' that entity-extraction speculatively linked), CC lines. - HARD FLOOR rule: emit `direct_testimony_chunk_ids[]`. If < 3, refuse with INSUFFICIENT_TESTIMONY. For famous historical figures (Wikipedia-worthy public figures) the floor is 5. - Bias claims MUST cite a specific chunk; ungrounded bias claims drop. - Tone: "careful prosecutor preparing a brief, not debunker scoring points." Defense in depth (poirot.ts): - Detective enforces the same floor before calling writeWitnessAnalysis, using a FAMOUS slug list (j-edgar-hoover, donald-keyhoe, j-allen- hynek, curtis-lemay, vannevar-bush, eisenhower, truman, kennedy, ted-bloecher, ...). - When the floor isn't met, emit `poirot_refused_floor` audit event + skip with reason like `insufficient_direct_testimony_1_of_5`. - Sentinel parser now also catches INSUFFICIENT_TESTIMONY when it appears on the first line of an otherwise-prose response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:32:46 +00:00
- `low` — second-hand OR active bias documented in chunks OR
contradicted by other chunks.
W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY queue, sharing search.ts hybridSearch and writer-side validators that gate writes against schema + FK. New detectives: Poirot (witness_analysis) - prompts/poirot.md — credibility / access / bias / corroboration / verdict; uses entity_mentions JOIN chunks to pull 12 chunks per person; resolves corroboration_refs chunk_ids defensively (accepts bare cNNNN even when the model emits pNNN/cNNNN). - INSERT into public.witnesses with W-NNNN naming. - Tone: purple (#9b5de5). Taleb (outlier_scan) - prompts/taleb.md — "surprise is relative to a model"; at most 3 outliers; each requires explicit dominant_model + why_surprising + what_it_implies; fan-out into public.gaps with scope.kind="outlier". - Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens to corpus if hits < 3). - Tone: yellow (#ffd23f). Tetlock (calibrate_hypothesis) - prompts/tetlock.md — honest Bayesian update; emits new_posterior + Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}. - write_calibration UPDATEs public.hypotheses + APPENDS a "## Calibration history" section to the H-NNNN.md case file (calibration is append-only — each datapoint matters). Posterior band auto-corrected to match Tetlock thresholds. - NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only touches updated_at + reviewed_by. - Tone: teal (#26d4cc). Case-Writer (case_report) - prompts/case-writer.md — Dr. Watson assembles all artefacts (E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative. ILIKE filter on topic; doc_id optional scope. - Larger budget cap (≥ $0.50) + longer timeout for prose generation. - Writes case/reports/<slug>.md with frontmatter (topic + counts); no DB table for v0. - New page /c/[slug] renders the report via MarkdownBody + stat chips. - Tone: gold (#e0c080). Hardening across the bureau: - Sentinel parsing now accepts backticked AND prose-trailing forms (Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer INSUFFICIENT_ARTEFACTS). Avoids the failure mode where the model refuses honestly but the runtime treated it as a parse error (observed live with Poirot+Hoover identifying the DIRECTOR false-positive disambiguation issue in entity_mentions). Chat tool extensions (web/lib/chat/tools.ts): - request_investigation now accepts 7 kinds. Each routes to its detective with appropriate validation (hypothesis_id regex, person_id kebab-case, topic non-empty, doc_id for evidence_chain). - ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s, Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks. UI integration: - chat-bubble inline card paints each detective in its tone color. - /jobs/[id] page header swaps name/subtitle/tone per detective; question label adapts ("Topic" / "Hypothesis under attack" / "Witness under analysis" / "Topic to outlier-scan" / "Hypothesis under recalibration" / "Case to assemble"). - job-status-poller renders: case-report link card (gold), outlier cards (yellow), witness cards (purple) — alongside existing hypothesis, evidence, contradiction cards. - /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name) + gaps (with scope JSONB). - /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders with MarkdownBody, frontmatter parsed for stat chips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:11:39 +00:00
- `speculation` — the chunks describe the person only by name; no
W4.3: Poirot direct-testimony floor — no defamatory verdicts on thin data Live failure surfaced by user feedback: Poirot wrote a low-credibility verdict on J. Edgar Hoover (W-0002) based on 1 actual chunk and 11 entity_mentions false positives where 'DIRECTOR'/'DIRETOR' was linked to him by mistake. Poirot's own bias_notes correctly identified this — yet still produced a verdict. Published on a 'Disclosure Bureau' site, that's libellously misleading. Deleted W-0001 (Donald Keyhoe) and W-0002 (J. Edgar Hoover) from public.witnesses + their .md files. Prompt rewrite (prompts/poirot.md): - New "What counts as testimony" section up front, before discipline. Direct testimony = the person AUTHORED, was QUOTED verbatim with attribution, or GAVE testimony in a recorded hearing. Not: third- party mentions, generic title appearances ('Director'/'Diretor' that entity-extraction speculatively linked), CC lines. - HARD FLOOR rule: emit `direct_testimony_chunk_ids[]`. If < 3, refuse with INSUFFICIENT_TESTIMONY. For famous historical figures (Wikipedia-worthy public figures) the floor is 5. - Bias claims MUST cite a specific chunk; ungrounded bias claims drop. - Tone: "careful prosecutor preparing a brief, not debunker scoring points." Defense in depth (poirot.ts): - Detective enforces the same floor before calling writeWitnessAnalysis, using a FAMOUS slug list (j-edgar-hoover, donald-keyhoe, j-allen- hynek, curtis-lemay, vannevar-bush, eisenhower, truman, kennedy, ted-bloecher, ...). - When the floor isn't met, emit `poirot_refused_floor` audit event + skip with reason like `insufficient_direct_testimony_1_of_5`. - Sentinel parser now also catches INSUFFICIENT_TESTIMONY when it appears on the first line of an otherwise-prose response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:32:46 +00:00
basis to assess. (You should normally emit `INSUFFICIENT_TESTIMONY`
instead of using this band.)
7. `corroboration_refs` is an array of objects `{chunk_id, supports}`
W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY queue, sharing search.ts hybridSearch and writer-side validators that gate writes against schema + FK. New detectives: Poirot (witness_analysis) - prompts/poirot.md — credibility / access / bias / corroboration / verdict; uses entity_mentions JOIN chunks to pull 12 chunks per person; resolves corroboration_refs chunk_ids defensively (accepts bare cNNNN even when the model emits pNNN/cNNNN). - INSERT into public.witnesses with W-NNNN naming. - Tone: purple (#9b5de5). Taleb (outlier_scan) - prompts/taleb.md — "surprise is relative to a model"; at most 3 outliers; each requires explicit dominant_model + why_surprising + what_it_implies; fan-out into public.gaps with scope.kind="outlier". - Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens to corpus if hits < 3). - Tone: yellow (#ffd23f). Tetlock (calibrate_hypothesis) - prompts/tetlock.md — honest Bayesian update; emits new_posterior + Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}. - write_calibration UPDATEs public.hypotheses + APPENDS a "## Calibration history" section to the H-NNNN.md case file (calibration is append-only — each datapoint matters). Posterior band auto-corrected to match Tetlock thresholds. - NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only touches updated_at + reviewed_by. - Tone: teal (#26d4cc). Case-Writer (case_report) - prompts/case-writer.md — Dr. Watson assembles all artefacts (E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative. ILIKE filter on topic; doc_id optional scope. - Larger budget cap (≥ $0.50) + longer timeout for prose generation. - Writes case/reports/<slug>.md with frontmatter (topic + counts); no DB table for v0. - New page /c/[slug] renders the report via MarkdownBody + stat chips. - Tone: gold (#e0c080). Hardening across the bureau: - Sentinel parsing now accepts backticked AND prose-trailing forms (Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer INSUFFICIENT_ARTEFACTS). Avoids the failure mode where the model refuses honestly but the runtime treated it as a parse error (observed live with Poirot+Hoover identifying the DIRECTOR false-positive disambiguation issue in entity_mentions). Chat tool extensions (web/lib/chat/tools.ts): - request_investigation now accepts 7 kinds. Each routes to its detective with appropriate validation (hypothesis_id regex, person_id kebab-case, topic non-empty, doc_id for evidence_chain). - ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s, Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks. UI integration: - chat-bubble inline card paints each detective in its tone color. - /jobs/[id] page header swaps name/subtitle/tone per detective; question label adapts ("Topic" / "Hypothesis under attack" / "Witness under analysis" / "Topic to outlier-scan" / "Hypothesis under recalibration" / "Case to assemble"). - job-status-poller renders: case-report link card (gold), outlier cards (yellow), witness cards (purple) — alongside existing hypothesis, evidence, contradiction cards. - /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name) + gaps (with scope JSONB). - /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders with MarkdownBody, frontmatter parsed for stat chips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:11:39 +00:00
each cites a different chunk that confirms (`supports: true`) or
W4.3: Poirot direct-testimony floor — no defamatory verdicts on thin data Live failure surfaced by user feedback: Poirot wrote a low-credibility verdict on J. Edgar Hoover (W-0002) based on 1 actual chunk and 11 entity_mentions false positives where 'DIRECTOR'/'DIRETOR' was linked to him by mistake. Poirot's own bias_notes correctly identified this — yet still produced a verdict. Published on a 'Disclosure Bureau' site, that's libellously misleading. Deleted W-0001 (Donald Keyhoe) and W-0002 (J. Edgar Hoover) from public.witnesses + their .md files. Prompt rewrite (prompts/poirot.md): - New "What counts as testimony" section up front, before discipline. Direct testimony = the person AUTHORED, was QUOTED verbatim with attribution, or GAVE testimony in a recorded hearing. Not: third- party mentions, generic title appearances ('Director'/'Diretor' that entity-extraction speculatively linked), CC lines. - HARD FLOOR rule: emit `direct_testimony_chunk_ids[]`. If < 3, refuse with INSUFFICIENT_TESTIMONY. For famous historical figures (Wikipedia-worthy public figures) the floor is 5. - Bias claims MUST cite a specific chunk; ungrounded bias claims drop. - Tone: "careful prosecutor preparing a brief, not debunker scoring points." Defense in depth (poirot.ts): - Detective enforces the same floor before calling writeWitnessAnalysis, using a FAMOUS slug list (j-edgar-hoover, donald-keyhoe, j-allen- hynek, curtis-lemay, vannevar-bush, eisenhower, truman, kennedy, ted-bloecher, ...). - When the floor isn't met, emit `poirot_refused_floor` audit event + skip with reason like `insufficient_direct_testimony_1_of_5`. - Sentinel parser now also catches INSUFFICIENT_TESTIMONY when it appears on the first line of an otherwise-prose response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:32:46 +00:00
refutes (`supports: false`) something the witness asserts. Aim for
2-5 entries when possible.
8. `verdict` is ONE sentence (≤ 280 chars). Declarative. No hedging.
W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY queue, sharing search.ts hybridSearch and writer-side validators that gate writes against schema + FK. New detectives: Poirot (witness_analysis) - prompts/poirot.md — credibility / access / bias / corroboration / verdict; uses entity_mentions JOIN chunks to pull 12 chunks per person; resolves corroboration_refs chunk_ids defensively (accepts bare cNNNN even when the model emits pNNN/cNNNN). - INSERT into public.witnesses with W-NNNN naming. - Tone: purple (#9b5de5). Taleb (outlier_scan) - prompts/taleb.md — "surprise is relative to a model"; at most 3 outliers; each requires explicit dominant_model + why_surprising + what_it_implies; fan-out into public.gaps with scope.kind="outlier". - Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens to corpus if hits < 3). - Tone: yellow (#ffd23f). Tetlock (calibrate_hypothesis) - prompts/tetlock.md — honest Bayesian update; emits new_posterior + Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}. - write_calibration UPDATEs public.hypotheses + APPENDS a "## Calibration history" section to the H-NNNN.md case file (calibration is append-only — each datapoint matters). Posterior band auto-corrected to match Tetlock thresholds. - NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only touches updated_at + reviewed_by. - Tone: teal (#26d4cc). Case-Writer (case_report) - prompts/case-writer.md — Dr. Watson assembles all artefacts (E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative. ILIKE filter on topic; doc_id optional scope. - Larger budget cap (≥ $0.50) + longer timeout for prose generation. - Writes case/reports/<slug>.md with frontmatter (topic + counts); no DB table for v0. - New page /c/[slug] renders the report via MarkdownBody + stat chips. - Tone: gold (#e0c080). Hardening across the bureau: - Sentinel parsing now accepts backticked AND prose-trailing forms (Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer INSUFFICIENT_ARTEFACTS). Avoids the failure mode where the model refuses honestly but the runtime treated it as a parse error (observed live with Poirot+Hoover identifying the DIRECTOR false-positive disambiguation issue in entity_mentions). Chat tool extensions (web/lib/chat/tools.ts): - request_investigation now accepts 7 kinds. Each routes to its detective with appropriate validation (hypothesis_id regex, person_id kebab-case, topic non-empty, doc_id for evidence_chain). - ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s, Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks. UI integration: - chat-bubble inline card paints each detective in its tone color. - /jobs/[id] page header swaps name/subtitle/tone per detective; question label adapts ("Topic" / "Hypothesis under attack" / "Witness under analysis" / "Topic to outlier-scan" / "Hypothesis under recalibration" / "Case to assemble"). - job-status-poller renders: case-report link card (gold), outlier cards (yellow), witness cards (purple) — alongside existing hypothesis, evidence, contradiction cards. - /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name) + gaps (with scope JSONB). - /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders with MarkdownBody, frontmatter parsed for stat chips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:11:39 +00:00
W4: bilingual EN + PT-BR Investigation Bureau (CLAUDE.md §3 contract) User flagged that the bureau was emitting English-only output, violating the project's bilingual rule. Every narrative field now ships in both languages: stored in sibling DB columns + rendered as adjacent markdown sections per CLAUDE.md §3. Migration 0007 (apply as supabase_admin): - public.hypotheses +question_pt_br, +position_pt_br, +argument_for_pt_br, +argument_against_pt_br - public.contradictions +topic_pt_br, +notes_pt_br - public.witnesses +access_to_event_pt_br, +bias_notes_pt_br, +verdict_pt_br - public.gaps +description_pt_br, +suggested_next_move_pt_br - public.evidence: unchanged (verbatim_excerpt stays source-language) - JSONB siblings inside contradictions.chunks + gaps.scope handled at runtime (statement_pt_br, title_pt_br, dominant_model_pt_br, why_surprising_pt_br, what_it_implies_pt_br). Detective prompts (all 7) rewritten with explicit bilingual JSON contract: - Output protocol section names every EN field + its _pt_br sibling - "Bilingual is mandatory" warning in the task instruction - Sentinel skip-states unchanged (NO_HYPOTHESES, NO_CONTRADICTIONS, INSUFFICIENT_TESTIMONY, INSUFFICIENT_HYPOTHESIS, NO_OUTLIERS, NO_NEW_EVIDENCE, INSUFFICIENT_ARTEFACTS) - Schneier: parallel arrays — hidden_assumptions[i] matches hidden_assumptions_pt_br[i], lengths must match - Case-Writer: interleaved §1 (EN) / §1 (PT-BR) per act in the body Writer-side validation (all 7 tools): - Reject INSERT if PT-BR sibling missing when EN field is set - Persist both languages atomically in one INSERT (no half-updates) - Markdown renderers write adjacent EN+PT-BR sections in case files (## Argument for (EN) followed by ## Argumento a favor (PT-BR), etc.) Detective parse layer (all 7 detectives): - Coerce both keys from JSON output - "incomplete_bilingual_*" skip reason when either side missing - Defensive: PT-BR fields trimmed + length-capped same as EN Orchestrator propagates question_pt_br + topic_pt_br through job payload to runHolmes / runCaseWriter, mirroring the chat-tool entry point. Web (UI): - /api/jobs/[id] hydrates _pt_br siblings from pg - job-status-poller HypothesisCard: PT-BR primary, EN in <details> fallback when both exist - ContradictionCard: PT-BR statement primary + secondary EN quote - WitnessCard: PT-BR verdict primary + secondary EN quote, panels in PT - GapCard: PT-BR title/why/implies primary - /bureau hub: SELECTs both columns, renders PT-BR primary - /h/[id]: ArgumentPanel renders PT-BR primary with collapsible EN fallback when both exist - BureauSnapshot homepage: position_pt_br / topic_pt_br / verdict_pt_br primary - DocBureauPanel /d/[doc]: same primary-PT-BR pattern - New web/lib/i18n/pick.ts helper (unused yet by chat/agents — kept for future locale-driven switching when both languages are equally full; current rule is PT-BR-first since the user is brasileiro) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:02:59 +00:00
## Output protocol — bilingual EN + PT-BR (mandatory)
W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY queue, sharing search.ts hybridSearch and writer-side validators that gate writes against schema + FK. New detectives: Poirot (witness_analysis) - prompts/poirot.md — credibility / access / bias / corroboration / verdict; uses entity_mentions JOIN chunks to pull 12 chunks per person; resolves corroboration_refs chunk_ids defensively (accepts bare cNNNN even when the model emits pNNN/cNNNN). - INSERT into public.witnesses with W-NNNN naming. - Tone: purple (#9b5de5). Taleb (outlier_scan) - prompts/taleb.md — "surprise is relative to a model"; at most 3 outliers; each requires explicit dominant_model + why_surprising + what_it_implies; fan-out into public.gaps with scope.kind="outlier". - Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens to corpus if hits < 3). - Tone: yellow (#ffd23f). Tetlock (calibrate_hypothesis) - prompts/tetlock.md — honest Bayesian update; emits new_posterior + Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}. - write_calibration UPDATEs public.hypotheses + APPENDS a "## Calibration history" section to the H-NNNN.md case file (calibration is append-only — each datapoint matters). Posterior band auto-corrected to match Tetlock thresholds. - NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only touches updated_at + reviewed_by. - Tone: teal (#26d4cc). Case-Writer (case_report) - prompts/case-writer.md — Dr. Watson assembles all artefacts (E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative. ILIKE filter on topic; doc_id optional scope. - Larger budget cap (≥ $0.50) + longer timeout for prose generation. - Writes case/reports/<slug>.md with frontmatter (topic + counts); no DB table for v0. - New page /c/[slug] renders the report via MarkdownBody + stat chips. - Tone: gold (#e0c080). Hardening across the bureau: - Sentinel parsing now accepts backticked AND prose-trailing forms (Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer INSUFFICIENT_ARTEFACTS). Avoids the failure mode where the model refuses honestly but the runtime treated it as a parse error (observed live with Poirot+Hoover identifying the DIRECTOR false-positive disambiguation issue in entity_mentions). Chat tool extensions (web/lib/chat/tools.ts): - request_investigation now accepts 7 kinds. Each routes to its detective with appropriate validation (hypothesis_id regex, person_id kebab-case, topic non-empty, doc_id for evidence_chain). - ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s, Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks. UI integration: - chat-bubble inline card paints each detective in its tone color. - /jobs/[id] page header swaps name/subtitle/tone per detective; question label adapts ("Topic" / "Hypothesis under attack" / "Witness under analysis" / "Topic to outlier-scan" / "Hypothesis under recalibration" / "Case to assemble"). - job-status-poller renders: case-report link card (gold), outlier cards (yellow), witness cards (purple) — alongside existing hypothesis, evidence, contradiction cards. - /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name) + gaps (with scope JSONB). - /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders with MarkdownBody, frontmatter parsed for stat chips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:11:39 +00:00
W4: bilingual EN + PT-BR Investigation Bureau (CLAUDE.md §3 contract) User flagged that the bureau was emitting English-only output, violating the project's bilingual rule. Every narrative field now ships in both languages: stored in sibling DB columns + rendered as adjacent markdown sections per CLAUDE.md §3. Migration 0007 (apply as supabase_admin): - public.hypotheses +question_pt_br, +position_pt_br, +argument_for_pt_br, +argument_against_pt_br - public.contradictions +topic_pt_br, +notes_pt_br - public.witnesses +access_to_event_pt_br, +bias_notes_pt_br, +verdict_pt_br - public.gaps +description_pt_br, +suggested_next_move_pt_br - public.evidence: unchanged (verbatim_excerpt stays source-language) - JSONB siblings inside contradictions.chunks + gaps.scope handled at runtime (statement_pt_br, title_pt_br, dominant_model_pt_br, why_surprising_pt_br, what_it_implies_pt_br). Detective prompts (all 7) rewritten with explicit bilingual JSON contract: - Output protocol section names every EN field + its _pt_br sibling - "Bilingual is mandatory" warning in the task instruction - Sentinel skip-states unchanged (NO_HYPOTHESES, NO_CONTRADICTIONS, INSUFFICIENT_TESTIMONY, INSUFFICIENT_HYPOTHESIS, NO_OUTLIERS, NO_NEW_EVIDENCE, INSUFFICIENT_ARTEFACTS) - Schneier: parallel arrays — hidden_assumptions[i] matches hidden_assumptions_pt_br[i], lengths must match - Case-Writer: interleaved §1 (EN) / §1 (PT-BR) per act in the body Writer-side validation (all 7 tools): - Reject INSERT if PT-BR sibling missing when EN field is set - Persist both languages atomically in one INSERT (no half-updates) - Markdown renderers write adjacent EN+PT-BR sections in case files (## Argument for (EN) followed by ## Argumento a favor (PT-BR), etc.) Detective parse layer (all 7 detectives): - Coerce both keys from JSON output - "incomplete_bilingual_*" skip reason when either side missing - Defensive: PT-BR fields trimmed + length-capped same as EN Orchestrator propagates question_pt_br + topic_pt_br through job payload to runHolmes / runCaseWriter, mirroring the chat-tool entry point. Web (UI): - /api/jobs/[id] hydrates _pt_br siblings from pg - job-status-poller HypothesisCard: PT-BR primary, EN in <details> fallback when both exist - ContradictionCard: PT-BR statement primary + secondary EN quote - WitnessCard: PT-BR verdict primary + secondary EN quote, panels in PT - GapCard: PT-BR title/why/implies primary - /bureau hub: SELECTs both columns, renders PT-BR primary - /h/[id]: ArgumentPanel renders PT-BR primary with collapsible EN fallback when both exist - BureauSnapshot homepage: position_pt_br / topic_pt_br / verdict_pt_br primary - DocBureauPanel /d/[doc]: same primary-PT-BR pattern - New web/lib/i18n/pick.ts helper (unused yet by chat/agents — kept for future locale-driven switching when both languages are equally full; current rule is PT-BR-first since the user is brasileiro) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:02:59 +00:00
Emit a strict JSON object. No prose. No code fence. Every narrative field
appears in EN AND in PT-BR (Brazilian Portuguese with UTF-8 accents).
W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY queue, sharing search.ts hybridSearch and writer-side validators that gate writes against schema + FK. New detectives: Poirot (witness_analysis) - prompts/poirot.md — credibility / access / bias / corroboration / verdict; uses entity_mentions JOIN chunks to pull 12 chunks per person; resolves corroboration_refs chunk_ids defensively (accepts bare cNNNN even when the model emits pNNN/cNNNN). - INSERT into public.witnesses with W-NNNN naming. - Tone: purple (#9b5de5). Taleb (outlier_scan) - prompts/taleb.md — "surprise is relative to a model"; at most 3 outliers; each requires explicit dominant_model + why_surprising + what_it_implies; fan-out into public.gaps with scope.kind="outlier". - Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens to corpus if hits < 3). - Tone: yellow (#ffd23f). Tetlock (calibrate_hypothesis) - prompts/tetlock.md — honest Bayesian update; emits new_posterior + Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}. - write_calibration UPDATEs public.hypotheses + APPENDS a "## Calibration history" section to the H-NNNN.md case file (calibration is append-only — each datapoint matters). Posterior band auto-corrected to match Tetlock thresholds. - NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only touches updated_at + reviewed_by. - Tone: teal (#26d4cc). Case-Writer (case_report) - prompts/case-writer.md — Dr. Watson assembles all artefacts (E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative. ILIKE filter on topic; doc_id optional scope. - Larger budget cap (≥ $0.50) + longer timeout for prose generation. - Writes case/reports/<slug>.md with frontmatter (topic + counts); no DB table for v0. - New page /c/[slug] renders the report via MarkdownBody + stat chips. - Tone: gold (#e0c080). Hardening across the bureau: - Sentinel parsing now accepts backticked AND prose-trailing forms (Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer INSUFFICIENT_ARTEFACTS). Avoids the failure mode where the model refuses honestly but the runtime treated it as a parse error (observed live with Poirot+Hoover identifying the DIRECTOR false-positive disambiguation issue in entity_mentions). Chat tool extensions (web/lib/chat/tools.ts): - request_investigation now accepts 7 kinds. Each routes to its detective with appropriate validation (hypothesis_id regex, person_id kebab-case, topic non-empty, doc_id for evidence_chain). - ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s, Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks. UI integration: - chat-bubble inline card paints each detective in its tone color. - /jobs/[id] page header swaps name/subtitle/tone per detective; question label adapts ("Topic" / "Hypothesis under attack" / "Witness under analysis" / "Topic to outlier-scan" / "Hypothesis under recalibration" / "Case to assemble"). - job-status-poller renders: case-report link card (gold), outlier cards (yellow), witness cards (purple) — alongside existing hypothesis, evidence, contradiction cards. - /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name) + gaps (with scope JSONB). - /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders with MarkdownBody, frontmatter parsed for stat chips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:11:39 +00:00
```json
{
W4.3: Poirot direct-testimony floor — no defamatory verdicts on thin data Live failure surfaced by user feedback: Poirot wrote a low-credibility verdict on J. Edgar Hoover (W-0002) based on 1 actual chunk and 11 entity_mentions false positives where 'DIRECTOR'/'DIRETOR' was linked to him by mistake. Poirot's own bias_notes correctly identified this — yet still produced a verdict. Published on a 'Disclosure Bureau' site, that's libellously misleading. Deleted W-0001 (Donald Keyhoe) and W-0002 (J. Edgar Hoover) from public.witnesses + their .md files. Prompt rewrite (prompts/poirot.md): - New "What counts as testimony" section up front, before discipline. Direct testimony = the person AUTHORED, was QUOTED verbatim with attribution, or GAVE testimony in a recorded hearing. Not: third- party mentions, generic title appearances ('Director'/'Diretor' that entity-extraction speculatively linked), CC lines. - HARD FLOOR rule: emit `direct_testimony_chunk_ids[]`. If < 3, refuse with INSUFFICIENT_TESTIMONY. For famous historical figures (Wikipedia-worthy public figures) the floor is 5. - Bias claims MUST cite a specific chunk; ungrounded bias claims drop. - Tone: "careful prosecutor preparing a brief, not debunker scoring points." Defense in depth (poirot.ts): - Detective enforces the same floor before calling writeWitnessAnalysis, using a FAMOUS slug list (j-edgar-hoover, donald-keyhoe, j-allen- hynek, curtis-lemay, vannevar-bush, eisenhower, truman, kennedy, ted-bloecher, ...). - When the floor isn't met, emit `poirot_refused_floor` audit event + skip with reason like `insufficient_direct_testimony_1_of_5`. - Sentinel parser now also catches INSUFFICIENT_TESTIMONY when it appears on the first line of an otherwise-prose response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:32:46 +00:00
"direct_testimony_chunk_ids": ["c0042", "c0087", "c0091"],
"credibility": "high | medium | low",
"access_to_event": "EN one paragraph. Cite each fact with [[chunk]].",
"access_to_event_pt_br": "PT-BR um parágrafo. Fundamente cada fato com [[chunk]].",
"bias_notes": "EN. Every bias claim cites a chunk.",
"bias_notes_pt_br": "PT-BR. Cada afirmação de viés cita um chunk.",
W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY queue, sharing search.ts hybridSearch and writer-side validators that gate writes against schema + FK. New detectives: Poirot (witness_analysis) - prompts/poirot.md — credibility / access / bias / corroboration / verdict; uses entity_mentions JOIN chunks to pull 12 chunks per person; resolves corroboration_refs chunk_ids defensively (accepts bare cNNNN even when the model emits pNNN/cNNNN). - INSERT into public.witnesses with W-NNNN naming. - Tone: purple (#9b5de5). Taleb (outlier_scan) - prompts/taleb.md — "surprise is relative to a model"; at most 3 outliers; each requires explicit dominant_model + why_surprising + what_it_implies; fan-out into public.gaps with scope.kind="outlier". - Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens to corpus if hits < 3). - Tone: yellow (#ffd23f). Tetlock (calibrate_hypothesis) - prompts/tetlock.md — honest Bayesian update; emits new_posterior + Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}. - write_calibration UPDATEs public.hypotheses + APPENDS a "## Calibration history" section to the H-NNNN.md case file (calibration is append-only — each datapoint matters). Posterior band auto-corrected to match Tetlock thresholds. - NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only touches updated_at + reviewed_by. - Tone: teal (#26d4cc). Case-Writer (case_report) - prompts/case-writer.md — Dr. Watson assembles all artefacts (E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative. ILIKE filter on topic; doc_id optional scope. - Larger budget cap (≥ $0.50) + longer timeout for prose generation. - Writes case/reports/<slug>.md with frontmatter (topic + counts); no DB table for v0. - New page /c/[slug] renders the report via MarkdownBody + stat chips. - Tone: gold (#e0c080). Hardening across the bureau: - Sentinel parsing now accepts backticked AND prose-trailing forms (Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer INSUFFICIENT_ARTEFACTS). Avoids the failure mode where the model refuses honestly but the runtime treated it as a parse error (observed live with Poirot+Hoover identifying the DIRECTOR false-positive disambiguation issue in entity_mentions). Chat tool extensions (web/lib/chat/tools.ts): - request_investigation now accepts 7 kinds. Each routes to its detective with appropriate validation (hypothesis_id regex, person_id kebab-case, topic non-empty, doc_id for evidence_chain). - ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s, Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks. UI integration: - chat-bubble inline card paints each detective in its tone color. - /jobs/[id] page header swaps name/subtitle/tone per detective; question label adapts ("Topic" / "Hypothesis under attack" / "Witness under analysis" / "Topic to outlier-scan" / "Hypothesis under recalibration" / "Case to assemble"). - job-status-poller renders: case-report link card (gold), outlier cards (yellow), witness cards (purple) — alongside existing hypothesis, evidence, contradiction cards. - /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name) + gaps (with scope JSONB). - /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders with MarkdownBody, frontmatter parsed for stat chips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:11:39 +00:00
"corroboration_refs": [
{"chunk_id": "c0042", "supports": true},
{"chunk_id": "c0087", "supports": false}
],
W4: bilingual EN + PT-BR Investigation Bureau (CLAUDE.md §3 contract) User flagged that the bureau was emitting English-only output, violating the project's bilingual rule. Every narrative field now ships in both languages: stored in sibling DB columns + rendered as adjacent markdown sections per CLAUDE.md §3. Migration 0007 (apply as supabase_admin): - public.hypotheses +question_pt_br, +position_pt_br, +argument_for_pt_br, +argument_against_pt_br - public.contradictions +topic_pt_br, +notes_pt_br - public.witnesses +access_to_event_pt_br, +bias_notes_pt_br, +verdict_pt_br - public.gaps +description_pt_br, +suggested_next_move_pt_br - public.evidence: unchanged (verbatim_excerpt stays source-language) - JSONB siblings inside contradictions.chunks + gaps.scope handled at runtime (statement_pt_br, title_pt_br, dominant_model_pt_br, why_surprising_pt_br, what_it_implies_pt_br). Detective prompts (all 7) rewritten with explicit bilingual JSON contract: - Output protocol section names every EN field + its _pt_br sibling - "Bilingual is mandatory" warning in the task instruction - Sentinel skip-states unchanged (NO_HYPOTHESES, NO_CONTRADICTIONS, INSUFFICIENT_TESTIMONY, INSUFFICIENT_HYPOTHESIS, NO_OUTLIERS, NO_NEW_EVIDENCE, INSUFFICIENT_ARTEFACTS) - Schneier: parallel arrays — hidden_assumptions[i] matches hidden_assumptions_pt_br[i], lengths must match - Case-Writer: interleaved §1 (EN) / §1 (PT-BR) per act in the body Writer-side validation (all 7 tools): - Reject INSERT if PT-BR sibling missing when EN field is set - Persist both languages atomically in one INSERT (no half-updates) - Markdown renderers write adjacent EN+PT-BR sections in case files (## Argument for (EN) followed by ## Argumento a favor (PT-BR), etc.) Detective parse layer (all 7 detectives): - Coerce both keys from JSON output - "incomplete_bilingual_*" skip reason when either side missing - Defensive: PT-BR fields trimmed + length-capped same as EN Orchestrator propagates question_pt_br + topic_pt_br through job payload to runHolmes / runCaseWriter, mirroring the chat-tool entry point. Web (UI): - /api/jobs/[id] hydrates _pt_br siblings from pg - job-status-poller HypothesisCard: PT-BR primary, EN in <details> fallback when both exist - ContradictionCard: PT-BR statement primary + secondary EN quote - WitnessCard: PT-BR verdict primary + secondary EN quote, panels in PT - GapCard: PT-BR title/why/implies primary - /bureau hub: SELECTs both columns, renders PT-BR primary - /h/[id]: ArgumentPanel renders PT-BR primary with collapsible EN fallback when both exist - BureauSnapshot homepage: position_pt_br / topic_pt_br / verdict_pt_br primary - DocBureauPanel /d/[doc]: same primary-PT-BR pattern - New web/lib/i18n/pick.ts helper (unused yet by chat/agents — kept for future locale-driven switching when both languages are equally full; current rule is PT-BR-first since the user is brasileiro) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:02:59 +00:00
"verdict": "EN one-sentence declarative judgment.",
"verdict_pt_br": "PT-BR uma frase declarativa equivalente."
W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY queue, sharing search.ts hybridSearch and writer-side validators that gate writes against schema + FK. New detectives: Poirot (witness_analysis) - prompts/poirot.md — credibility / access / bias / corroboration / verdict; uses entity_mentions JOIN chunks to pull 12 chunks per person; resolves corroboration_refs chunk_ids defensively (accepts bare cNNNN even when the model emits pNNN/cNNNN). - INSERT into public.witnesses with W-NNNN naming. - Tone: purple (#9b5de5). Taleb (outlier_scan) - prompts/taleb.md — "surprise is relative to a model"; at most 3 outliers; each requires explicit dominant_model + why_surprising + what_it_implies; fan-out into public.gaps with scope.kind="outlier". - Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens to corpus if hits < 3). - Tone: yellow (#ffd23f). Tetlock (calibrate_hypothesis) - prompts/tetlock.md — honest Bayesian update; emits new_posterior + Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}. - write_calibration UPDATEs public.hypotheses + APPENDS a "## Calibration history" section to the H-NNNN.md case file (calibration is append-only — each datapoint matters). Posterior band auto-corrected to match Tetlock thresholds. - NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only touches updated_at + reviewed_by. - Tone: teal (#26d4cc). Case-Writer (case_report) - prompts/case-writer.md — Dr. Watson assembles all artefacts (E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative. ILIKE filter on topic; doc_id optional scope. - Larger budget cap (≥ $0.50) + longer timeout for prose generation. - Writes case/reports/<slug>.md with frontmatter (topic + counts); no DB table for v0. - New page /c/[slug] renders the report via MarkdownBody + stat chips. - Tone: gold (#e0c080). Hardening across the bureau: - Sentinel parsing now accepts backticked AND prose-trailing forms (Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer INSUFFICIENT_ARTEFACTS). Avoids the failure mode where the model refuses honestly but the runtime treated it as a parse error (observed live with Poirot+Hoover identifying the DIRECTOR false-positive disambiguation issue in entity_mentions). Chat tool extensions (web/lib/chat/tools.ts): - request_investigation now accepts 7 kinds. Each routes to its detective with appropriate validation (hypothesis_id regex, person_id kebab-case, topic non-empty, doc_id for evidence_chain). - ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s, Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks. UI integration: - chat-bubble inline card paints each detective in its tone color. - /jobs/[id] page header swaps name/subtitle/tone per detective; question label adapts ("Topic" / "Hypothesis under attack" / "Witness under analysis" / "Topic to outlier-scan" / "Hypothesis under recalibration" / "Case to assemble"). - job-status-poller renders: case-report link card (gold), outlier cards (yellow), witness cards (purple) — alongside existing hypothesis, evidence, contradiction cards. - /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name) + gaps (with scope JSONB). - /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders with MarkdownBody, frontmatter parsed for stat chips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:11:39 +00:00
}
```
Constraints:
W4.3: Poirot direct-testimony floor — no defamatory verdicts on thin data Live failure surfaced by user feedback: Poirot wrote a low-credibility verdict on J. Edgar Hoover (W-0002) based on 1 actual chunk and 11 entity_mentions false positives where 'DIRECTOR'/'DIRETOR' was linked to him by mistake. Poirot's own bias_notes correctly identified this — yet still produced a verdict. Published on a 'Disclosure Bureau' site, that's libellously misleading. Deleted W-0001 (Donald Keyhoe) and W-0002 (J. Edgar Hoover) from public.witnesses + their .md files. Prompt rewrite (prompts/poirot.md): - New "What counts as testimony" section up front, before discipline. Direct testimony = the person AUTHORED, was QUOTED verbatim with attribution, or GAVE testimony in a recorded hearing. Not: third- party mentions, generic title appearances ('Director'/'Diretor' that entity-extraction speculatively linked), CC lines. - HARD FLOOR rule: emit `direct_testimony_chunk_ids[]`. If < 3, refuse with INSUFFICIENT_TESTIMONY. For famous historical figures (Wikipedia-worthy public figures) the floor is 5. - Bias claims MUST cite a specific chunk; ungrounded bias claims drop. - Tone: "careful prosecutor preparing a brief, not debunker scoring points." Defense in depth (poirot.ts): - Detective enforces the same floor before calling writeWitnessAnalysis, using a FAMOUS slug list (j-edgar-hoover, donald-keyhoe, j-allen- hynek, curtis-lemay, vannevar-bush, eisenhower, truman, kennedy, ted-bloecher, ...). - When the floor isn't met, emit `poirot_refused_floor` audit event + skip with reason like `insufficient_direct_testimony_1_of_5`. - Sentinel parser now also catches INSUFFICIENT_TESTIMONY when it appears on the first line of an otherwise-prose response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:32:46 +00:00
- `direct_testimony_chunk_ids` is the gating field. Below the floor (3
generally, 5 for famous figures), you do NOT emit this object. You
emit `INSUFFICIENT_TESTIMONY` and nothing else.
W4: bilingual EN + PT-BR Investigation Bureau (CLAUDE.md §3 contract) User flagged that the bureau was emitting English-only output, violating the project's bilingual rule. Every narrative field now ships in both languages: stored in sibling DB columns + rendered as adjacent markdown sections per CLAUDE.md §3. Migration 0007 (apply as supabase_admin): - public.hypotheses +question_pt_br, +position_pt_br, +argument_for_pt_br, +argument_against_pt_br - public.contradictions +topic_pt_br, +notes_pt_br - public.witnesses +access_to_event_pt_br, +bias_notes_pt_br, +verdict_pt_br - public.gaps +description_pt_br, +suggested_next_move_pt_br - public.evidence: unchanged (verbatim_excerpt stays source-language) - JSONB siblings inside contradictions.chunks + gaps.scope handled at runtime (statement_pt_br, title_pt_br, dominant_model_pt_br, why_surprising_pt_br, what_it_implies_pt_br). Detective prompts (all 7) rewritten with explicit bilingual JSON contract: - Output protocol section names every EN field + its _pt_br sibling - "Bilingual is mandatory" warning in the task instruction - Sentinel skip-states unchanged (NO_HYPOTHESES, NO_CONTRADICTIONS, INSUFFICIENT_TESTIMONY, INSUFFICIENT_HYPOTHESIS, NO_OUTLIERS, NO_NEW_EVIDENCE, INSUFFICIENT_ARTEFACTS) - Schneier: parallel arrays — hidden_assumptions[i] matches hidden_assumptions_pt_br[i], lengths must match - Case-Writer: interleaved §1 (EN) / §1 (PT-BR) per act in the body Writer-side validation (all 7 tools): - Reject INSERT if PT-BR sibling missing when EN field is set - Persist both languages atomically in one INSERT (no half-updates) - Markdown renderers write adjacent EN+PT-BR sections in case files (## Argument for (EN) followed by ## Argumento a favor (PT-BR), etc.) Detective parse layer (all 7 detectives): - Coerce both keys from JSON output - "incomplete_bilingual_*" skip reason when either side missing - Defensive: PT-BR fields trimmed + length-capped same as EN Orchestrator propagates question_pt_br + topic_pt_br through job payload to runHolmes / runCaseWriter, mirroring the chat-tool entry point. Web (UI): - /api/jobs/[id] hydrates _pt_br siblings from pg - job-status-poller HypothesisCard: PT-BR primary, EN in <details> fallback when both exist - ContradictionCard: PT-BR statement primary + secondary EN quote - WitnessCard: PT-BR verdict primary + secondary EN quote, panels in PT - GapCard: PT-BR title/why/implies primary - /bureau hub: SELECTs both columns, renders PT-BR primary - /h/[id]: ArgumentPanel renders PT-BR primary with collapsible EN fallback when both exist - BureauSnapshot homepage: position_pt_br / topic_pt_br / verdict_pt_br primary - DocBureauPanel /d/[doc]: same primary-PT-BR pattern - New web/lib/i18n/pick.ts helper (unused yet by chat/agents — kept for future locale-driven switching when both languages are equally full; current rule is PT-BR-first since the user is brasileiro) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:02:59 +00:00
- `access_to_event` and `bias_notes` ≤ 800 chars each (per language).
- `corroboration_refs` ≤ 8 entries, MUST cite chunk_id values that appear
in the corpus shortlist you were given.
- `verdict` ≤ 280 chars (per language), no hedging language inside the
sentence.
W4.3: Poirot direct-testimony floor — no defamatory verdicts on thin data Live failure surfaced by user feedback: Poirot wrote a low-credibility verdict on J. Edgar Hoover (W-0002) based on 1 actual chunk and 11 entity_mentions false positives where 'DIRECTOR'/'DIRETOR' was linked to him by mistake. Poirot's own bias_notes correctly identified this — yet still produced a verdict. Published on a 'Disclosure Bureau' site, that's libellously misleading. Deleted W-0001 (Donald Keyhoe) and W-0002 (J. Edgar Hoover) from public.witnesses + their .md files. Prompt rewrite (prompts/poirot.md): - New "What counts as testimony" section up front, before discipline. Direct testimony = the person AUTHORED, was QUOTED verbatim with attribution, or GAVE testimony in a recorded hearing. Not: third- party mentions, generic title appearances ('Director'/'Diretor' that entity-extraction speculatively linked), CC lines. - HARD FLOOR rule: emit `direct_testimony_chunk_ids[]`. If < 3, refuse with INSUFFICIENT_TESTIMONY. For famous historical figures (Wikipedia-worthy public figures) the floor is 5. - Bias claims MUST cite a specific chunk; ungrounded bias claims drop. - Tone: "careful prosecutor preparing a brief, not debunker scoring points." Defense in depth (poirot.ts): - Detective enforces the same floor before calling writeWitnessAnalysis, using a FAMOUS slug list (j-edgar-hoover, donald-keyhoe, j-allen- hynek, curtis-lemay, vannevar-bush, eisenhower, truman, kennedy, ted-bloecher, ...). - When the floor isn't met, emit `poirot_refused_floor` audit event + skip with reason like `insufficient_direct_testimony_1_of_5`. - Sentinel parser now also catches INSUFFICIENT_TESTIMONY when it appears on the first line of an otherwise-prose response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:32:46 +00:00
- A missing `*_pt_br` sibling is a hard validation failure.
## Tone
W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY queue, sharing search.ts hybridSearch and writer-side validators that gate writes against schema + FK. New detectives: Poirot (witness_analysis) - prompts/poirot.md — credibility / access / bias / corroboration / verdict; uses entity_mentions JOIN chunks to pull 12 chunks per person; resolves corroboration_refs chunk_ids defensively (accepts bare cNNNN even when the model emits pNNN/cNNNN). - INSERT into public.witnesses with W-NNNN naming. - Tone: purple (#9b5de5). Taleb (outlier_scan) - prompts/taleb.md — "surprise is relative to a model"; at most 3 outliers; each requires explicit dominant_model + why_surprising + what_it_implies; fan-out into public.gaps with scope.kind="outlier". - Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens to corpus if hits < 3). - Tone: yellow (#ffd23f). Tetlock (calibrate_hypothesis) - prompts/tetlock.md — honest Bayesian update; emits new_posterior + Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}. - write_calibration UPDATEs public.hypotheses + APPENDS a "## Calibration history" section to the H-NNNN.md case file (calibration is append-only — each datapoint matters). Posterior band auto-corrected to match Tetlock thresholds. - NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only touches updated_at + reviewed_by. - Tone: teal (#26d4cc). Case-Writer (case_report) - prompts/case-writer.md — Dr. Watson assembles all artefacts (E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative. ILIKE filter on topic; doc_id optional scope. - Larger budget cap (≥ $0.50) + longer timeout for prose generation. - Writes case/reports/<slug>.md with frontmatter (topic + counts); no DB table for v0. - New page /c/[slug] renders the report via MarkdownBody + stat chips. - Tone: gold (#e0c080). Hardening across the bureau: - Sentinel parsing now accepts backticked AND prose-trailing forms (Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer INSUFFICIENT_ARTEFACTS). Avoids the failure mode where the model refuses honestly but the runtime treated it as a parse error (observed live with Poirot+Hoover identifying the DIRECTOR false-positive disambiguation issue in entity_mentions). Chat tool extensions (web/lib/chat/tools.ts): - request_investigation now accepts 7 kinds. Each routes to its detective with appropriate validation (hypothesis_id regex, person_id kebab-case, topic non-empty, doc_id for evidence_chain). - ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s, Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks. UI integration: - chat-bubble inline card paints each detective in its tone color. - /jobs/[id] page header swaps name/subtitle/tone per detective; question label adapts ("Topic" / "Hypothesis under attack" / "Witness under analysis" / "Topic to outlier-scan" / "Hypothesis under recalibration" / "Case to assemble"). - job-status-poller renders: case-report link card (gold), outlier cards (yellow), witness cards (purple) — alongside existing hypothesis, evidence, contradiction cards. - /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name) + gaps (with scope JSONB). - /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders with MarkdownBody, frontmatter parsed for stat chips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:11:39 +00:00
W4.3: Poirot direct-testimony floor — no defamatory verdicts on thin data Live failure surfaced by user feedback: Poirot wrote a low-credibility verdict on J. Edgar Hoover (W-0002) based on 1 actual chunk and 11 entity_mentions false positives where 'DIRECTOR'/'DIRETOR' was linked to him by mistake. Poirot's own bias_notes correctly identified this — yet still produced a verdict. Published on a 'Disclosure Bureau' site, that's libellously misleading. Deleted W-0001 (Donald Keyhoe) and W-0002 (J. Edgar Hoover) from public.witnesses + their .md files. Prompt rewrite (prompts/poirot.md): - New "What counts as testimony" section up front, before discipline. Direct testimony = the person AUTHORED, was QUOTED verbatim with attribution, or GAVE testimony in a recorded hearing. Not: third- party mentions, generic title appearances ('Director'/'Diretor' that entity-extraction speculatively linked), CC lines. - HARD FLOOR rule: emit `direct_testimony_chunk_ids[]`. If < 3, refuse with INSUFFICIENT_TESTIMONY. For famous historical figures (Wikipedia-worthy public figures) the floor is 5. - Bias claims MUST cite a specific chunk; ungrounded bias claims drop. - Tone: "careful prosecutor preparing a brief, not debunker scoring points." Defense in depth (poirot.ts): - Detective enforces the same floor before calling writeWitnessAnalysis, using a FAMOUS slug list (j-edgar-hoover, donald-keyhoe, j-allen- hynek, curtis-lemay, vannevar-bush, eisenhower, truman, kennedy, ted-bloecher, ...). - When the floor isn't met, emit `poirot_refused_floor` audit event + skip with reason like `insufficient_direct_testimony_1_of_5`. - Sentinel parser now also catches INSUFFICIENT_TESTIMONY when it appears on the first line of an otherwise-prose response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 16:32:46 +00:00
Witness analysis published on a public investigative wiki carries
reputational weight. Write as a careful prosecutor preparing a brief, not
as a debunker scoring points. State what the corpus shows; do not
extrapolate to character or motive that the corpus does not document.