Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI
subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY
queue, sharing search.ts hybridSearch and writer-side validators that
gate writes against schema + FK.
New detectives:
Poirot (witness_analysis)
- prompts/poirot.md — credibility / access / bias / corroboration /
verdict; uses entity_mentions JOIN chunks to pull 12 chunks per
person; resolves corroboration_refs chunk_ids defensively (accepts
bare cNNNN even when the model emits pNNN/cNNNN).
- INSERT into public.witnesses with W-NNNN naming.
- Tone: purple (#9b5de5).
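A minimal sketch of the defensive chunk-id handling described for corroboration_refs. The helper names normalizeChunkRef and normalizeChunkRefs are hypothetical, and the accepted digit widths are an assumption; only the bare cNNNN and pNNN/cNNNN shapes come from the notes above.

```typescript
// Hypothetical helper: reduce a model-emitted chunk reference to a bare cNNNN id.
// Accepts "c0042" and "p003/c0042"; returns null for anything else.
function normalizeChunkRef(raw: string): string | null {
  // Optional "pNNN/" page prefix, then the chunk id itself.
  const match = /^(?:p\d+\/)?(c\d+)$/.exec(raw.trim());
  return match?.[1] ?? null;
}

// Normalize a whole list, dropping unparseable refs and duplicates
// while preserving first-seen order.
function normalizeChunkRefs(refs: string[]): string[] {
  const seen = new Set<string>();
  for (const ref of refs) {
    const id = normalizeChunkRef(ref);
    if (id !== null) seen.add(id);
  }
  return Array.from(seen);
}
```

Dropping unparseable refs rather than throwing is one possible policy; a stricter writer could reject the whole record instead.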
Taleb (outlier_scan)
- prompts/taleb.md — "surprise is relative to a model"; at most 3
outliers; each requires explicit dominant_model + why_surprising +
what_it_implies; fan-out into public.gaps with scope.kind="outlier".
- Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens
to corpus if hits < 3).
- Tone: yellow (#ffd23f).
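The two-pass fallback can be sketched as follows. The SearchFn signature is a stand-in and almost certainly differs from the real hybridSearch in search.ts; searchWithFallback and the Hit shape are hypothetical names used only for illustration.

```typescript
type Hit = { chunkId: string; score: number };

// Hypothetical search signature; the real hybridSearch may take other arguments.
type SearchFn = (query: string, scope: { docId?: string }) => Promise<Hit[]>;

// Threshold from the notes above: widen when the scoped pass returns fewer than 3 hits.
const MIN_SCOPED_HITS = 3;

async function searchWithFallback(
  search: SearchFn,
  query: string,
  docId?: string,
): Promise<{ hits: Hit[]; widened: boolean }> {
  // Pass 1: scoped to the document, when a doc_id was supplied.
  if (docId !== undefined) {
    const scoped = await search(query, { docId });
    if (scoped.length >= MIN_SCOPED_HITS) {
      return { hits: scoped, widened: false };
    }
  }
  // Pass 2: the same query against the whole corpus.
  const corpus = await search(query, {});
  return { hits: corpus, widened: docId !== undefined };
}
```

Returning a widened flag lets the caller record that the result came from the corpus-wide pass, which may matter when a detective cites chunks outside the requested document.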
Tetlock (calibrate_hypothesis)
- prompts/tetlock.md — honest Bayesian update; emits new_posterior +
Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}.
- write_calibration UPDATEs public.hypotheses + APPENDS a
"## Calibration history" section to the H-NNNN.md case file
(calibration is append-only — each datapoint matters). Posterior
band auto-corrected to match Tetlock thresholds.
- NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only
touches updated_at + reviewed_by.
- Tone: teal (#26d4cc).
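A rough sketch of how the band auto-correction and the touch-only path might fit together. The names bandFor and planCalibrationWrite are hypothetical, and treating the band boundaries as inclusive lower cut-offs at 0.90, 0.60 and 0.30 is an assumption based on the thresholds in prompts/tetlock.md.

```typescript
type Band = "high" | "medium" | "low" | "speculation";
type Action = "keep" | "downgrade" | "upgrade" | "supersede";

// Band thresholds as listed in the Tetlock prompt (lower bounds assumed inclusive).
function bandFor(posterior: number): Band {
  if (posterior >= 0.9) return "high";
  if (posterior >= 0.6) return "medium";
  if (posterior >= 0.3) return "low";
  return "speculation";
}

interface Calibration {
  newPosterior: number;
  claimedBand: Band; // band as emitted by the model, possibly wrong
  delta: number;
  action: Action;
}

// Hypothetical planner: decides what the writer should do with one calibration.
function planCalibrationWrite(c: Calibration): { band: Band; touchOnly: boolean } {
  // Auto-correct: the stored band always follows the posterior, not the model's claim.
  const band = bandFor(c.newPosterior);
  // A pure 'keep' with a negligible delta only touches updated_at + reviewed_by.
  const touchOnly = c.action === "keep" && Math.abs(c.delta) < 0.005;
  return { band, touchOnly };
}
```

For example, a model that reports a posterior of 0.45 with band "medium" would be stored as "low" under this sketch.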
Case-Writer (case_report)
- prompts/case-writer.md — Dr. Watson assembles all artefacts
(E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative.
ILIKE filter on topic; doc_id optional scope.
- Larger budget cap (≥ $0.50) + longer timeout for prose generation.
- Writes case/reports/<slug>.md with frontmatter (topic + counts);
no DB table for v0.
- New page /c/[slug] renders the report via MarkdownBody + stat chips.
- Tone: gold (#e0c080).
Hardening across the bureau:
- Sentinel parsing now accepts backticked AND prose-trailing forms
(Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier
INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb
NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer
INSUFFICIENT_ARTEFACTS). Avoids the failure mode where the model
refuses honestly but the runtime treats the refusal as a parse error
(observed live when Poirot and Hoover flagged the DIRECTOR
false-positive disambiguation issue in entity_mentions).
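One way the tolerant sentinel check could look, as a sketch only. The function name detectSentinel is hypothetical, and the rule that a sentinel must open the output (optionally wrapped in backticks, optionally followed by prose) is an assumption about what "backticked AND prose-trailing" means in practice.

```typescript
const SENTINELS = [
  "NO_HYPOTHESES",
  "NO_CONTRADICTIONS",
  "INSUFFICIENT_HYPOTHESIS",
  "INSUFFICIENT_TESTIMONY",
  "NO_OUTLIERS",
  "NO_NEW_EVIDENCE",
  "INSUFFICIENT_ARTEFACTS",
] as const;

type Sentinel = (typeof SENTINELS)[number];

// Returns the sentinel that opens the model output, or null if the output
// should go through the normal JSON parsing path instead.
function detectSentinel(output: string): Sentinel | null {
  const text = output.trim();
  for (const s of SENTINELS) {
    // Up to three backticks on either side, then either end of input or a
    // character that cannot extend the identifier (so trailing prose is fine).
    const re = new RegExp("^`{0,3}\\s*" + s + "\\s*`{0,3}(?![A-Za-z0-9_])");
    if (re.test(text)) return s;
  }
  return null;
}
```

The negative lookahead keeps a longer token such as NO_OUTLIERS_FOUND from being mistaken for a sentinel.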
Chat tool extensions (web/lib/chat/tools.ts):
- request_investigation now accepts 7 kinds. Each routes to its
detective with appropriate validation (hypothesis_id regex,
person_id kebab-case, topic non-empty, doc_id for evidence_chain).
- ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s,
Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks.
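The ETA table above could be encoded along these lines. This is an illustrative sketch: etaSeconds and FIXED_ETA_SECONDS are hypothetical names, the table is keyed by detective name because the full list of kind strings is not given here, and reading Locard's 30×n_chunks as seconds is an assumption.

```typescript
type Detective =
  | "Holmes" | "Dupin" | "Poirot" | "Schneier"
  | "Tetlock" | "Taleb" | "Case-Writer" | "Locard";

// Fixed estimates in seconds, taken from the list above.
const FIXED_ETA_SECONDS: Record<Exclude<Detective, "Locard">, number> = {
  Holmes: 60,
  Dupin: 60,
  Poirot: 45,
  Schneier: 30,
  Tetlock: 30,
  Taleb: 50,
  "Case-Writer": 180,
};

// Locard scales with the number of chunks; every other detective is flat.
function etaSeconds(detective: Detective, nChunks: number = 1): number {
  if (detective === "Locard") return 30 * nChunks;
  return FIXED_ETA_SECONDS[detective];
}
```

Typing the table as a Record over the detective union means a newly added detective fails to compile until it has an estimate.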
UI integration:
- chat-bubble inline card paints each detective in its tone color.
- /jobs/[id] page header swaps name/subtitle/tone per detective;
question label adapts ("Topic" / "Hypothesis under attack" /
"Witness under analysis" / "Topic to outlier-scan" / "Hypothesis
under recalibration" / "Case to assemble").
- job-status-poller renders: case-report link card (gold), outlier
cards (yellow), witness cards (purple) — alongside existing
hypothesis, evidence, contradiction cards.
- /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name)
+ gaps (with scope JSONB).
- /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders
with MarkdownBody, frontmatter parsed for stat chips.
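For illustration, frontmatter parsing for the stat chips might be sketched like this. The helper name splitFrontmatter is hypothetical, the sketch assumes flat `key: value` lines between `---` delimiters rather than full YAML, and the field names in the usage note are invented examples, not the report's actual schema.

```typescript
// Split a markdown file into a flat frontmatter map and the remaining body.
// Files without a leading frontmatter block are returned unchanged as body.
function splitFrontmatter(md: string): { meta: Record<string, string>; body: string } {
  const m = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(md);
  if (!m) return { meta: {}, body: md };
  const meta: Record<string, string> = {};
  for (const line of (m[1] ?? "").split(/\r?\n/)) {
    const idx = line.indexOf(":");
    // Skip lines without a key; everything after the first colon is the value.
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: md.slice(m[0].length) };
}
```

Given a file that starts with `topic: roswell` and `hypotheses: 3` in its frontmatter, the page could render one chip per key in meta and pass body to MarkdownBody.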
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
You are Philip Tetlock
You are Philip Tetlock — superforecaster. Your method is rigorous Bayesian updating: given a previously-stated hypothesis with a prior + posterior, and any new evidence accumulated since, you recompute the posterior honestly. You catch dragging confidence (the prior was too high and the posterior never dropped) AND undue diffidence (the prior was too low and the posterior never rose).
Discipline (non-negotiable)
- You are NOT a partisan for the hypothesis. You read it as a tracker reads a footprint: what does the EVIDENCE since the last calibration actually say?
- You assign a new_posterior ∈ [0, 1] and a corresponding new_confidence_band: high ≥ 0.90 · medium 0.60-0.89 · low 0.30-0.59 · speculation < 0.30
- You assign a delta = new_posterior - old_posterior. If |delta| < 0.05, you may emit STABLE (no calibration update needed). This is fine; calibration is not change for change's sake.
- You produce a rationale (≤ 600 chars) describing what evidence moved the posterior OR (when stable) why it shouldn't have moved. Cite chunks via [[doc-id/pNNN#cNNNN]] for every claim.
- You produce a recommended_action:
  - keep: leave the hypothesis as is.
  - downgrade: the posterior should drop. Spec the new band.
  - upgrade: the posterior should rise. Spec the new band.
  - supersede: a new hypothesis better explains the data; close this one and queue a new tournament. Include supersede_reason.
Output protocol
Emit a strict JSON object. No prose. No code fence.
{
"new_posterior": 0.45,
"new_confidence_band": "low",
"delta": 0.05,
"rationale": "Concrete prose with [[doc-id/pNNN#cNNNN]] citations.",
"recommended_action": "keep | downgrade | upgrade | supersede",
"supersede_reason": "Only when action == 'supersede'. Otherwise omit."
}
Constraints:
- new_posterior ∈ [0, 1].
- new_confidence_band MUST match the band thresholds for new_posterior.
- rationale ≤ 600 chars.
- supersede_reason ≤ 280 chars.
If the corpus has NO new evidence since the hypothesis was last reviewed
(no chunks beyond what was already cited), emit NO_NEW_EVIDENCE and
stop.
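A writer-side check of the constraints above might look roughly like this. It is a sketch, not the actual validator: validateTetlockOutput is a hypothetical name, it assumes the JSON has already been parsed, and the band-consistency check is left out to keep the example short.

```typescript
interface TetlockOutput {
  new_posterior: number;
  new_confidence_band: string;
  delta: number;
  rationale: string;
  recommended_action: string;
  supersede_reason?: string;
}

const ACTIONS = ["keep", "downgrade", "upgrade", "supersede"];

// Returns a list of constraint violations; an empty list means the output passes.
function validateTetlockOutput(o: TetlockOutput): string[] {
  const errors: string[] = [];
  // Written as a negated range test so that NaN is rejected too.
  if (!(o.new_posterior >= 0 && o.new_posterior <= 1)) {
    errors.push("new_posterior must be in [0, 1]");
  }
  if (o.rationale.length > 600) errors.push("rationale exceeds 600 chars");
  if (ACTIONS.indexOf(o.recommended_action) === -1) {
    errors.push("unknown recommended_action");
  }
  // supersede_reason is required for 'supersede' and omitted otherwise.
  if (o.recommended_action === "supersede" && !o.supersede_reason) {
    errors.push("supersede_reason is required when action is supersede");
  }
  if (o.recommended_action !== "supersede" && o.supersede_reason !== undefined) {
    errors.push("supersede_reason must be omitted unless action is supersede");
  }
  if (o.supersede_reason !== undefined && o.supersede_reason.length > 280) {
    errors.push("supersede_reason exceeds 280 chars");
  }
  return errors;
}
```

Collecting every violation instead of failing on the first makes it easier to see what a model got wrong in a single run.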