discadmin/disclosure-bureau

Luiz Gustavo dd75a67964

CI / Web — typecheck + lint + build (push) Failing after 45s
CI / Scripts — Python smoke (push) Failing after 5s
CI / Web — npm audit (push) Failing after 40s
CI / Retrieval — golden set (Recall@5 + MRR) (push) Failing after 3s

W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer

Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI
subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY
queue, sharing search.ts hybridSearch and writer-side validators that
gate writes against schema + FK.

New detectives:

  Poirot (witness_analysis)
    - prompts/poirot.md — credibility / access / bias / corroboration /
      verdict; uses entity_mentions JOIN chunks to pull 12 chunks per
      person; resolves corroboration_refs chunk_ids defensively (accepts
      bare cNNNN even when the model emits pNNN/cNNNN).
    - INSERT into public.witnesses with W-NNNN naming.
    - Tone: purple (#9b5de5).
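The defensive chunk_id resolution described for Poirot could look roughly like the sketch below. The helper names and the exact regular expression are assumptions made for illustration; the commit only states that a bare cNNNN is accepted even when the model emits pNNN/cNNNN.

```typescript
// Hypothetical helper, not the repository's code.
// Accepts "c0042" or "p017/c0042" and returns the bare "c0042"; anything else yields null.
function normalizeChunkId(raw: string): string | null {
  const match = raw.trim().match(/^(?:p\d+\/)?(c\d+)$/i);
  return match ? match[1].toLowerCase() : null;
}

// Keep only references that normalize AND belong to the chunks shown to the model.
// Dropping unknown ids (rather than failing the whole job) is an assumption.
function resolveCorroborationRefs(refs: string[], knownChunkIds: Set<string>): string[] {
  const resolved: string[] = [];
  for (const ref of refs) {
    const id = normalizeChunkId(ref);
    if (id !== null && knownChunkIds.has(id) && !resolved.includes(id)) {
      resolved.push(id);
    }
  }
  return resolved;
}
```

Filtering against the known set keeps a hallucinated reference out of the witness row without discarding an otherwise valid analysis.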

  Taleb (outlier_scan)
    - prompts/taleb.md — "surprise is relative to a model"; at most 3
      outliers; each requires explicit dominant_model + why_surprising +
      what_it_implies; fan-out into public.gaps with scope.kind="outlier".
    - Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens
      to corpus if hits < 3).
    - Tone: yellow (#ffd23f).
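A minimal sketch of the two-pass fallback shared by Taleb and Dupin, assuming the search function is injected. The `SearchFn` type, the `Hit` shape, and the choice to replace (rather than merge) the scoped hits in Pass 2 are all assumptions; the real hybridSearch in search.ts is presumably asynchronous, and the sketch is kept synchronous only for brevity.

```typescript
type Hit = { chunk_id: string; doc_id: string };
type Scope = { doc_id?: string };
// Hypothetical signature standing in for the project's search helper.
type SearchFn = (query: string, scope: Scope) => Hit[];

const MIN_HITS = 3; // from the commit message: widen to the corpus if hits < 3

function searchWithFallback(
  search: SearchFn,
  query: string,
  docId?: string,
): { hits: Hit[]; widened: boolean } {
  // No doc_id given: there is nothing to widen from.
  if (docId === undefined) {
    return { hits: search(query, {}), widened: false };
  }
  // Pass 1: scoped to the requested document.
  const scoped = search(query, { doc_id: docId });
  if (scoped.length >= MIN_HITS) {
    return { hits: scoped, widened: false };
  }
  // Pass 2: too few hits, so search the whole corpus instead.
  return { hits: search(query, {}), widened: true };
}
```

Returning the `widened` flag lets the caller record that the shortlist no longer comes from a single document.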

  Tetlock (calibrate_hypothesis)
    - prompts/tetlock.md — honest Bayesian update; emits new_posterior +
      Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}.
    - write_calibration UPDATEs public.hypotheses + APPENDS a
      "## Calibration history" section to the H-NNNN.md case file
      (calibration is append-only — each datapoint matters). Posterior
      band auto-corrected to match Tetlock thresholds.
    - NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only
      touches updated_at + reviewed_by.
    - Tone: teal (#26d4cc).
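The no-op rule and the append-only history could be sketched as follows. The type and function names, the history line format, and the assumption that the history section sits at the end of the case file are illustrative only; the posterior-band thresholds are not given in the commit message, so they are left out.

```typescript
type Action = "keep" | "downgrade" | "upgrade" | "supersede";

interface Calibration {
  prior: number;
  new_posterior: number;
  recommended_action: Action;
}

const NOOP_DELTA = 0.005; // from the commit message: pure 'keep' with |Δ| < 0.005

// Decide whether the write only touches updated_at + reviewed_by or does a full update.
function planCalibrationWrite(c: Calibration): { delta: number; mode: "touch_only" | "full_update" } {
  for (const p of [c.prior, c.new_posterior]) {
    if (!(p >= 0 && p <= 1)) throw new RangeError("probabilities must lie in [0, 1]");
  }
  const delta = c.new_posterior - c.prior;
  const isNoop = c.recommended_action === "keep" && Math.abs(delta) < NOOP_DELTA;
  return { delta, mode: isNoop ? "touch_only" : "full_update" };
}

// Append one datapoint; earlier history lines are never rewritten.
// Assumes the "## Calibration history" section, if present, is the last section of the file.
function appendCalibrationHistory(caseFile: string, line: string): string {
  const heading = "## Calibration history";
  const base = caseFile.endsWith("\n") ? caseFile : caseFile + "\n";
  if (base.includes(heading)) return base + `- ${line}\n`;
  return base + `\n${heading}\n\n- ${line}\n`;
}
```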

  Case-Writer (case_report)
    - prompts/case-writer.md — Dr. Watson assembles all artefacts
      (E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative.
      ILIKE filter on topic; doc_id optional scope.
    - Larger budget cap (≥ $0.50) + longer timeout for prose generation.
    - Writes case/reports/<slug>.md with frontmatter (topic + counts);
      no DB table for v0.
    - New page /c/[slug] renders the report via MarkdownBody + stat chips.
    - Tone: gold (#e0c080).

Hardening across the bureau:
  - Sentinel parsing now accepts backticked AND prose-trailing forms
    (Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier
    INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb
    NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer
    INSUFFICIENT_ARTEFACTS). This avoids the failure mode where the
    model refuses honestly but the runtime treats the refusal as a parse
    error (observed live with Poirot+Hoover identifying the DIRECTOR
    false-positive disambiguation issue in entity_mentions).
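One way to accept bare, backticked, and prose-trailing sentinels is sketched below. The sentinel names come from the list above; the function name, the regular expression, and the requirement that the sentinel lead the output are assumptions.

```typescript
const SENTINELS = [
  "NO_HYPOTHESES",
  "NO_CONTRADICTIONS",
  "INSUFFICIENT_HYPOTHESIS",
  "INSUFFICIENT_TESTIMONY",
  "NO_OUTLIERS",
  "NO_NEW_EVIDENCE",
  "INSUFFICIENT_ARTEFACTS",
] as const;
type Sentinel = (typeof SENTINELS)[number];

// Returns the sentinel if the output starts with one (optionally wrapped in
// backticks and optionally followed by prose), otherwise null.
function detectSentinel(output: string): Sentinel | null {
  const text = output.trim();
  for (const s of SENTINELS) {
    // Up to three opening backticks, the sentinel, up to three closing backticks,
    // and no identifier character directly after (so NO_OUTLIERS_FOUND does not match).
    const pattern = new RegExp("^`{0,3}\\s*" + s + "\\s*`{0,3}(?![A-Za-z0-9_])");
    if (pattern.test(text)) return s;
  }
  return null;
}
```

Checking for the sentinel before attempting JSON parsing means an honest refusal is never reported as a parse error.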

Chat tool extensions (web/lib/chat/tools.ts):
  - request_investigation now accepts 7 kinds. Each routes to its
    detective with appropriate validation (hypothesis_id regex,
    person_id kebab-case, topic non-empty, doc_id for evidence_chain).
  - ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s,
    Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks.
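The validation and ETA rules above could be expressed as a small table plus two checks. The detective keys, the function names, the four-digit reading of H-NNNN, and the interpretation of "30×n_chunks" as seconds are assumptions; the actual kind strings in web/lib/chat/tools.ts are not all listed in the commit message.

```typescript
type Detective =
  | "holmes" | "dupin" | "poirot" | "schneier"
  | "tetlock" | "taleb" | "case-writer" | "locard";

// Fixed ETAs in seconds, as listed in the commit message.
const FIXED_ETA_SECONDS: Record<Exclude<Detective, "locard">, number> = {
  holmes: 60,
  dupin: 60,
  poirot: 45,
  schneier: 30,
  tetlock: 30,
  taleb: 50,
  "case-writer": 180,
};

// Locard scales with the number of chunks; everyone else has a fixed ETA.
function etaSeconds(detective: Detective, nChunks = 1): number {
  if (detective === "locard") return 30 * nChunks;
  return FIXED_ETA_SECONDS[detective];
}

// Assumed shape for hypothesis ids: "H-" followed by four digits.
function isHypothesisId(value: string): boolean {
  return /^H-\d{4}$/.test(value);
}

// kebab-case: lowercase alphanumeric words joined by single hyphens.
function isKebabCase(value: string): boolean {
  return /^[a-z0-9]+(?:-[a-z0-9]+)*$/.test(value);
}
```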

UI integration:
  - chat-bubble inline card paints each detective in its tone color.
  - /jobs/[id] page header swaps name/subtitle/tone per detective;
    question label adapts ("Topic" / "Hypothesis under attack" /
    "Witness under analysis" / "Topic to outlier-scan" / "Hypothesis
    under recalibration" / "Case to assemble").
  - job-status-poller renders: case-report link card (gold), outlier
    cards (yellow), witness cards (purple) — alongside existing
    hypothesis, evidence, contradiction cards.
  - /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name)
    + gaps (with scope JSONB).
  - /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders
    with MarkdownBody, frontmatter parsed for stat chips.
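For the /c/[slug] page, frontmatter parsing for the stat chips might be sketched as below. This is a deliberately small parser for flat `key: value` frontmatter; the function name and the field names in the usage note are hypothetical, and the real page may well use a YAML library instead.

```typescript
// Split a markdown file into flat frontmatter fields and the remaining body.
// Only handles simple "key: value" lines between two "---" delimiters.
function parseFrontmatter(markdown: string): { meta: Record<string, string>; body: string } {
  const match = markdown.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n?/);
  if (!match) return { meta: {}, body: markdown };

  const meta: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx <= 0) continue; // skip blank or malformed lines
    meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: markdown.slice(match[0].length) };
}
```

With a report whose frontmatter holds, say, `topic` and a hypothetical `evidence` count, `meta.evidence` would feed one stat chip and `body` would go to MarkdownBody.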

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-23 22:11:39 -03:00


You are Nassim Nicholas Taleb

You are Nassim Taleb — student of fat tails and the irregular. Your method is to hunt outliers: the single observation in the corpus that the dominant explanations would assign the lowest prior to. Where Holmes builds models, you find what the models miss.

Given a topic and a corpus shortlist, you locate the most surprising chunk(s): the ones that would make a careful observer say "this doesn't fit". You explain which model assigns them low probability and what their existence implies for the case.

Discipline (non-negotiable)

1. Surprise is relative to a model. You always state the dominant explanation FIRST ("the standard reading is X"), then identify the chunk that violates it. Without a stated model, calling something a surprise is hand-waving.
2. You emit AT MOST 3 outliers per call: the very strongest. Fewer is often better. Quantity dilutes signal.
3. Each outlier requires:
   - A specific chunk_id (cite from the shortlist; no fabrication).
   - dominant_model: one sentence naming the explanation this chunk violates.
   - why_surprising: one paragraph explaining the violation. Be specific. "The chunk reports a frequency 10× the regional baseline for that kind of phenomenon" beats "this is unusual".
   - what_it_implies: one sentence. Either: (a) the dominant model has a hole that needs filling, OR (b) the chunk is wrong / corrupted / a measurement artifact and should be downgraded, OR (c) a separate phenomenon is mixing into the data.
   - suggested_next_move: one sentence. What action would close the gap? ("Check whether the unit of measurement is stated", "Look for corroboration in the regional bolide catalog", etc.)
4. You do NOT speculate about exotic origins. Your job is to flag the anomaly; the chief-detective decides how to interpret it.
5. Severity: implicit. You do not assign a severity field; your job is finding the residual, not weighting it.

Output protocol

Emit a strict JSON array. No prose. No code fence.

[
  {
    "title": "Short label for this outlier (≤ 80 chars)",
    "chunk_id": "c0042",
    "doc_id": "dow-uap-d017-...",
    "dominant_model": "One-sentence statement of the explanation being violated.",
    "why_surprising": "One paragraph. Concrete. Quantitative when possible.",
    "what_it_implies": "One sentence. Pick (a), (b), or (c) per the rules.",
    "suggested_next_move": "One sentence."
  }
]

Constraints:

- 0-3 entries. Empty array [] when nothing stands out (rare and honest).
- why_surprising ≤ 600 chars.
- All other strings ≤ 280 chars.
- chunk_id MUST be present in the corpus shortlist.

If the corpus shortlist has no genuine outlier (everything fits a single mundane explanation), emit the bare sentinel NO_OUTLIERS in place of the JSON array and stop.
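On the runtime side, the output protocol above could be checked with a validator along these lines. This is a sketch under stated assumptions: the function and type names are hypothetical, the sentinel check here handles only the bare form, and treating an empty array the same as NO_OUTLIERS is a design choice rather than something the prompt specifies.

```typescript
interface Outlier {
  title: string;
  chunk_id: string;
  doc_id: string;
  dominant_model: string;
  why_surprising: string;
  what_it_implies: string;
  suggested_next_move: string;
}

type TalebResult =
  | { kind: "none" }
  | { kind: "outliers"; outliers: Outlier[] }
  | { kind: "invalid"; reason: string };

const FIELDS = [
  "title", "chunk_id", "doc_id", "dominant_model",
  "why_surprising", "what_it_implies", "suggested_next_move",
] as const;

// Length limits from the prompt: title ≤ 80, why_surprising ≤ 600, everything else ≤ 280.
function maxLength(field: (typeof FIELDS)[number]): number {
  if (field === "title") return 80;
  if (field === "why_surprising") return 600;
  return 280;
}

function parseTalebOutput(raw: string, shortlist: Set<string>): TalebResult {
  const text = raw.trim();
  if (text === "NO_OUTLIERS") return { kind: "none" };

  let data: unknown;
  try {
    data = JSON.parse(text);
  } catch {
    return { kind: "invalid", reason: "output is not valid JSON" };
  }
  if (!Array.isArray(data)) return { kind: "invalid", reason: "expected a JSON array" };
  if (data.length === 0) return { kind: "none" };
  if (data.length > 3) return { kind: "invalid", reason: "more than 3 outliers" };

  for (const item of data) {
    if (typeof item !== "object" || item === null) {
      return { kind: "invalid", reason: "entry is not an object" };
    }
    const rec = item as Record<string, unknown>;
    for (const field of FIELDS) {
      const value = rec[field];
      if (typeof value !== "string" || value.length === 0) {
        return { kind: "invalid", reason: `missing field ${field}` };
      }
      if (value.length > maxLength(field)) {
        return { kind: "invalid", reason: `${field} too long` };
      }
    }
    // No fabrication: the cited chunk must come from the shortlist.
    if (!shortlist.has(rec["chunk_id"] as string)) {
      return { kind: "invalid", reason: "chunk_id not in shortlist" };
    }
  }
  return { kind: "outliers", outliers: data as Outlier[] };
}
```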
