discadmin/disclosure-bureau

Luiz Gustavo dd75a67964

CI / Web — typecheck + lint + build (push) Failing after 45s

CI / Scripts — Python smoke (push) Failing after 5s

CI / Web — npm audit (push) Failing after 40s

CI / Retrieval — golden set (Recall@5 + MRR) (push) Failing after 3s

W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer

Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI
subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY
queue, sharing search.ts hybridSearch and writer-side validators that
gate writes against schema + FK.

New detectives:

  Poirot (witness_analysis)
    - prompts/poirot.md — credibility / access / bias / corroboration /
      verdict; uses entity_mentions JOIN chunks to pull 12 chunks per
      person; resolves corroboration_refs chunk_ids defensively (accepts
      bare cNNNN even when the model emits pNNN/cNNNN).
    - INSERT into public.witnesses with W-NNNN naming.
    - Tone: purple (#9b5de5).
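A minimal sketch of the defensive chunk_id resolution described for Poirot, assuming the model may emit either a bare `cNNNN` or a prefixed `pNNN/cNNNN`. The names `normalizeChunkId` and `resolveRefs` and the regex are hypothetical, not taken from the repository.

```typescript
// Hypothetical helper: reduce a model-emitted chunk reference to its bare cNNNN form.
// Accepts "c0042" and "p012/c0042"; anything else resolves to null.
const CHUNK_ID = /(?:^|\/)(c\d+)$/;

function normalizeChunkId(raw: string): string | null {
  const match = CHUNK_ID.exec(raw.trim());
  return match?.[1] ?? null;
}

interface CorroborationRef {
  chunk_id: string;
  supports: boolean;
}

// Keep only refs whose normalized id is in the shortlist handed to the model.
function resolveRefs(
  refs: CorroborationRef[],
  shortlist: ReadonlySet<string>,
): CorroborationRef[] {
  const resolved: CorroborationRef[] = [];
  for (const ref of refs) {
    const id = normalizeChunkId(ref.chunk_id);
    if (id !== null && shortlist.has(id)) {
      resolved.push({ chunk_id: id, supports: ref.supports });
    }
  }
  return resolved;
}
```

Dropping unresolvable refs rather than failing the whole job is a design assumption here; a stricter writer could reject the output instead.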

  Taleb (outlier_scan)
    - prompts/taleb.md — "surprise is relative to a model"; at most 3
      outliers; each requires explicit dominant_model + why_surprising +
      what_it_implies; fan-out into public.gaps with scope.kind="outlier".
    - Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens
      to corpus if hits < 3).
    - Tone: yellow (#ffd23f).
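The two-pass fallback can be sketched as follows. The search function is injected because the real `hybridSearch` signature is not shown here; `SearchFn`, `Hit`, and `searchWithFallback` are illustrative names, and only the `hits < 3` threshold comes from the text above.

```typescript
interface Hit {
  chunk_id: string;
  score: number;
}

// Assumed shape of a search call; the real hybridSearch may differ.
type SearchFn = (query: string, scope: { doc_id?: string }) => Promise<Hit[]>;

const MIN_HITS = 3; // widen to the whole corpus when the scoped pass returns fewer

async function searchWithFallback(
  search: SearchFn,
  query: string,
  docId?: string,
): Promise<{ hits: Hit[]; widened: boolean }> {
  if (docId !== undefined) {
    // Pass 1: restrict to the requested document.
    const scoped = await search(query, { doc_id: docId });
    if (scoped.length >= MIN_HITS) return { hits: scoped, widened: false };
  }
  // Pass 2: no doc_id, so the search covers the whole corpus.
  const corpus = await search(query, {});
  return { hits: corpus, widened: docId !== undefined };
}
```

Returning a `widened` flag lets the caller record that the results came from outside the requested document.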

  Tetlock (calibrate_hypothesis)
    - prompts/tetlock.md — honest Bayesian update; emits new_posterior +
      Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}.
    - write_calibration UPDATEs public.hypotheses + APPENDS a
      "## Calibration history" section to the H-NNNN.md case file
      (calibration is append-only — each datapoint matters). Posterior
      band auto-corrected to match Tetlock thresholds.
    - NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only
      touches updated_at + reviewed_by.
    - Tone: teal (#26d4cc).
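As a rough illustration of the touch-only rule, the sketch below checks for a pure 'keep' with |Δ| < 0.005 and formats one append-only history entry. The `prior` field, the function names, and the history line format are assumptions; the posterior band thresholds are not stated in this commit message, so the sketch leaves them out.

```typescript
type Action = "keep" | "downgrade" | "upgrade" | "supersede";

interface Calibration {
  prior: number; // assumed input: the posterior stored before this run
  new_posterior: number;
  recommended_action: Action;
}

const TOUCH_ONLY_EPSILON = 0.005; // threshold quoted in the commit message

// True when the write should only bump updated_at + reviewed_by.
function isTouchOnly(c: Calibration): boolean {
  const delta = c.new_posterior - c.prior;
  return c.recommended_action === "keep" && Math.abs(delta) < TOUCH_ONLY_EPSILON;
}

// Hypothetical line format for the "## Calibration history" section.
function historyLine(c: Calibration, isoDate: string): string {
  const delta = c.new_posterior - c.prior;
  const sign = delta >= 0 ? "+" : "-";
  return (
    `- ${isoDate}: ${c.prior.toFixed(2)} -> ${c.new_posterior.toFixed(2)} ` +
    `(Δ ${sign}${Math.abs(delta).toFixed(3)}), ${c.recommended_action}`
  );
}
```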

  Case-Writer (case_report)
    - prompts/case-writer.md — Dr. Watson assembles all artefacts
      (E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative.
      ILIKE filter on topic; doc_id optional scope.
    - Larger budget cap (≥ $0.50) + longer timeout for prose generation.
    - Writes case/reports/<slug>.md with frontmatter (topic + counts);
      no DB table for v0.
    - New page /c/[slug] renders the report via MarkdownBody + stat chips.
    - Tone: gold (#e0c080).

Hardening across the bureau:
  - Sentinel parsing now accepts backticked AND prose-trailing forms
    (Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier
    INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb
    NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer
    INSUFFICIENT_ARTEFACTS). Avoids the failure mode where the model
    refuses honestly but the runtime treats it as a parse error
    (observed live with Poirot+Hoover identifying the DIRECTOR
    false-positive disambiguation issue in entity_mentions).
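One way the tolerant sentinel detection could look, assuming "backticked" means the token wrapped in backticks and "prose-trailing" means the token followed by an explanation. The sentinel names come from the list above; `detectSentinel` and its matching rules are a hypothetical sketch, not the runtime's actual parser.

```typescript
const SENTINELS = [
  "NO_HYPOTHESES",
  "NO_CONTRADICTIONS",
  "INSUFFICIENT_HYPOTHESIS",
  "INSUFFICIENT_TESTIMONY",
  "NO_OUTLIERS",
  "NO_NEW_EVIDENCE",
  "INSUFFICIENT_ARTEFACTS",
] as const;

type Sentinel = (typeof SENTINELS)[number];

// Returns the sentinel the output starts with, or null if it looks like a payload.
function detectSentinel(output: string): Sentinel | null {
  const text = output.trim();
  if (text.startsWith("{")) return null; // JSON payload, not a refusal
  // Strip leading backticks, quotes, asterisks, and whitespace around the token.
  const head = text.replace(/^[`*"'\s]+/, "");
  for (const sentinel of SENTINELS) {
    if (!head.startsWith(sentinel)) continue;
    // Require a token boundary so NO_OUTLIERS_FOUND is not read as NO_OUTLIERS.
    const next = head.charAt(sentinel.length);
    if (next === "" || !/[A-Za-z0-9_]/.test(next)) return sentinel;
  }
  return null;
}
```

Checking for a sentinel before attempting `JSON.parse` keeps an honest refusal from surfacing as a parse error.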

Chat tool extensions (web/lib/chat/tools.ts):
  - request_investigation now accepts 7 kinds. Each routes to its
    detective with appropriate validation (hypothesis_id regex,
    person_id kebab-case, topic non-empty, doc_id for evidence_chain).
  - ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s,
    Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks.
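The ETA table above could be encoded roughly like this. Keys are detective names because the commit message does not list every job kind identifier, and reading Locard's `30×n_chunks` as seconds is an assumption; `etaSeconds` is a hypothetical name.

```typescript
type Detective =
  | "holmes"
  | "dupin"
  | "poirot"
  | "schneier"
  | "tetlock"
  | "taleb"
  | "case-writer"
  | "locard";

// Fixed estimates in seconds, as listed in the commit message.
const FIXED_ETA_SECONDS: Record<Exclude<Detective, "locard">, number> = {
  holmes: 60,
  dupin: 60,
  poirot: 45,
  schneier: 30,
  tetlock: 30,
  taleb: 50,
  "case-writer": 180,
};

function etaSeconds(detective: Detective, nChunks = 1): number {
  // Locard scales with the number of chunks in the evidence chain.
  if (detective === "locard") return 30 * nChunks;
  return FIXED_ETA_SECONDS[detective];
}
```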

UI integration:
  - chat-bubble inline card paints each detective in its tone color.
  - /jobs/[id] page header swaps name/subtitle/tone per detective;
    question label adapts ("Topic" / "Hypothesis under attack" /
    "Witness under analysis" / "Topic to outlier-scan" / "Hypothesis
    under recalibration" / "Case to assemble").
  - job-status-poller renders: case-report link card (gold), outlier
    cards (yellow), witness cards (purple) — alongside existing
    hypothesis, evidence, contradiction cards.
  - /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name)
    + gaps (with scope JSONB).
  - /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders
    with MarkdownBody, frontmatter parsed for stat chips.
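For illustration, frontmatter extraction for the stat chips might look like the sketch below. It assumes a simple `key: value` block between `---` delimiters and at least one line of frontmatter; the field names in the usage note are invented, and a real page may well use a YAML library instead.

```typescript
interface ParsedReport {
  meta: Record<string, string>;
  body: string;
}

// Minimal frontmatter split: flat "key: value" lines only, no nesting or lists.
function parseFrontmatter(md: string): ParsedReport {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(md);
  if (!match) return { meta: {}, body: md };
  const meta: Record<string, string> = {};
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx <= 0) continue; // skip blank or malformed lines
    meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: md.slice(match[0].length) };
}
```

Given a report starting with `---`, `topic: example topic`, `hypotheses: 3`, `---`, the call returns both keys as strings in `meta` and the remaining markdown in `body`.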

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-23 22:11:39 -03:00

You are Hercule Poirot

You are Hercule Poirot — psychologist of the witness. Your method is not to trust testimony at face value; it is to weigh who is speaking, what they had access to, what they stood to gain or lose, and whether their account is corroborated by the rest of the file.

You read the chunks where a named person appears and produce a structured witness analysis: credibility, access_to_event, bias_notes, corroboration_refs, and a one-sentence verdict.

Discipline (non-negotiable)

You do not declare a witness credible because they are an authority. You ask:
- Access. Were they in a position to observe what they testify to? Direct observer? Hearsay at one or two removes? Reading a report? A general testifying about an event they learned of only through a subordinate carries different weight than a pilot recounting a flight they flew themselves.
- Bias. Career incentive, ideological commitment, prior public position, institutional pressure, fear of reprisal. List the ones you can ground in the chunks.
- Corroboration. Do other chunks (other people, other docs) confirm the same factual claim, refute it, or stay silent? If two witnesses independently say the same thing, that strengthens both; if everyone got the story from one source, the corroboration is illusory.
You assign a single credibility band:
- high — direct access, no strong bias, independent corroboration.
- medium — partial access OR mild bias OR thin corroboration.
- low — second-hand OR active bias OR contradicted by other chunks.
- speculation — the chunks describe the person only by name; no basis to assess.
corroboration_refs is an array of objects {chunk_id, supports} — each cites a different chunk that confirms (supports: true) or refutes (supports: false) something the witness asserts. Aim for 2-5 entries when possible.
verdict is ONE sentence (≤ 280 chars). Declarative. No hedging. Hedging belongs in credibility, not in the wording.

Output protocol

Emit a strict JSON object. No prose. No code fence.

{
  "credibility": "high | medium | low | speculation",
  "access_to_event": "One paragraph describing what the person had direct, indirect, or no access to. Ground specific facts in chunk_ids.",
  "bias_notes": "One paragraph naming concrete biases visible in the corpus (e.g. official role conflict, prior public stance, institutional pressure). Avoid generic skepticism.",
  "corroboration_refs": [
    {"chunk_id": "c0042", "supports": true},
    {"chunk_id": "c0087", "supports": false}
  ],
  "verdict": "One-sentence declarative judgment of this witness's reliability for the matters at hand."
}

Constraints:

- access_to_event and bias_notes ≤ 800 chars each.
- corroboration_refs ≤ 8 entries, MUST cite chunk_id values that appear in the corpus shortlist you were given.
- verdict ≤ 280 chars, no hedging language inside the sentence.

If the corpus contains no chunks where the named person actually appears (only the entity card from the wiki without supporting passages), emit the literal word INSUFFICIENT_TESTIMONY and stop.
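A writer-side check of the constraints above might look like the following sketch. The function name and error strings are hypothetical, and "chars" is approximated with JavaScript string length (UTF-16 code units). The caller is assumed to have handled the INSUFFICIENT_TESTIMONY sentinel before parsing.

```typescript
const BANDS: readonly string[] = ["high", "medium", "low", "speculation"];

// Returns a list of violations; an empty list means the output passes.
function validateWitnessAnalysis(raw: string, shortlist: ReadonlySet<string>): string[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return ["not valid JSON"];
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return ["not a JSON object"];
  }
  const obj = data as Record<string, unknown>;
  const errors: string[] = [];

  const credibility = obj["credibility"];
  if (typeof credibility !== "string" || !BANDS.includes(credibility)) {
    errors.push("credibility: unknown band");
  }
  for (const key of ["access_to_event", "bias_notes"]) {
    const value = obj[key];
    if (typeof value !== "string" || value.length > 800) {
      errors.push(`${key}: must be a string of at most 800 chars`);
    }
  }
  const verdict = obj["verdict"];
  if (typeof verdict !== "string" || verdict.length > 280) {
    errors.push("verdict: must be a string of at most 280 chars");
  }
  const refs = obj["corroboration_refs"];
  if (!Array.isArray(refs) || refs.length > 8) {
    errors.push("corroboration_refs: must be an array of at most 8 entries");
  } else {
    for (const item of refs) {
      const ref = item as { chunk_id?: unknown; supports?: unknown } | null;
      if (
        ref === null || typeof ref !== "object" ||
        typeof ref.chunk_id !== "string" || typeof ref.supports !== "boolean"
      ) {
        errors.push("corroboration_refs: malformed entry");
      } else if (!shortlist.has(ref.chunk_id)) {
        errors.push(`corroboration_refs: ${ref.chunk_id} is not in the shortlist`);
      }
    }
  }
  return errors;
}
```

The "no hedging" rule for the verdict is left unchecked here, since it is a judgment about wording rather than a structural constraint.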
