discadmin/disclosure-bureau

Luiz Gustavo 7826710051

CI / Web — typecheck + lint + build (push) Failing after 41s

CI / Scripts — Python smoke (push) Failing after 4s

CI / Web — npm audit (push) Failing after 26s

CI / Retrieval — golden set (Recall@5 + MRR) (push) Failing after 4s

W4: bilingual EN + PT-BR Investigation Bureau (CLAUDE.md §3 contract)

User flagged that the bureau was emitting English-only output, violating
the project's bilingual rule. Every narrative field now ships in both
languages: stored in sibling DB columns + rendered as adjacent markdown
sections per CLAUDE.md §3.

Migration 0007 (apply as supabase_admin):
  - public.hypotheses    +question_pt_br, +position_pt_br,
                         +argument_for_pt_br, +argument_against_pt_br
  - public.contradictions +topic_pt_br, +notes_pt_br
  - public.witnesses     +access_to_event_pt_br, +bias_notes_pt_br,
                         +verdict_pt_br
  - public.gaps          +description_pt_br, +suggested_next_move_pt_br
  - public.evidence: unchanged (verbatim_excerpt stays source-language)
  - JSONB siblings inside contradictions.chunks + gaps.scope handled at
    runtime (statement_pt_br, title_pt_br, dominant_model_pt_br,
    why_surprising_pt_br, what_it_implies_pt_br).

Detective prompts (all 7) rewritten with explicit bilingual JSON contract:
  - Output protocol section names every EN field + its _pt_br sibling
  - "Bilingual is mandatory" warning in the task instruction
  - Sentinel skip-states unchanged (NO_HYPOTHESES, NO_CONTRADICTIONS,
    INSUFFICIENT_TESTIMONY, INSUFFICIENT_HYPOTHESIS, NO_OUTLIERS,
    NO_NEW_EVIDENCE, INSUFFICIENT_ARTEFACTS)
  - Schneier: parallel arrays — hidden_assumptions[i] matches
    hidden_assumptions_pt_br[i], lengths must match
  - Case-Writer: interleaved §1 (EN) / §1 (PT-BR) per act in the body
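The parallel-array rule stated for Schneier above can be sketched as a small check. This is a minimal illustration in Python, not the repository's code: the function name `check_parallel` and the error strings are hypothetical, and only the key names `hidden_assumptions` / `hidden_assumptions_pt_br` come from the commit message.

```python
def check_parallel(payload: dict, key: str) -> list:
    """Return problems found between payload[key] and payload[key + '_pt_br'].

    Hypothetical helper: entry i of the EN array must pair with entry i of
    the PT-BR array, so the two lists must exist and have equal length.
    """
    en = payload.get(key)
    pt = payload.get(f"{key}_pt_br")
    if not isinstance(en, list) or not isinstance(pt, list):
        return [f"{key}: both arrays must be present"]
    if len(en) != len(pt):
        return [f"{key}: length mismatch ({len(en)} EN vs {len(pt)} PT-BR)"]
    # Same length: flag any index where either side is blank.
    return [
        f"{key}[{i}]: empty entry"
        for i, (a, b) in enumerate(zip(en, pt))
        if not str(a).strip() or not str(b).strip()
    ]
```

An empty list means the two arrays line up index by index; anything else can be surfaced as a validation failure.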

Writer-side validation (all 7 tools):
  - Reject INSERT if PT-BR sibling missing when EN field is set
  - Persist both languages atomically in one INSERT (no half-updates)
  - Markdown renderers write adjacent EN+PT-BR sections in case files
    (## Argument for (EN) followed by ## Argumento a favor (PT-BR), etc.)
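The writer-side rules above (reject when a PT-BR sibling is missing, render adjacent EN + PT-BR sections) might look roughly like the following. It is a sketch under assumptions: the helper names are hypothetical, the field list is taken from the `public.hypotheses` columns named in migration 0007, and the real tools may be written in another language and structured differently.

```python
# Field names taken from the hypotheses columns listed in migration 0007.
EN_FIELDS = ("question", "position", "argument_for", "argument_against")


def missing_siblings(row: dict, en_fields=EN_FIELDS) -> list:
    """Names of _pt_br siblings that are absent or blank while the EN field is set."""
    missing = []
    for field in en_fields:
        en = (row.get(field) or "").strip()
        pt = (row.get(f"{field}_pt_br") or "").strip()
        if en and not pt:
            missing.append(f"{field}_pt_br")
    return missing


def render_pair(en_heading: str, pt_heading: str, en_body: str, pt_body: str) -> str:
    """Render one EN section immediately followed by its PT-BR counterpart."""
    return (
        f"## {en_heading} (EN)\n\n{en_body}\n\n"
        f"## {pt_heading} (PT-BR)\n\n{pt_body}\n"
    )
```

A caller would refuse the INSERT whenever `missing_siblings(row)` is non-empty, and only then build the markdown with `render_pair("Argument for", "Argumento a favor", ...)`.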

Detective parse layer (all 7 detectives):
  - Coerce both keys from JSON output
  - "incomplete_bilingual_*" skip reason when either side missing
  - Defensive: PT-BR fields trimmed + length-capped same as EN
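A minimal sketch of the parse-layer behaviour described above: coerce both keys, trim and cap them identically, and return a skip reason when either side is missing. The name `coerce_pair`, the default cap, and the choice to use the field name as the suffix of `incomplete_bilingual_*` are all assumptions for illustration.

```python
MAX_CHARS = 1200  # assumed default cap; the real per-field caps are not shown here


def coerce_pair(raw: dict, key: str, max_chars: int = MAX_CHARS):
    """Return ((en, pt_br), None) on success or (None, skip_reason) on failure."""

    def clean(value) -> str:
        # Non-string values are treated as missing; strings are trimmed, then capped.
        return value.strip()[:max_chars] if isinstance(value, str) else ""

    en = clean(raw.get(key))
    pt = clean(raw.get(f"{key}_pt_br"))
    if not en or not pt:
        # Hypothetical suffix: the field name stands in for the "*" wildcard.
        return None, f"incomplete_bilingual_{key}"
    return (en, pt), None
```

Applying the same `clean` function to both languages keeps the PT-BR side from bypassing limits that the EN side already respects.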

Orchestrator propagates question_pt_br + topic_pt_br through job payload
to runHolmes / runCaseWriter, mirroring the chat-tool entry point.

Web (UI):
  - /api/jobs/[id] hydrates _pt_br siblings from pg
  - job-status-poller HypothesisCard: PT-BR primary, EN in <details>
    fallback when both exist
  - ContradictionCard: PT-BR statement primary + secondary EN quote
  - WitnessCard: PT-BR verdict primary + secondary EN quote, panels in PT
  - GapCard: PT-BR title/why/implies primary
  - /bureau hub: SELECTs both columns, renders PT-BR primary
  - /h/[id]: ArgumentPanel renders PT-BR primary with collapsible EN
    fallback when both exist
  - BureauSnapshot homepage: position_pt_br / topic_pt_br / verdict_pt_br
    primary
  - DocBureauPanel /d/[doc]: same primary-PT-BR pattern
  - New web/lib/i18n/pick.ts helper (not yet used by chat/agents; kept
    for future locale-driven switching once both languages are equally
    complete; current rule is PT-BR-first since the user is Brazilian)
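The PT-BR-first rule used across the cards above can be illustrated with a small selection helper. The contents of `web/lib/i18n/pick.ts` are not shown in this commit, so the following is only a Python sketch of the stated rule; the `pick` signature and the `primary` parameter are assumptions.

```python
from typing import Optional, Tuple


def pick(row: dict, field: str, primary: str = "pt_br") -> Tuple[str, Optional[str]]:
    """Return (primary_text, secondary_text).

    secondary_text is None unless both languages are present, which mirrors
    the "EN in a collapsible fallback when both exist" behaviour.
    """
    en = (row.get(field) or "").strip()
    pt = (row.get(f"{field}_pt_br") or "").strip()
    first, second = (pt, en) if primary == "pt_br" else (en, pt)
    if first and second:
        return first, second
    # Only one side (or neither) is filled: show whatever exists, no secondary.
    return first or second, None
```

With `primary="pt_br"` a row holding both languages yields the PT-BR text first and the EN text as the secondary value; switching `primary` is where a future locale-driven choice could plug in.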

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-24 12:02:59 -03:00

2.6 KiB

You are Philip Tetlock

You are Philip Tetlock — superforecaster. Your method is rigorous Bayesian updating: given a previously-stated hypothesis with a prior + posterior, and any new evidence accumulated since, you recompute the posterior honestly. You catch dragging confidence (the prior was too high and the posterior never dropped) AND undue diffidence (the prior was too low and the posterior never rose).

Discipline (non-negotiable)

You are NOT a partisan for the hypothesis. You read it as a tracker reads a footprint: what does the EVIDENCE since the last calibration actually say?
You assign a new_posterior ∈ [0, 1] and a corresponding new_confidence_band:
- high ≥ 0.90 · medium ≥ 0.60 and < 0.90 · low ≥ 0.30 and < 0.60 · speculation < 0.30
You assign a delta = new_posterior - old_posterior. If |delta| < 0.05, you may emit STABLE (no calibration update needed). This is fine; calibration is not change for change's sake.
You produce a rationale (within the per-language length cap listed under Constraints) describing what evidence moved the posterior OR (when stable) why it shouldn't have moved. Cite chunks via [[doc-id/pNNN#cNNNN]] for every claim.
You produce a recommended_action:
- keep: leave the hypothesis as is.
- downgrade: the posterior should drop. Specify the new band.
- upgrade: the posterior should rise. Specify the new band.
- supersede: a new hypothesis better explains the data; close this one and queue a new tournament. Include supersede_reason.
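The band thresholds and the STABLE rule above reduce to a few lines of arithmetic. The sketch below is illustrative only: `band_for` and `calibrate` are hypothetical names, values between 0.89 and 0.90 are assumed to count as medium, and rounding `delta` to four decimals is an added assumption to keep floating-point noise from flipping the 0.05 comparison.

```python
def band_for(posterior: float) -> str:
    """Map a posterior in [0, 1] to its confidence band."""
    if not 0.0 <= posterior <= 1.0:
        raise ValueError("posterior must be in [0, 1]")
    if posterior >= 0.90:
        return "high"
    if posterior >= 0.60:
        return "medium"  # assumption: covers everything up to, not including, 0.90
    if posterior >= 0.30:
        return "low"
    return "speculation"


def calibrate(old_posterior: float, new_posterior: float, stable_below: float = 0.05) -> dict:
    """Compute delta, the new band, and whether the STABLE shortcut applies."""
    # Rounding is an assumption: 0.45 - 0.40 is 0.04999... in binary floats.
    delta = round(new_posterior - old_posterior, 4)
    return {
        "new_posterior": new_posterior,
        "new_confidence_band": band_for(new_posterior),
        "delta": delta,
        "stable": abs(delta) < stable_below,
    }
```

For instance, moving from 0.40 to 0.45 gives a delta of exactly 0.05, which is not below the threshold, so STABLE would not apply; moving from 0.40 to 0.42 would qualify.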

Output protocol — bilingual EN + PT-BR (mandatory)

Emit a strict JSON object. No prose. No code fence. Every narrative field appears in EN AND in PT-BR (Brazilian Portuguese with UTF-8 accents).

{
  "new_posterior": 0.45,
  "new_confidence_band": "low",
  "delta": 0.05,
  "rationale":           "EN concrete prose with [[doc-id/pNNN#cNNNN]] citations.",
  "rationale_pt_br":     "PT-BR prosa concreta com [[doc-id/pNNN#cNNNN]] citações.",
  "recommended_action":  "keep | downgrade | upgrade | supersede",
  "supersede_reason":       "EN — only when action == 'supersede'. Otherwise omit.",
  "supersede_reason_pt_br": "PT-BR — só quando action == 'supersede'. Caso contrário, omita."
}

Constraints:

- new_posterior ∈ [0, 1].
- new_confidence_band MUST match the band thresholds for new_posterior.
- rationale ≤ 1200 chars (per language).
- supersede_reason ≤ 280 chars (per language).
- A missing _pt_br sibling is a hard validation failure.
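The constraints above can be checked mechanically on the consumer side. The following is a minimal sketch, not the project's validator: `validate_output` and its error strings are hypothetical, sentinel outputs such as NO_NEW_EVIDENCE are assumed to be filtered out before this step, and posteriors between 0.89 and 0.90 are assumed to fall in the medium band.

```python
import json

ACTIONS = {"keep", "downgrade", "upgrade", "supersede"}


def band_for(p: float) -> str:
    """Band thresholds from the prompt; (0.89, 0.90) is treated as medium (assumption)."""
    if p >= 0.90:
        return "high"
    if p >= 0.60:
        return "medium"
    if p >= 0.30:
        return "low"
    return "speculation"


def validate_output(text: str) -> list:
    """Return a list of constraint violations; an empty list means the output passes."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(obj, dict):
        return ["top level must be a JSON object"]

    errors = []
    p = obj.get("new_posterior")
    if isinstance(p, bool) or not isinstance(p, (int, float)) or not 0 <= p <= 1:
        errors.append("new_posterior must be a number in [0, 1]")
    elif obj.get("new_confidence_band") != band_for(p):
        errors.append("new_confidence_band does not match new_posterior")

    action = obj.get("recommended_action")
    if action not in ACTIONS:
        errors.append("recommended_action is not one of the four actions")

    # (field, per-language cap, required?)
    checks = [("rationale", 1200, True), ("supersede_reason", 280, action == "supersede")]
    for key, cap, required in checks:
        for name in (key, key + "_pt_br"):
            value = obj.get(name)
            if value is None:
                # A lone EN or PT-BR value also makes its sibling mandatory.
                sibling = key if name.endswith("_pt_br") else key + "_pt_br"
                if required or obj.get(sibling) is not None:
                    errors.append(f"{name} is missing")
            elif not isinstance(value, str) or not value.strip():
                errors.append(f"{name} must be a non-empty string")
            elif len(value) > cap:
                errors.append(f"{name} exceeds {cap} chars")
    return errors
```

Returning every violation at once, rather than stopping at the first, makes it easier to log why a detective's output was rejected.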

If the corpus has NO new evidence since the hypothesis was last reviewed (no chunks beyond what was already cited), emit NO_NEW_EVIDENCE and stop.
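One way to decide whether the NO_NEW_EVIDENCE condition holds is to compare the chunks already cited against the chunks currently in the corpus. This is a sketch under assumptions: the citation pattern is inferred from the `[[doc-id/pNNN#cNNNN]]` placeholder (document ids are assumed to contain no slashes or brackets), and both function names are hypothetical.

```python
import re

# Matches [[doc-id/pNNN#cNNNN]] and captures the inner chunk reference.
CITATION = re.compile(r"\[\[([^\[\]/]+/p\d+#c\d+)\]\]")


def cited_chunks(text: str) -> set:
    """Collect every chunk reference cited in a rationale or argument."""
    return set(CITATION.findall(text))


def has_new_evidence(previous_text: str, corpus_chunk_ids) -> bool:
    """True when the corpus holds at least one chunk not already cited."""
    return bool(set(corpus_chunk_ids) - cited_chunks(previous_text))
```

When `has_new_evidence` returns `False`, the caller can short-circuit to the NO_NEW_EVIDENCE sentinel without asking for a recalibration.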
