disclosure-bureau

discadmin/disclosure-bureau

Commit graph

Author	SHA1	Message	Date

Luiz Gustavo

b3a6a3c1a3

W5.2: best-seller case-writer — single voice, scene-driven, anti-skeptic

CI / Web — typecheck + lint + build (push) Failing after 38s

CI / Scripts — Python smoke (push) Failing after 3s

CI / Web — npm audit (push) Failing after 27s

CI / Retrieval — golden set (Recall@5 + MRR) (push) Failing after 3s

User: "shouldn't mention the names of the mind-clones, should merge all
analyses and write like a best-seller author would, about what happened."

Voice rewrite (prompts/case-writer.md):
  - Reference voices: Erik Larson, Sam Kean, John McPhee, Mark Bowden.
    Plainspoken non-fiction, scene-driven, fascinated.
  - One narrator. NEVER say "Sherlock Holmes argues" / "Sun-Tzu builds
    the case" / "the team concluded". No internal-process names reach
    the reader.
  - Hook the first paragraph. Open in a scene with a date, place, and
    person doing something specific. NOT "This case investigates..."
  - Show, don't argue. Verbatim quotes stay source-language in
    blockquotes; the narration around them is the narrator's voice.
  - Every claim cites a chunk with [[doc-id/pNNN#cNNNN]].
  - Forbidden ceremony: "In summary…", "Em suma…", "Ultimately…",
    "It is worth noting…", detective names, probability tables,
    hypothesis tournaments.
  - The honest unknown is the subject, not a failure: "Whatever was in
    the sky over Sandia in December 1948, the government never said."
  - 4-6 numbered scenes, each title-cased specifically ("The Green
    Sphere Over Highway 60" not "Background").
  - Bilingual EN + PT-BR per CLAUDE.md §3 — sections alternate, no
    mid-paragraph language mixing.
  - Refusal: emit INSUFFICIENT_ARTEFACTS rather than padding when the
    corpus is thin.
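
The citation rule above lends itself to a mechanical check on the generated report. A minimal sketch, assuming a kebab-case doc id, a 3-digit page and a 4-digit chunk number (the message only gives the `[[doc-id/pNNN#cNNNN]]` shape); `extractCitations` and `uncitedParagraphs` are hypothetical names, not functions from the repository:

```typescript
// Hypothetical helpers; nothing here is taken from the repository.
// Assumed shape: [[doc-id/pNNN#cNNNN]] with a kebab-case doc id,
// a 3-digit page number and a 4-digit chunk number.
const CITATION_SOURCE =
  "\\[\\[([a-z0-9]+(?:-[a-z0-9]+)*)/p(\\d{3})#c(\\d{4})\\]\\]";

interface Citation {
  docId: string;
  page: number;
  chunk: number;
}

// Collects every citation marker found in the text, in order.
function extractCitations(text: string): Citation[] {
  const re = new RegExp(CITATION_SOURCE, "g");
  const found: Citation[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    found.push({ docId: m[1], page: Number(m[2]), chunk: Number(m[3]) });
  }
  return found;
}

// Returns paragraphs that carry no citation at all. Headings are skipped.
function uncitedParagraphs(markdown: string): string[] {
  const uncited: string[] = [];
  for (const para of markdown.split(/\n\s*\n/)) {
    const p = para.trim();
    if (p === "" || p.charAt(0) === "#") continue;
    if (extractCitations(p).length === 0) uncited.push(p);
  }
  return uncited;
}
```

A paragraph-level check is looser than "every claim", but it catches the common failure of a whole scene written without any source.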

Raw-material pipeline (src/detectives/case_writer.ts):
  - hybridSearch(topic, lang, top_k=18) gives the narrator real corpus
    scenes with verbatim text + chunk_id citations + bbox metadata.
    This is what was missing: v1 only saw pre-digested hypothesis
    artefacts, which is why its prose read as academic.
  - Dropped the hypotheses + contradictions queries from the loader.
    They were skeptic-framing scaffolding that doesn't belong in the
    raw material a best-seller narrator works from.
  - New buildPrompt sections: "Primary-source scenes", "Curated
    verbatim quotes", "Anomalies and surprises", "Named witnesses".
    Anomalies (Taleb's outlier gaps) reframed: drop dominant_model
    skeptic baseline, keep title + why_surprising as gold material.
  - Refusal floor: < 4 scenes from hybridSearch → skip with reason.
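
The refusal floor amounts to a small gate between retrieval and prompt building. A sketch under stated assumptions: the `Scene` shape, the `gateScenes` name and the reason string are illustrative; only the threshold of 4 comes from the message.

```typescript
// Hypothetical types and names; only the floor of 4 scenes is from the commit.
interface Scene {
  chunkId: string;
  text: string;
}

type Gate =
  | { ok: true; scenes: Scene[] }
  | { ok: false; reason: string };

const MIN_SCENES = 4;

// Skips the run with a reason instead of letting the writer pad a thin corpus.
function gateScenes(scenes: Scene[]): Gate {
  if (scenes.length < MIN_SCENES) {
    return {
      ok: false,
      reason: "insufficient_scenes: " + scenes.length + " < " + MIN_SCENES,
    };
  }
  return { ok: true, scenes: scenes };
}
```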

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-24 14:21:53 -03:00

Luiz Gustavo

7826710051

W4: bilingual EN + PT-BR Investigation Bureau (CLAUDE.md §3 contract)

CI / Web — typecheck + lint + build (push) Failing after 41s

CI / Scripts — Python smoke (push) Failing after 4s

CI / Web — npm audit (push) Failing after 26s

CI / Retrieval — golden set (Recall@5 + MRR) (push) Failing after 4s

User flagged that the bureau was emitting English-only output, violating
the project's bilingual rule. Every narrative field now ships in both
languages: stored in sibling DB columns + rendered as adjacent markdown
sections per CLAUDE.md §3.

Migration 0007 (apply as supabase_admin):
  - public.hypotheses    +question_pt_br, +position_pt_br,
                         +argument_for_pt_br, +argument_against_pt_br
  - public.contradictions +topic_pt_br, +notes_pt_br
  - public.witnesses     +access_to_event_pt_br, +bias_notes_pt_br,
                         +verdict_pt_br
  - public.gaps          +description_pt_br, +suggested_next_move_pt_br
  - public.evidence: unchanged (verbatim_excerpt stays source-language)
  - JSONB siblings inside contradictions.chunks + gaps.scope handled at
    runtime (statement_pt_br, title_pt_br, dominant_model_pt_br,
    why_surprising_pt_br, what_it_implies_pt_br).

Detective prompts (all 7) rewritten with explicit bilingual JSON contract:
  - Output protocol section names every EN field + its _pt_br sibling
  - "Bilingual is mandatory" warning in the task instruction
  - Sentinel skip-states unchanged (NO_HYPOTHESES, NO_CONTRADICTIONS,
    INSUFFICIENT_TESTIMONY, INSUFFICIENT_HYPOTHESIS, NO_OUTLIERS,
    NO_NEW_EVIDENCE, INSUFFICIENT_ARTEFACTS)
  - Schneier: parallel arrays — hidden_assumptions[i] matches
    hidden_assumptions_pt_br[i], lengths must match
  - Case-Writer: interleaved §1 (EN) / §1 (PT-BR) per act in the body
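
The parallel-array rule for Schneier can be enforced by zipping the two arrays and refusing on a length mismatch. A minimal sketch; `pairAssumptions` is a hypothetical name and the returned shape is an assumption:

```typescript
// Hypothetical helper: zips hidden_assumptions with hidden_assumptions_pt_br.
// Returns null when the lengths differ, so the caller can skip the write.
interface AssumptionPair {
  en: string;
  pt_br: string;
}

function pairAssumptions(en: string[], ptBr: string[]): AssumptionPair[] | null {
  if (en.length !== ptBr.length) return null;
  return en.map((text, i) => ({ en: text, pt_br: ptBr[i] }));
}
```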

Writer-side validation (all 7 tools):
  - Reject INSERT if PT-BR sibling missing when EN field is set
  - Persist both languages atomically in one INSERT (no half-updates)
  - Markdown renderers write adjacent EN+PT-BR sections in case files
    (## Argument for (EN) followed by ## Argumento a favor (PT-BR), etc.)
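
The "reject INSERT if PT-BR sibling missing" rule can be expressed as a pre-write check over the row. A sketch, assuming rows are plain objects keyed by column name and that a blank string counts as missing; `missingPtBrSiblings` is a hypothetical name:

```typescript
// Hypothetical pre-INSERT check. A field counts as set only when it is a
// non-blank string (an assumption; the commit does not define "set").
function isFilled(value: unknown): boolean {
  return typeof value === "string" && value.trim().length > 0;
}

// Returns the EN fields that are set but whose <field>_pt_br sibling is not.
function missingPtBrSiblings(
  row: Record<string, unknown>,
  enFields: string[],
): string[] {
  return enFields.filter(
    (field) => isFilled(row[field]) && !isFilled(row[field + "_pt_br"]),
  );
}
```

A writer would refuse the INSERT whenever the returned list is non-empty, which keeps the two languages atomic in one statement.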

Detective parse layer (all 7 detectives):
  - Coerce both keys from JSON output
  - "incomplete_bilingual_*" skip reason when either side missing
  - Defensive: PT-BR fields trimmed + length-capped same as EN
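
The parse-layer behaviour described above (coerce both keys, trim, cap, skip when one side is missing) might look like the following. This is a sketch only: the 2000-character cap and the use of the field name as the `incomplete_bilingual_*` suffix are assumptions, and `coerceBilingual` is a hypothetical name.

```typescript
// Hypothetical parse-layer helper. The cap value and the skip-reason suffix
// are assumptions; the commit only states "incomplete_bilingual_*".
const MAX_FIELD_LEN = 2000;

type Coerced =
  | { ok: true; en: string; pt_br: string }
  | { ok: false; skip: string };

// Non-strings become "", strings are trimmed and length-capped.
function cleanField(value: unknown, maxLen: number): string {
  return typeof value === "string" ? value.trim().slice(0, maxLen) : "";
}

function coerceBilingual(
  obj: Record<string, unknown>,
  key: string,
  maxLen: number = MAX_FIELD_LEN,
): Coerced {
  const en = cleanField(obj[key], maxLen);
  const ptBr = cleanField(obj[key + "_pt_br"], maxLen);
  if (en === "" || ptBr === "") {
    return { ok: false, skip: "incomplete_bilingual_" + key };
  }
  return { ok: true, en: en, pt_br: ptBr };
}
```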

Orchestrator propagates question_pt_br + topic_pt_br through job payload
to runHolmes / runCaseWriter, mirroring the chat-tool entry point.

Web (UI):
  - /api/jobs/[id] hydrates _pt_br siblings from pg
  - job-status-poller HypothesisCard: PT-BR primary, EN in <details>
    fallback when both exist
  - ContradictionCard: PT-BR statement primary + secondary EN quote
  - WitnessCard: PT-BR verdict primary + secondary EN quote, panels in PT-BR
  - GapCard: PT-BR title/why/implies primary
  - /bureau hub: SELECTs both columns, renders PT-BR primary
  - /h/[id]: ArgumentPanel renders PT-BR primary with collapsible EN
    fallback when both exist
  - BureauSnapshot homepage: position_pt_br / topic_pt_br / verdict_pt_br
    primary
  - DocBureauPanel /d/[doc]: same primary-PT-BR pattern
  - New web/lib/i18n/pick.ts helper (not yet used by chat/agents; kept
    for future locale-driven switching once both languages are equally
    complete. The current rule is PT-BR-first because the user is
    Brazilian)
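
The PT-BR-first rule used across the cards can be captured in one small function. This is not the contents of web/lib/i18n/pick.ts, which the message does not show; `pickPtBrFirst` is a hypothetical illustration of the rule as stated.

```typescript
// Hypothetical illustration of the PT-BR-first rule; not the real pick.ts.
// Prefers the PT-BR text and falls back to EN when PT-BR is absent or blank.
function pickPtBrFirst(
  ptBr: string | null | undefined,
  en: string | null | undefined,
): string {
  const primary = (ptBr ?? "").trim();
  return primary !== "" ? primary : (en ?? "").trim();
}
```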

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-24 12:02:59 -03:00

Luiz Gustavo

dd75a67964

W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer

CI / Web — typecheck + lint + build (push) Failing after 45s

CI / Scripts — Python smoke (push) Failing after 5s

CI / Web — npm audit (push) Failing after 40s

CI / Retrieval — golden set (Recall@5 + MRR) (push) Failing after 3s

Brings the bureau from 4 → 8 detectives. All eight run as Bun + claude-CLI
subprocesses against the same Supabase + investigation_jobs LISTEN/NOTIFY
queue, sharing search.ts hybridSearch and writer-side validators that
gate writes against schema + FK.

New detectives:

  Poirot (witness_analysis)
    - prompts/poirot.md — credibility / access / bias / corroboration /
      verdict; uses entity_mentions JOIN chunks to pull 12 chunks per
      person; resolves corroboration_refs chunk_ids defensively (accepts
      bare cNNNN even when the model emits pNNN/cNNNN).
    - INSERT into public.witnesses with W-NNNN naming.
    - Tone: purple (#9b5de5).
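
    The defensive chunk_id resolution for corroboration_refs could be a single normalising step. A sketch, assuming chunk ids are `c` plus four digits (from "cNNNN") and that a page prefix is joined with `/` or `#`; `normalizeChunkRef` is a hypothetical name:

```typescript
// Hypothetical normaliser: reduces "p012/c0042" or "c0042" to the bare "c0042".
// Assumes 4-digit chunk ids; returns null for anything else.
function normalizeChunkRef(ref: string): string | null {
  const m = /(?:^|[\/#])(c\d{4})$/.exec(ref.trim());
  return m !== null ? m[1] : null;
}
```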

  Taleb (outlier_scan)
    - prompts/taleb.md — "surprise is relative to a model"; at most 3
      outliers; each requires explicit dominant_model + why_surprising +
      what_it_implies; fan-out into public.gaps with scope.kind="outlier".
    - Same unscoped-fallback as Dupin (Pass 1 with doc_id, Pass 2 widens
      to corpus if hits < 3).
    - Tone: yellow (#ffd23f).
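
    The two-pass fallback shared with Dupin can be sketched as a wrapper around any search function. The `SearchFn` signature is an assumption and is kept synchronous for brevity; it is not the signature of hybridSearch. Only the "widen if hits < 3" rule comes from the message.

```typescript
// Hypothetical wrapper. SearchFn is an assumed, simplified signature.
interface Hit {
  chunkId: string;
}

type SearchFn = (query: string, docId?: string) => Hit[];

// Pass 1 searches within docId; Pass 2 widens to the whole corpus
// when Pass 1 returns fewer than minHits results.
function searchWithFallback(
  search: SearchFn,
  query: string,
  docId?: string,
  minHits: number = 3,
): { hits: Hit[]; widened: boolean } {
  if (docId !== undefined) {
    const scoped = search(query, docId);
    if (scoped.length >= minHits) return { hits: scoped, widened: false };
  }
  return { hits: search(query), widened: docId !== undefined };
}
```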

  Tetlock (calibrate_hypothesis)
    - prompts/tetlock.md — honest Bayesian update; emits new_posterior +
      Δ + recommended_action ∈ {keep, downgrade, upgrade, supersede}.
    - write_calibration UPDATEs public.hypotheses + APPENDS a
      "## Calibration history" section to the H-NNNN.md case file
      (calibration is append-only — each datapoint matters). Posterior
      band auto-corrected to match Tetlock thresholds.
    - NO_NEW_EVIDENCE sentinel handled; pure 'keep' with |Δ|<0.005 only
      touches updated_at + reviewed_by.
    - Tone: teal (#26d4cc).
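
    The no-op rule for a pure 'keep' reduces to a comparison on the posterior delta. A sketch with hypothetical names (`planCalibration`, `touchOnly`); the band thresholds are not given in the message and are deliberately left out.

```typescript
// Hypothetical planner for a calibration write. Only the four actions and
// the 0.005 threshold come from the commit message.
type CalibrationAction = "keep" | "downgrade" | "upgrade" | "supersede";

const NOOP_DELTA = 0.005;

interface CalibrationPlan {
  delta: number;
  // true: only updated_at + reviewed_by are touched, no history entry.
  touchOnly: boolean;
}

function planCalibration(
  prior: number,
  newPosterior: number,
  action: CalibrationAction,
): CalibrationPlan {
  const delta = newPosterior - prior;
  const touchOnly = action === "keep" && Math.abs(delta) < NOOP_DELTA;
  return { delta: delta, touchOnly: touchOnly };
}
```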

  Case-Writer (case_report)
    - prompts/case-writer.md — Dr. Watson assembles all artefacts
      (E-NNNN, H-NNNN, R-NNNN, W-NNNN, G-NNNN) into a five-act narrative.
      ILIKE filter on topic; doc_id optional scope.
    - Larger budget cap (≥ $0.50) + longer timeout for prose generation.
    - Writes case/reports/<slug>.md with frontmatter (topic + counts);
      no DB table for v0.
    - New page /c/[slug] renders the report via MarkdownBody + stat chips.
    - Tone: gold (#e0c080).

Hardening across the bureau:
  - Sentinel parsing now accepts backticked AND prose-trailing forms
    (Holmes NO_HYPOTHESES, Dupin NO_CONTRADICTIONS, Schneier
    INSUFFICIENT_HYPOTHESIS, Poirot INSUFFICIENT_TESTIMONY, Taleb
    NO_OUTLIERS, Tetlock NO_NEW_EVIDENCE, Case-Writer
    INSUFFICIENT_ARTEFACTS). Avoids the failure mode where the model
    refuses honestly but the runtime treats the refusal as a parse error
    (observed live with Poirot+Hoover, where the model identified the
    DIRECTOR false-positive disambiguation issue in entity_mentions).
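
A tolerant sentinel check of the kind described above might look like this. It is a sketch, assuming the sentinel always opens the first line of the model output; `detectSentinel` is a hypothetical name and the real parser may differ.

```typescript
// Hypothetical detector. The sentinel names are from the commit message;
// the "sentinel opens the first line" rule is an assumption.
const SENTINELS = [
  "NO_HYPOTHESES",
  "NO_CONTRADICTIONS",
  "INSUFFICIENT_HYPOTHESIS",
  "INSUFFICIENT_TESTIMONY",
  "NO_OUTLIERS",
  "NO_NEW_EVIDENCE",
  "INSUFFICIENT_ARTEFACTS",
] as const;

type Sentinel = (typeof SENTINELS)[number];

// Accepts the bare form, the backticked form, and a sentinel followed by prose.
function detectSentinel(output: string): Sentinel | null {
  const firstLine = (output.trim().split("\n")[0] || "")
    .replace(/`/g, "")
    .trim();
  for (const s of SENTINELS) {
    // The lookahead rejects longer identifiers such as NO_OUTLIERS_FOUND.
    if (new RegExp("^" + s + "(?![A-Z_])").test(firstLine)) return s;
  }
  return null;
}
```

Returning `null` for ordinary output keeps the normal JSON parse path untouched; only a recognised refusal short-circuits into a skip state.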

Chat tool extensions (web/lib/chat/tools.ts):
  - request_investigation now accepts 7 kinds. Each routes to its
    detective with appropriate validation (hypothesis_id regex,
    person_id kebab-case, topic non-empty, doc_id for evidence_chain).
  - ETA per kind: Holmes/Dupin 60s, Poirot 45s, Schneier/Tetlock 30s,
    Taleb 50s, Case-Writer 180s (longer prose), Locard 30×n_chunks.
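
The per-kind ETA table and the id checks can be written down directly. A sketch: the durations are from the message, while keying by detective name, the 4-digit `H-NNNN` pattern and the kebab-case regex are assumptions.

```typescript
// Hypothetical lookup keyed by detective name. Durations are from the
// commit message; everything else is illustrative.
type Detective =
  | "holmes" | "dupin" | "poirot" | "schneier"
  | "tetlock" | "taleb" | "case-writer" | "locard";

function etaSeconds(detective: Detective, nChunks: number = 1): number {
  switch (detective) {
    case "holmes":
    case "dupin":
      return 60;
    case "poirot":
      return 45;
    case "schneier":
    case "tetlock":
      return 30;
    case "taleb":
      return 50;
    case "case-writer":
      return 180; // longer prose
    case "locard":
      return 30 * nChunks; // scales with the evidence chain length
  }
}

// Assumed patterns for the validations named in the message.
const HYPOTHESIS_ID_RE = /^H-\d{4}$/;
const KEBAB_CASE_RE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function isValidHypothesisId(id: string): boolean {
  return HYPOTHESIS_ID_RE.test(id);
}

function isValidPersonId(id: string): boolean {
  return KEBAB_CASE_RE.test(id);
}
```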

UI integration:
  - chat-bubble inline card paints each detective in its tone color.
  - /jobs/[id] page header swaps name/subtitle/tone per detective;
    question label adapts ("Topic" / "Hypothesis under attack" /
    "Witness under analysis" / "Topic to outlier-scan" / "Hypothesis
    under recalibration" / "Case to assemble").
  - job-status-poller renders: case-report link card (gold), outlier
    cards (yellow), witness cards (purple) — alongside existing
    hypothesis, evidence, contradiction cards.
  - /api/jobs/[id] hydrates witnesses (JOIN entities for canonical_name)
    + gaps (with scope JSONB).
  - /c/[slug] page reads /data/ufo/case/reports/<slug>.md and renders
    with MarkdownBody, frontmatter parsed for stat chips.
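
Reading the frontmatter for the stat chips needs only a small parser if the block is flat `key: value` lines. A sketch under that assumption; the field names in the usage are invented for illustration, and `parseFrontmatter` is a hypothetical name rather than the page's actual code.

```typescript
// Hypothetical parser for a flat "key: value" frontmatter block delimited
// by --- lines. Nested YAML is out of scope for this sketch.
interface ParsedReport {
  meta: Record<string, string>;
  body: string;
}

function parseFrontmatter(markdown: string): ParsedReport {
  const m = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(markdown);
  if (m === null) return { meta: {}, body: markdown };
  const meta: Record<string, string> = {};
  for (const line of m[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx > 0) {
      meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
    }
  }
  return { meta: meta, body: markdown.slice(m[0].length) };
}
```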

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-23 22:11:39 -03:00

3 commits