disclosure-bureau

discadmin/disclosure-bureau

Commit graph

Author	SHA1	Message	Date

Luiz Gustavo

4d4c02a8e1

W3.5: Holmes hypothesis tournament detective

CI / Web — typecheck + lint + build (push) Failing after 34s

CI / Scripts — Python smoke (push) Failing after 3s

CI / Web — npm audit (push) Failing after 29s

CI / Retrieval — golden set (Recall@5 + MRR) (push) Failing after 3s

Adds the second AI detective in the Investigation Bureau runtime: Sherlock
Holmes, who builds 2-3 rival hypotheses with calibrated priors + posteriors
against a corpus shortlist.

Pipeline:
  1. hybridSearch() grounds Holmes with 8-15 chunks via the same
     hybrid_search_chunks RPC the web uses (BM25 + dense + RRF). Default
     max_dense_dist=0.55 (runtime favors recall over precision; web's
     /api/search/hybrid stays at 0.40 for chat).
  2. claude-sonnet-4-6 emits a strict JSON array with position +
     argument_for + argument_against + prior + posterior + confidence_band
     + evidence_refs. Citations use [[doc-id/pNNN#cNNNN]] wiki-links.
  3. writeHypothesis() validates posterior ∈ [0,1], auto-corrects the
     Tetlock band from the posterior (high ≥0.90, medium 0.60-0.89,
     low 0.30-0.59, speculation <0.30), checks evidence_refs FK against
     public.evidence, INSERTs into public.hypotheses + writes
     case/hypotheses/H-NNNN.md.
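The band auto-correction in step 3 could look roughly like the sketch below. It is not the actual writeHypothesis() code: the `Band` and `HypothesisDraft` types and the helper names `bandFromPosterior` and `normalizeBand` are hypothetical, and the listed ranges are treated as half-open intervals so that a value such as 0.895 still lands in a band (an assumption, since the message only lists two-decimal ranges).

```typescript
// Hypothetical types: field names follow the JSON keys listed in step 2.
type Band = "high" | "medium" | "low" | "speculation";

interface HypothesisDraft {
  position: string;
  prior: number;
  posterior: number;
  confidence_band: Band;
  evidence_refs: string[];
}

// Map a posterior to its Tetlock band using the thresholds from step 3.
// Assumption: ranges are half-open ([0.90, 1], [0.60, 0.90), [0.30, 0.60), [0, 0.30)).
function bandFromPosterior(posterior: number): Band {
  if (!Number.isFinite(posterior) || posterior < 0 || posterior > 1) {
    throw new RangeError(`posterior must be in [0, 1], got ${posterior}`);
  }
  if (posterior >= 0.9) return "high";
  if (posterior >= 0.6) return "medium";
  if (posterior >= 0.3) return "low";
  return "speculation";
}

// Overwrite whatever band the model emitted with the one implied by the posterior.
function normalizeBand(draft: HypothesisDraft): HypothesisDraft {
  return { ...draft, confidence_band: bandFromPosterior(draft.posterior) };
}
```

Deriving the band from the posterior, rather than trusting the model's own label, keeps the two fields from disagreeing in the stored row.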

Discipline guarantees (prompts/holmes.md):
  - posteriors across rivals sum to ≈1.0
  - no claim without chunk citation
  - prefer lower band when ambiguous (anti-inflation)
  - declarative one-sentence position, no hedging
  - emit `NO_HYPOTHESES` when corpus is silent (refuses to fabricate)
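Several of these guarantees can be checked mechanically after the model responds. The sketch below is illustrative only: `extractCitations`, `posteriorsSumToOne`, and `parseHolmesOutput` are hypothetical helpers, the 0.05 tolerance is an assumed reading of "≈1.0", and the regex assumes the `NNN` placeholders in `[[doc-id/pNNN#cNNNN]]` stand for digits.

```typescript
// Assumed shape of the citation wiki-link: [[doc-id/p<digits>#c<digits>]].
const CITATION_RE = /\[\[[^\[\]\/]+\/p\d+#c\d+\]\]/g;

// Return every wiki-link citation found in a piece of model output.
function extractCitations(text: string): string[] {
  return text.match(CITATION_RE) ?? [];
}

// Check that rival posteriors sum to roughly 1.0.
// The tolerance is an assumption; the prompt only says "≈1.0".
function posteriorsSumToOne(posteriors: number[], tolerance = 0.05): boolean {
  const total = posteriors.reduce((sum, p) => sum + p, 0);
  return Math.abs(total - 1) <= tolerance;
}

// Treat the NO_HYPOTHESES sentinel as "corpus is silent" and return null;
// otherwise expect a JSON array of hypothesis objects.
function parseHolmesOutput(raw: string): unknown[] | null {
  const trimmed = raw.trim();
  if (trimmed === "NO_HYPOTHESES") return null;
  const parsed: unknown = JSON.parse(trimmed);
  if (!Array.isArray(parsed)) throw new TypeError("expected a JSON array");
  return parsed;
}
```

A caller could reject any hypothesis whose arguments yield an empty `extractCitations` result, which enforces the "no claim without chunk citation" rule outside the prompt as well.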

Smoke test (Sandia green fireballs 1948-49):
  - H-0001 prior 0.5 → posterior 0.2 (speculation): natural meteoric
  - H-0002 prior 0.3 → posterior 0.4 (low): classified weapons / tests
  - H-0003 prior 0.2 → posterior 0.4 (low): genuinely unidentified
  Bayesian update visible: "natural meteoric" fell 60% relative to its
  prior (0.5 → 0.2); both rivals climbed. 4 unique chunk citations across
  the 3 hypotheses.
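The smoke-test figures can be re-derived with a few lines of arithmetic. This is a sketch for checking the numbers quoted above, not part of the runtime; `relativeChange` and the `smoke` array are hypothetical names, and the values are copied from the three hypotheses listed.

```typescript
// Prior and posterior values copied from the smoke test above.
const smoke = [
  { id: "H-0001", prior: 0.5, posterior: 0.2 },
  { id: "H-0002", prior: 0.3, posterior: 0.4 },
  { id: "H-0003", prior: 0.2, posterior: 0.4 },
];

// Relative change from prior to posterior, e.g. -0.6 means a 60% drop.
function relativeChange(prior: number, posterior: number): number {
  return (posterior - prior) / prior;
}

// Totals that should both be close to 1.0 for a well-formed tournament.
const priorTotal = smoke.reduce((sum, h) => sum + h.prior, 0);
const posteriorTotal = smoke.reduce((sum, h) => sum + h.posterior, 0);

for (const h of smoke) {
  const pct = Math.round(relativeChange(h.prior, h.posterior) * 100);
  console.log(`${h.id}: ${h.prior} -> ${h.posterior} (${pct}% relative change)`);
}
```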

The orchestrator dispatches the `hypothesis_tournament` job kind via
runHolmes; the job is marked `failed` if all rivals error, `complete`
otherwise.
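The status rule can be expressed as a small pure function. This is a minimal sketch, not the orchestrator's code: `RivalOutcome` and `jobStatus` are hypothetical names, and the handling of an empty result list (for instance after `NO_HYPOTHESES`) is an assumption, since the message does not say what happens when there are no rivals at all.

```typescript
// Hypothetical per-rival result: either the hypothesis was written, or it errored.
type RivalOutcome =
  | { ok: true; hypothesisId: string }
  | { ok: false; error: string };

type JobStatus = "complete" | "failed";

// "failed" only when every rival errored; any success yields "complete".
// Assumption: zero rivals counts as "complete", so the empty case is guarded
// explicitly (Array.prototype.every returns true on an empty array).
function jobStatus(outcomes: RivalOutcome[]): JobStatus {
  const allErrored = outcomes.length > 0 && outcomes.every((o) => !o.ok);
  return allErrored ? "failed" : "complete";
}
```

Under this rule a partial success still completes the job, so a caller that cares about individual rivals would need to inspect the per-rival outcomes rather than the job status alone.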

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-23 21:19:43 -03:00

1 commit