Luiz Gustavo eaf282c535
W2: rerank opt-in, analyze_image_region tool, RAG eval, graph cleanup, ADRs
- TD#8 hybrid.ts: rerank_strategy {always|when_top_k_gt|never} + threshold
  (default skips rerank for top_k ≤ 15; chat tool uses threshold 10)
- O11 vision.ts + tools.ts: analyze_image_region tool — sharp-crops the
  bbox, claude CLI reads the temp PNG via Read tool, Sonnet vision answers
- TD#12 /graph: SigmaGraph replaces ForceGraphCanvas; react-force-graph-2d
  uninstalled (-37 transitive deps); force-graph-canvas.tsx deleted
- TD#27 messages/route.ts gatherContext slice sizes via CTX_* env vars
- TD#22 tests/rag/: golden.yaml (15 queries) + run.py (Recall@k + MRR +
  negative-pass rate) + baseline.json + CI job in .forgejo/workflows/ci.yml
- docs/adrs/: ADR-001..005 published from systems-atelier deliverables

Verified live on disclosure.top: top_k=5 path skips rerank (6.7s embed-only,
was 12-15s with rerank); rerank=always still available on demand.
First RAG baseline: Recall@5 = 0.2083, MRR = 0.25, Negative pass = 1.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:20:09 -03:00


adr: ADR-002
title: Materialize the Investigation Bureau (background agentic runtime, 8 detectives as roles)
status: accepted
date: 2026-05-23
deciders: sa-principal, sa-architecture-lead, sa-security-engineer (veto power)
project: disclosure-bureau

Context

The "The Disclosure Bureau" branding promises "8 investigative detectives" (Holmes/Poirot/Dupin/Locard/Schneier/Tetlock/Taleb plus the collective Investigation Bureau) with chain of custody, a hypothesis tournament, and residual-uncertainty calculation. Today, the codebase has:

  • A case/ filesystem with 6 folders: 5 empty, 1 holding 2 gap files.
  • A chat with 12 read-only tools and a grandiose system prompt.
  • AG-UI artifact types evidence_card, hypothesis_card, and case_card that are defined but never emitted.
  • Zero detectives implemented as distinct operational entities.

The brief asks for a "REAL AI detective bureau, not a decorative one". That requires specialized agents to produce new data (case/evidence/*.md, case/hypotheses/*.md, public.{hypotheses,evidence,contradictions,...}) as structured, auditable outputs.

Boundary decision: does the agentic layer run alongside the synchronous chat, or is it part of it?

Options considered

  1. Part of the synchronous chat. Extend the system prompt and add write tools. The user waits 30 s to 5 min synchronously.
  2. Background worker. The chat dispatches a job, the user polls, and an asynchronous worker produces the outputs.
  3. No agentic layer: keep only the read-only chat and rebrand to reflect reality ("AI-assisted wiki").
  4. Batch CronJob only. No user trigger; investigations run daily in the background.

Decision

Option 2: a background worker, separate from the synchronous chat.

Specifically:

  1. A new investigator-runtime container (Bun + TS) in docker-compose, isolated from Next.js.
  2. 8 detectives plus a chief-detective as distinct roles. Each is a claude -p subprocess with its own prompts/<detective>.md and a distinct toolset (a subset of the common tools plus 1-2 role-specific writers).
  3. Postgres LISTEN/NOTIFY as the queue (public.investigation_jobs plus a NOTIFY trigger).
  4. Job triggers (section 6 of agentic-layer-spec): daily cron, ingest event, user via chat (request_investigation tool), and manual admin.
  5. Gated write tools (the 8 sa-security-engineer gates; see security-audit-report.md section 5).
  6. Per-job budget cap: $1.00 hard ceiling (Sonnet via OAuth on the Claude Max 20x plan preferred; the paid Anthropic API as fallback).
  7. Outputs validated before commit: schema check plus lint (04-lint.py --dry-run) over the generated markdown.
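Items 4 and 6 can be made concrete with a small TypeScript sketch of a per-job budget guard. It assumes the runtime learns the dollar cost of each claude -p call after the call returns. The names JobTrigger, InvestigationJob, JobBudget and BudgetExceededError are hypothetical and do not come from agentic-layer-spec.md. Only the four trigger kinds and the $1.00 ceiling are taken from this ADR.

```typescript
// Hypothetical sketch of the per-job budget cap (ADR-002, item 6).
// Names are illustrative; only the trigger kinds and the $1.00 ceiling
// come from the ADR text.

// The four job triggers listed in item 4.
type JobTrigger = "cron" | "ingest" | "user" | "admin";

interface InvestigationJob {
  id: string;
  trigger: JobTrigger;
  detective: string; // hypothetical role id, e.g. "holmes" or "chief-detective"
}

const JOB_BUDGET_USD = 1.0; // hard ceiling per job

class BudgetExceededError extends Error {
  readonly jobId: string;
  readonly spentUsd: number;

  constructor(jobId: string, spentUsd: number, capUsd: number) {
    super(`job ${jobId} spent $${spentUsd.toFixed(4)}, cap is $${capUsd.toFixed(2)}`);
    this.name = "BudgetExceededError";
    this.jobId = jobId;
    this.spentUsd = spentUsd;
  }
}

class JobBudget {
  private spentUsd = 0;
  private readonly job: InvestigationJob;
  private readonly capUsd: number;

  constructor(job: InvestigationJob, capUsd: number = JOB_BUDGET_USD) {
    this.job = job;
    this.capUsd = capUsd;
  }

  // Record the reported cost of one model call. The money is already spent,
  // so the charge is always recorded; crossing the cap throws so the caller
  // aborts the job instead of starting another detective step.
  charge(costUsd: number): void {
    if (!Number.isFinite(costUsd) || costUsd < 0) {
      throw new RangeError(`invalid cost: ${costUsd}`);
    }
    this.spentUsd += costUsd;
    if (this.spentUsd > this.capUsd) {
      throw new BudgetExceededError(this.job.id, this.spentUsd, this.capUsd);
    }
  }

  get remainingUsd(): number {
    return this.capUsd - this.spentUsd;
  }

  get exceeded(): boolean {
    return this.spentUsd > this.capUsd;
  }
}
```

The check runs only once a cost is reported, so a single expensive call can still overshoot the cap. A stricter variant would also refuse to start a call whose estimated cost exceeds remainingUsd.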

We did not adopt:

  • Option 1 (extend the synchronous chat): a user cannot wait 5 min in a chat. It breaks the mental model.
  • Option 3 (no agentic layer): departs from the explicit brief. Branding without an engine behind it is dishonest.
  • Option 4 (cron only): without a user trigger the UX is poor. Cron stays as a complement, not the only trigger.

Consequences

Positive:

  • The "8 detectives" branding gets a real engine behind it.
  • The synchronous chat stays fast (read-only LLM plus 12 tools).
  • Deep investigations produce new, persistent, auditable data, making the Investigation Bureau real.
  • Cold-case revival, contradiction detection, and residual uncertainty become available: features with viral potential.

Negative:

  • A new container means a new operational surface (~150 MB extra RAM; orchestrator plus state).
  • Heavier use of the Claude Max 20x quota (already monitored via /api/admin/batch).
  • The schema grows by 7 new tables (hypotheses, evidence, contradictions, witnesses, gaps, residual_uncertainties, investigation_jobs).
  • Risk of hallucination in writers, mitigated by the sa-security gates (schema and reference validation).
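The schema-and-reference validation that gates writers can be sketched as a pure function. A writer's output must pass it before anything is committed. This is a minimal TypeScript sketch under stated assumptions. The HypothesisDraft shape, its field names, ValidationResult, and validateHypothesis are all hypothetical. The knownEvidenceIds set stands in for a lookup against public.evidence. The sketch covers only the structural check; the lint pass (04-lint.py --dry-run) over generated markdown is a separate step.

```typescript
// Hypothetical writer output for public.hypotheses; field names are illustrative.
interface HypothesisDraft {
  id: string;
  title: string;
  confidence: number;     // assumed to be a probability in [0, 1]
  evidenceRefs: string[]; // ids that must already exist in public.evidence
}

type ValidationResult =
  | { ok: true; draft: HypothesisDraft }
  | { ok: false; errors: string[] };

function validateHypothesis(raw: unknown, knownEvidenceIds: ReadonlySet<string>): ValidationResult {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, errors: ["draft is not an object"] };
  }
  const { id, title, confidence, evidenceRefs } = raw as Record<string, unknown>;
  const errors: string[] = [];

  // Schema check: required fields with the expected types and ranges.
  if (typeof id !== "string" || id.length === 0) {
    errors.push("id: missing or empty");
  }
  if (typeof title !== "string" || title.trim().length === 0) {
    errors.push("title: missing or empty");
  }
  if (typeof confidence !== "number" || !(confidence >= 0 && confidence <= 1)) {
    errors.push("confidence: must be a number in [0, 1]");
  }
  if (!Array.isArray(evidenceRefs) || !evidenceRefs.every((r) => typeof r === "string")) {
    errors.push("evidenceRefs: must be an array of strings");
    return { ok: false, errors };
  }

  // Reference check: every cited id must already exist, so a writer
  // cannot cite evidence it hallucinated.
  if (evidenceRefs.length === 0) {
    errors.push("evidenceRefs: at least one reference is required");
  }
  for (const ref of evidenceRefs as string[]) {
    if (!knownEvidenceIds.has(ref)) {
      errors.push(`evidenceRefs: unknown evidence id "${ref}"`);
    }
  }

  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, draft: raw as HypothesisDraft };
}
```

Returning every error at once, rather than failing on the first, gives the writer agent (or a human reviewer) a complete list to fix in one retry.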

Verification

  • Full spec in agentic-layer-spec.md.
  • Incremental bring-up plan in 10 sub-steps, W3.1-W3.10.
  • 8 gates documented for the sa-security veto.
  • Expected costs of $30-110/month (table in section 11 of the spec).
  • A golden hypothesis set as the quality bar (W3.10).

References

  • agentic-layer-spec.md
  • ai-opportunity-map.md O1-O5
  • security-audit-report.md section 5
  • Anthropic Claude Code OAuth pattern (project memory)