disclosure-bureau/docs/adrs/ADR-002-investigation-bureau-runtime.md
Luiz Gustavo eaf282c535
W2: rerank opt-in, analyze_image_region tool, RAG eval, graph cleanup, ADRs
- TD#8 hybrid.ts: rerank_strategy {always|when_top_k_gt|never} + threshold
  (default skips rerank for top_k ≤ 15; chat tool uses threshold 10)
- O11 vision.ts + tools.ts: analyze_image_region tool — sharp-crops the
  bbox, claude CLI reads the temp PNG via Read tool, Sonnet vision answers
- TD#12 /graph: SigmaGraph replaces ForceGraphCanvas; react-force-graph-2d
  uninstalled (-37 transitive deps); force-graph-canvas.tsx deleted
- TD#27 messages/route.ts gatherContext slice sizes via CTX_* env vars
- TD#22 tests/rag/: golden.yaml (15 queries) + run.py (Recall@k + MRR +
  negative-pass rate) + baseline.json + CI job in .forgejo/workflows/ci.yml
- docs/adrs/: ADR-001..005 published from systems-atelier deliverables

Verified live on disclosure.top: top_k=5 path skips rerank (6.7s embed-only,
was 12-15s with rerank); rerank=always still available on demand.
First RAG baseline: Recall@5 = 0.2083, MRR = 0.25, Negative pass = 1.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:20:09 -03:00


---
adr: ADR-002
title: Materialize Investigation Bureau as a background agentic runtime, with 8 detectives as roles
status: accepted
date: 2026-05-23
deciders: sa-principal, sa-architecture-lead, sa-security-engineer (veto power)
project: disclosure-bureau
---
## Context
The "The Disclosure Bureau" branding promises "8 investigative detectives" (Holmes/Poirot/Dupin/Locard/Schneier/Tetlock/Taleb + the collective Investigation Bureau) with chain of custody, a hypothesis tournament, and residual uncertainty calculation. Today, the codebase has:
- A `case/` filesystem with 6 folders: 5 empty, 1 with 2 gap files.
- A chat with 12 read-only tools and a grandiose system prompt.
- AG-UI artifact types `evidence_card`, `hypothesis_card`, `case_card` defined but **never emitted**.
- Zero detectives implemented as distinct operational entities.
The brief asks for a "REAL AI detective bureau, not a decorative one". That requires **producing** new data (`case/evidence/*.md`, `case/hypotheses/*.md`, `public.{hypotheses,evidence,contradictions,...}`) through **specialized agents** with **structured, auditable outputs**.
Boundary decision: does the agentic layer live **in parallel** to the synchronous chat, or is it **part of it**?
## Options considered
1. **Part of the synchronous chat.** Extend the system prompt + add write tools. The user waits 30s-5min synchronously.
2. **Background worker.** The chat dispatches a job; the user polls; an asynchronous worker produces the outputs.
3. **No agentic layer.** Keep only the read-only chat. Rework the branding to reflect reality ("AI-assisted wiki").
4. **CronJob batch only.** No user trigger. Investigations run in a daily background batch.
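
The dispatch/poll/complete flow of option 2 can be sketched with an in-memory stand-in for the queue. This is only an illustration of the lifecycle: the class, field, and status names below are hypothetical and do not come from the codebase.

```typescript
// Hypothetical job lifecycle for the "background worker" option.
// An array stands in for the real queue; all names are illustrative.
type JobStatusSketch = "queued" | "running" | "done";

interface InvestigationJobSketch {
  id: number;
  question: string;
  status: JobStatusSketch;
  outputs: string[]; // e.g. paths of generated markdown files
}

class InMemoryJobQueue {
  private jobs: InvestigationJobSketch[] = [];
  private nextId = 1;

  // Chat side: enqueue and return immediately with a job id.
  enqueue(question: string): number {
    const id = this.nextId++;
    this.jobs.push({ id: id, question: question, status: "queued", outputs: [] });
    return id;
  }

  // Worker side: claim the oldest queued job, if any.
  claimNext(): InvestigationJobSketch | undefined {
    for (let i = 0; i < this.jobs.length; i++) {
      if (this.jobs[i].status === "queued") {
        this.jobs[i].status = "running";
        return this.jobs[i];
      }
    }
    return undefined;
  }

  // Worker side: mark a running job as done and attach its outputs.
  complete(id: number, outputs: string[]): void {
    const job = this.findJob(id);
    if (job === undefined || job.status !== "running") {
      throw new Error("job " + id + " is not running");
    }
    job.status = "done";
    job.outputs = outputs;
  }

  // Chat side: what the user sees when polling.
  poll(id: number): JobStatusSketch | undefined {
    const job = this.findJob(id);
    return job === undefined ? undefined : job.status;
  }

  private findJob(id: number): InvestigationJobSketch | undefined {
    for (let i = 0; i < this.jobs.length; i++) {
      if (this.jobs[i].id === id) return this.jobs[i];
    }
    return undefined;
  }
}
```

The point of the sketch is that `enqueue` returns at once, so the chat turn never blocks on the 30s-5min investigation; only the worker touches `claimNext` and `complete`.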
## Decision
**Option 2: background worker, separate from the synchronous chat.**
Specifically:
1. **New `investigator-runtime` container** (Bun + TS) in docker-compose, isolated from Next.js.
2. **8 detectives + chief-detective as distinct roles**: each one is a `claude -p` subprocess with its own `prompts/<detective>.md` and a distinct toolset (a subset of the common tools + 1-2 role-specific writers).
3. **Postgres LISTEN/NOTIFY** as the queue (`public.investigation_jobs` + NOTIFY trigger).
4. **Job triggers** (sec. 6 of agentic-layer-spec): daily cron, ingest event, user via chat (`request_investigation` tool), manual admin.
5. **Gated write tools** (8 gates from sa-security-engineer; see `security-audit-report.md` section 5).
6. **Per-job budget cap:** $1.00 hard ceiling (Sonnet via OAuth Max 20x preferred; paid Anthropic API as fallback).
7. **Outputs validated before commit:** schema check + lint (`04-lint.py --dry-run`) on the generated markdown.
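
As a rough illustration of item 2, a role can be modeled as a prompt path plus a toolset composed of a subset of the common tools and one or two writers. The tool names and the `defineRole`/`toolsetFor` helpers below are hypothetical; the actual toolsets are defined in the spec, and spawning the `claude -p` subprocess is left out of the sketch.

```typescript
// Hypothetical common tool names, for illustration only.
const COMMON_TOOLS_SKETCH: readonly string[] = [
  "search_documents",
  "read_page",
  "list_entities",
];

interface DetectiveRoleSketch {
  name: string;
  promptPath: string;    // prompts/<detective>.md
  commonTools: string[]; // must be a subset of COMMON_TOOLS_SKETCH
  writerTools: string[]; // 1-2 role-specific writers
}

// Build a role, enforcing the two constraints stated in the decision:
// common tools are a subset of the shared set, and there are 1-2 writers.
function defineRole(
  name: string,
  commonTools: string[],
  writerTools: string[],
): DetectiveRoleSketch {
  for (const tool of commonTools) {
    if (COMMON_TOOLS_SKETCH.indexOf(tool) === -1) {
      throw new Error("not a common tool: " + tool);
    }
  }
  if (writerTools.length < 1 || writerTools.length > 2) {
    throw new Error("a role needs 1-2 writer tools");
  }
  return {
    name: name,
    promptPath: "prompts/" + name + ".md",
    commonTools: commonTools.slice(),
    writerTools: writerTools.slice(),
  };
}

// Full toolset handed to the role's subprocess.
function toolsetFor(role: DetectiveRoleSketch): string[] {
  return role.commonTools.concat(role.writerTools);
}
```

Keeping the writers in a separate field makes it easy to apply the write gates only to the tools that can actually mutate state.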
**Not adopted:**
- Option 1 (extend the synchronous chat): a user cannot wait 5 minutes in a chat. It breaks the mental model.
- Option 3 (no agentic layer): departs from the explicit brief. Branding without an engine is dishonest.
- Option 4 (cron only): having no user trigger is poor UX. Keep cron as a complement, not the sole trigger.
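
The $1.00 per-job hard ceiling from the decision could be enforced with a small guard that the worker consults before each model call. The sketch below is one possible shape, not the project's implementation: the class name is hypothetical, and it tracks integer cents (an assumption made here to avoid floating-point drift).

```typescript
// Hypothetical per-job budget guard. Amounts are integer cents.
class JobBudgetSketch {
  private capCents: number;
  private spentCents = 0;

  constructor(capCents: number) {
    this.capCents = capCents;
  }

  // Try to record a cost. Returns false (and records nothing) when the
  // charge would push the job past its hard ceiling; the caller is then
  // expected to abort the job.
  tryCharge(costCents: number): boolean {
    if (costCents < 0 || Math.floor(costCents) !== costCents) {
      throw new Error("cost must be a non-negative integer number of cents");
    }
    if (this.spentCents + costCents > this.capCents) {
      return false;
    }
    this.spentCents += costCents;
    return true;
  }

  remainingCents(): number {
    return this.capCents - this.spentCents;
  }
}
```

With `new JobBudgetSketch(100)`, two charges of 40 cents succeed, a third charge of 30 cents is refused, and 20 cents remain available.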
## Consequences
**Positive:**
- The "8 detectives" branding gains a real engine.
- The synchronous chat stays fast (read-only LLM + 12 tools).
- Deep investigations generate new, persistent, auditable data: a "real" Investigation Bureau.
- Cold-case revival, contradiction detection, residual uncertainty: features with viral potential.
**Negative:**
- A new container means a new operational surface (~150MB extra RAM; orchestrator + state).
- Heavier use of the Claude Max 20x quota (already monitored by `/api/admin/batch`).
- The schema grows by 7 new tables (hypotheses, evidence, contradictions, witnesses, gaps, residual_uncertainties, investigation_jobs).
- Risk of hallucination in writers, mitigated by the sa-security gates (schema + ref validation).
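
To make the "schema + ref validation" mitigation concrete, here is a minimal sketch of a pre-commit check on a writer's output. The draft shape (`statement`, `confidence`, `evidenceRefs`) and the function name are assumptions for illustration; the real schema and the 8 gates are defined in the spec and the security audit report.

```typescript
// Hypothetical validator for a hypothesis draft produced by a writer tool.
// Returns a list of problems; an empty list means the draft may be committed.
function validateHypothesisDraft(
  draft: unknown,
  knownEvidenceIds: string[],
): string[] {
  if (typeof draft !== "object" || draft === null) {
    return ["draft must be an object"];
  }
  const fields = draft as Record<string, unknown>;
  const errors: string[] = [];

  // Schema check: required fields with the expected types and ranges.
  const statement = fields["statement"];
  if (typeof statement !== "string" || statement.trim() === "") {
    errors.push("statement must be a non-empty string");
  }
  const confidence = fields["confidence"];
  if (typeof confidence !== "number" || !(confidence >= 0 && confidence <= 1)) {
    errors.push("confidence must be a number in [0, 1]");
  }

  // Ref check: every cited evidence id must already exist, so a writer
  // cannot cite evidence it made up.
  const refs = fields["evidenceRefs"];
  if (!Array.isArray(refs) || refs.length === 0) {
    errors.push("evidenceRefs must be a non-empty array");
  } else {
    for (const ref of refs) {
      if (typeof ref !== "string" || knownEvidenceIds.indexOf(ref) === -1) {
        errors.push("unknown evidence ref: " + String(ref));
      }
    }
  }
  return errors;
}
```

The ref check is what targets hallucination specifically: a well-formed draft that cites a nonexistent evidence id still fails.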
## Verification
- Full spec in `agentic-layer-spec.md`.
- Incremental bring-up plan in 10 sub-steps, W3.1-W3.10.
- 8 gates documented for sa-security veto.
- Expected costs of $30-110/month (table in section 11 of the spec).
- Golden hypothesis set as the quality bar (W3.10).
## References
- `agentic-layer-spec.md`
- `ai-opportunity-map.md` O1-O5
- `security-audit-report.md` section 5
- Anthropic Claude Code OAuth pattern (project memory)