disclosure-bureau/docs/adrs/ADR-003-llm-routing-policy.md at main

Luiz Gustavo eaf282c535

W2: rerank opt-in, analyze_image_region tool, RAG eval, graph cleanup, ADRs

- TD#8 hybrid.ts: rerank_strategy {always|when_top_k_gt|never} + threshold
  (default skips rerank for top_k ≤ 15; chat tool uses threshold 10)
- O11 vision.ts + tools.ts: analyze_image_region tool — sharp-crops the
  bbox, claude CLI reads the temp PNG via Read tool, Sonnet vision answers
- TD#12 /graph: SigmaGraph replaces ForceGraphCanvas; react-force-graph-2d
  uninstalled (-37 transitive deps); force-graph-canvas.tsx deleted
- TD#27 messages/route.ts gatherContext slice sizes via CTX_* env vars
- TD#22 tests/rag/: golden.yaml (15 queries) + run.py (Recall@k + MRR +
  negative-pass rate) + baseline.json + CI job in .forgejo/workflows/ci.yml
- docs/adrs/: ADR-001..005 published from systems-atelier deliverables

Verified live on disclosure.top: top_k=5 path skips rerank (6.7s embed-only,
was 12-15s with rerank); rerank=always still available on demand.
First RAG baseline: Recall@5 = 0.2083, MRR = 0.25, Negative pass = 1.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-23 19:20:09 -03:00


adr	title	status	date	deciders	project
ADR-003	LLM routing policy: Claude Sonnet 4.6 via OAuth for asynchronous production; OpenRouter free tier for public chat	accepted	2026-05-23	sa-principal, sa-platform-lead	disclosure-bureau

Context

The project has three LLM paths:

Vision pipeline (ingest): Sonnet 4.6 via the Anthropic SDK, with prompt caching and the pdf-2025-03-04 beta. One-time initial cost of ~$409.
Synchronous chat (user-facing): currently the OpenRouter free tier (deepseek/deepseek-v4-flash:free as primary, nvidia/nemotron-3-super-120b-a12b:free as fallback). Tool calling works.
Investigation Bureau (W3+, not yet implemented): the proposal is Sonnet 4.6 via OAuth Max 20x.

Existing constraints:

Gemini is banned by policy (project memory): it was billed at ~$200 against an expected $10.
OAuth Max 20x quota: 5-hour rolling window, 4 workers by default (memory).
Self-hosted by default: managed services are prohibited without a written exception (ADR-005).

Decision

Routing by channel and by load:

Channel	Provider	Model	Rationale
Vision pipeline (background)	Anthropic SDK, direct	Sonnet 4.6	Valid API key; cache + beta header; does not consume OAuth quota
Synchronous public chat	OpenRouter	deepseek-v4-flash:free, nemotron fallback	Free tier; tool calling; anonymous user
Synchronous authenticated chat (future premium)	OpenRouter or direct Anthropic API	configurable	Paid tier when justified
Investigation Bureau (W3+)	Claude Code OAuth (subprocess `claude -p`)	Sonnet 4.6 (model: sonnet)	Max 20x quota; budget cap of $1.00 per job; preferred over the paid API
Investigation Bureau (overflow)	Anthropic SDK, paid	Sonnet 4.6 or Haiku	When the OAuth quota is saturated AND `BUDGET_PAID_ALLOWED=true`
Internal LLM judge (calibration / contradiction detection)	Claude OAuth or OpenRouter	Haiku (cheap, fast)	Simple batch task
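
The table above can be read as a small routing function. The following is a minimal sketch, not the project's implementation: `Channel`, `Route`, `QuotaState`, and `routeFor` are hypothetical names, the provider and model strings are placeholder labels rather than real API identifiers, and the judge is pinned to the OAuth option purely for illustration (the table also allows OpenRouter).

```typescript
// Hypothetical types and names; labels are placeholders, not real model IDs.
type Channel = "vision" | "chat_public" | "investigation" | "judge";

interface Route {
  provider: string;
  model: string;
}

interface QuotaState {
  oauthSaturated: boolean;    // true when the OAuth Max 20x quota is exhausted
  budgetPaidAllowed: boolean; // mirrors BUDGET_PAID_ALLOWED=true
}

// Returns the route for a channel, or null when no route is permitted.
function routeFor(channel: Channel, quota: QuotaState): Route | null {
  if (channel === "vision") {
    return { provider: "anthropic-sdk", model: "sonnet-4.6" };
  }
  if (channel === "chat_public") {
    return { provider: "openrouter", model: "deepseek/deepseek-v4-flash:free" };
  }
  if (channel === "judge") {
    return { provider: "claude-code-oauth", model: "haiku" };
  }
  // Investigation Bureau: OAuth first, paid overflow only when explicitly allowed.
  if (!quota.oauthSaturated) {
    return { provider: "claude-code-oauth", model: "sonnet" };
  }
  if (quota.budgetPaidAllowed) {
    return { provider: "anthropic-sdk-paid", model: "sonnet-4.6" };
  }
  return null;
}
```

Returning `null` rather than silently falling through to the paid SDK keeps the "under quota" and "paid" paths separate, so a job can only spend money when the overflow condition in the table holds.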

Fallback policy:

The primary is tried first. On a 429/quota error: 1 retry with backoff.
If the retry also fails, the fallback depends on the channel:
- Synchronous chat: switch from the OpenRouter primary to the OpenRouter fallback. If both fail, return a user-facing error.
- Vision/investigator: abort the job and record investigation_jobs.status='failed'. Wait for the quota reset (5h).
/api/admin/batch already monitors 429s and the quota-reset ETA.
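
The retry-then-fallback flow can be sketched as below. This is an illustration under stated assumptions, not the code in web/lib/chat: all function names are hypothetical, a quota error is assumed to be any thrown object with `status === 429`, the fallback is called once without its own retry, and non-429 failures of the primary are also routed to the fallback. The `"ok"` job status is invented for the example; only `'failed'` comes from the policy.

```typescript
type Call<T> = () => Promise<T>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: quota errors surface as thrown objects with status === 429.
function isQuotaError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: unknown }).status === 429
  );
}

// Try once; on a 429, wait backoffMs and retry exactly once.
async function withOneRetry<T>(call: Call<T>, backoffMs: number): Promise<T> {
  try {
    return await call();
  } catch (err) {
    if (!isQuotaError(err)) throw err;
    await sleep(backoffMs);
    return call();
  }
}

// Synchronous chat: primary (with one retry), then fallback, then a UX error.
async function chatWithFallback<T>(
  primary: Call<T>,
  fallback: Call<T>,
  backoffMs = 500,
): Promise<T> {
  try {
    return await withOneRetry(primary, backoffMs);
  } catch {
    try {
      return await fallback();
    } catch {
      throw new Error("Chat is temporarily unavailable. Please try again later.");
    }
  }
}

// Vision/investigator: no model fallback; the job is aborted and marked failed.
async function runJob(job: Call<void>, backoffMs = 500): Promise<"ok" | "failed"> {
  try {
    await withOneRetry(job, backoffMs);
    return "ok";
  } catch {
    return "failed"; // caller would persist investigation_jobs.status='failed'
  }
}
```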

Exceptions:

Gemini is banned (policy). Do not re-enable it even if a new version looks attractive.
The paid Anthropic API key lives ONLY in a separate environment variable (ANTHROPIC_API_KEY_PAID) and requires an explicit --paid flag.
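
One way to enforce both exceptions at a single choke point is a pair of guards like the following. This is a sketch only: `resolvePaidKey` and `assertModelAllowed` are hypothetical helpers, and matching Gemini by a substring of the model name is an assumption about how model identifiers are written.

```typescript
// Hypothetical guard: the paid key is only returned when --paid was passed
// explicitly AND the dedicated environment variable is set.
function resolvePaidKey(
  argv: string[],
  env: Record<string, string | undefined>,
): string {
  if (!argv.includes("--paid")) {
    throw new Error("Paid Anthropic API requires the explicit --paid flag");
  }
  const key = env["ANTHROPIC_API_KEY_PAID"];
  if (!key) {
    throw new Error("ANTHROPIC_API_KEY_PAID is not set");
  }
  return key;
}

// Hypothetical guard: reject any model whose identifier mentions Gemini.
function assertModelAllowed(model: string): void {
  if (model.toLowerCase().includes("gemini")) {
    throw new Error(`Model "${model}" is banned by policy`);
  }
}
```

In a Node.js script this might be called as `resolvePaidKey(process.argv.slice(2), process.env)`, so that a job which merely has the variable set in its environment still cannot spend money without the flag.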

Consequences

Positive:

The Investigation Bureau can run 99% of the time on OAuth quota (free for the project).
Public synchronous chat stays at $0/req.
Clear separation between "under quota" and "paid", which makes spend easy to monitor.

Negative:

The OpenRouter free tier has rate limits and variable latency. Mitigated in W1 (retry + circuit breaker).
Quota saturation on Sonnet OAuth when many workers are ingesting while the investigator runs in parallel. Mitigation: a daily investigator cron at 03-05 UTC, when ingest load is low.
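
If the investigator also needs to check the low-ingest window at run time (for example, before starting an ad hoc job), a helper along these lines would do. It is a sketch with a hypothetical name, and it assumes "03-05 UTC" means the half-open interval from 03:00 up to, but not including, 05:00 UTC.

```typescript
// Hypothetical helper; assumes the window is [03:00, 05:00) UTC.
function inInvestigatorWindow(now: Date): boolean {
  const hour = now.getUTCHours();
  return hour >= 3 && hour < 5;
}
```

For instance, `inInvestigatorWindow(new Date("2026-05-23T03:30:00Z"))` is `true`, while the same call at `05:00:00Z` is `false`.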

Verification

Sentry logs show model_used on every chat call.
/api/admin/batch shows quota_state + quota_resume_eta_minutes.
investigation_jobs.outputs records the model for each turn.
Budget alert at $150/month on the Anthropic API if traffic falls into the paid fallback.
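
For intuition about quota_resume_eta_minutes, the sketch below derives an ETA from the 5-hour rolling window. How /api/admin/batch actually computes the value is not stated here; the function name and the assumption that the window is anchored at a known start timestamp are both hypothetical.

```typescript
const QUOTA_WINDOW_MS = 5 * 60 * 60 * 1000; // 5-hour rolling window

// Hypothetical: minutes until the window that began at windowStartMs expires.
// Both arguments are epoch milliseconds; the result is never negative.
function quotaResumeEtaMinutes(windowStartMs: number, nowMs: number): number {
  const remainingMs = windowStartMs + QUOTA_WINDOW_MS - nowMs;
  return Math.max(0, Math.ceil(remainingMs / 60_000));
}
```

With a window that started four hours ago the function returns 60; once the five hours have elapsed it returns 0.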

References

feedback-no-gemini-ever.md (memory)
user-plan-max-20x.md (memory)
web/lib/chat/{index,openrouter,claude-code}.ts
