- TD#8 hybrid.ts: rerank_strategy {always|when_top_k_gt|never} + threshold
(default skips rerank for top_k ≤ 15; chat tool uses threshold 10)
- O11 vision.ts + tools.ts: analyze_image_region tool — sharp-crops the
bbox, claude CLI reads the temp PNG via Read tool, Sonnet vision answers
- TD#12 /graph: SigmaGraph replaces ForceGraphCanvas; react-force-graph-2d
uninstalled (-37 transitive deps); force-graph-canvas.tsx deleted
- TD#27 messages/route.ts gatherContext slice sizes via CTX_* env vars
- TD#22 tests/rag/: golden.yaml (15 queries) + run.py (Recall@k + MRR +
negative-pass rate) + baseline.json + CI job in .forgejo/workflows/ci.yml
- docs/adrs/: ADR-001..005 published from systems-atelier deliverables
Verified live on disclosure.top: top_k=5 path skips rerank (6.7s embed-only,
was 12-15s with rerank); rerank=always still available on demand.
First RAG baseline: Recall@5 = 0.2083, MRR = 0.25, Negative pass = 1.0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
72 lines
3.5 KiB
Markdown
---
adr: ADR-003
title: LLM routing policy - Claude Sonnet 4.6 via OAuth for asynchronous production; OpenRouter free for public chat
status: accepted
date: 2026-05-23
deciders: sa-principal, sa-platform-lead
project: disclosure-bureau
---

## Context

The project has three LLM paths:

1. **Vision pipeline (ingest)**: Sonnet 4.6 via the Anthropic SDK with prompt caching and the `pdf-2025-03-04` beta. One-time initial cost of roughly $409.
2. **Synchronous chat (user-facing)**: currently OpenRouter free tier (`deepseek/deepseek-v4-flash:free` as primary, `nvidia/nemotron-3-super-120b-a12b:free` as fallback). Tool calling works.
3. **Investigation Bureau (W3+, to be implemented)**: proposed to run on Sonnet 4.6 via OAuth Max 20x.

Existing constraints:

- **Gemini is banned by policy** ([project memory](file:///Users/guto/.claude/projects/-Users-guto-ufo/memory/MEMORY.md)). It billed ~$200 against an expected $10.
- **OAuth Max 20x quota**: 5h rolling window, default 4 workers ([memory](file:///Users/guto/.claude/projects/-Users-guto-ufo/memory/MEMORY.md)).
- **Self-hosted by default**: managed services are prohibited without a written exception (ADR-005).

## Decision

**Routing by channel and by load:**

| Channel | Provider | Model | Rationale |
|---|---|---|---|
| Vision pipeline (background) | Anthropic SDK, direct | Sonnet 4.6 | Valid API key; cache + beta header; does not consume OAuth quota |
| Public synchronous chat | OpenRouter | deepseek-v4-flash:free, nemotron fallback | Free tier; tool calling; anonymous user |
| Authenticated synchronous chat (future premium) | OpenRouter or direct Anthropic API | configurable | Paid tier when justified |
| Investigation Bureau (W3+) | **Claude Code OAuth (subprocess `claude -p`)** | Sonnet 4.6 (model: sonnet) | Max 20x quota; per-job budget cap of $1.00; preferred over the paid API |
| Investigation Bureau (overflow) | Anthropic SDK paid | Sonnet 4.6 or Haiku | When the OAuth quota is saturated AND `BUDGET_PAID_ALLOWED=true` |
| Internal LLM judge (calibration / contradiction detection) | Claude OAuth or OpenRouter | Haiku (cheap, fast) | Simple, batch task |

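The routing table can be read as a small lookup plus one overflow rule for the Investigation Bureau. The following is a minimal sketch, not the project's code: `Channel`, `Route`, `RoutingState`, `ROUTES`, `BUREAU_OVERFLOW`, and `pickRoute` are hypothetical names, and the provider and model strings are labels taken from the table rather than verified API identifiers.

```typescript
// Hypothetical sketch of the routing table; none of these names come from the codebase.
type Channel = "vision_pipeline" | "chat_public" | "investigation_bureau";

interface Route {
  provider: string; // label from the table, not a verified API identifier
  model: string;
}

interface RoutingState {
  oauthQuotaSaturated: boolean; // true once the Max 20x window is exhausted
  budgetPaidAllowed: boolean; // mirrors BUDGET_PAID_ALLOWED=true
}

const ROUTES: Record<Channel, Route> = {
  vision_pipeline: { provider: "anthropic-sdk", model: "sonnet-4.6" },
  chat_public: { provider: "openrouter", model: "deepseek/deepseek-v4-flash:free" },
  investigation_bureau: { provider: "claude-code-oauth", model: "sonnet" },
};

// Overflow row: paid SDK, only when the OAuth quota is saturated AND paid spend is allowed.
const BUREAU_OVERFLOW: Route = { provider: "anthropic-sdk-paid", model: "sonnet-4.6" };

// Returns the route for a channel, or null when the job has to wait for the quota reset.
function pickRoute(channel: Channel, state: RoutingState): Route | null {
  if (channel !== "investigation_bureau" || !state.oauthQuotaSaturated) {
    return ROUTES[channel];
  }
  return state.budgetPaidAllowed ? BUREAU_OVERFLOW : null;
}
```

Returning `null` instead of silently switching to the paid route keeps the "under quota" and "paid" paths separate, which is the property the ADR relies on for spend monitoring.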
**Fallback policy:**

1. The primary is tried first. On a 429/quota error, retry once with backoff.
2. If the retry fails, the fallback policy applies:
   - Synchronous chat: switch from the OpenRouter primary to the OpenRouter fallback. If both fail, return a UX error.
   - Vision/investigator: abort the job and record it in `investigation_jobs.status='failed'`. Wait for the quota reset (5h).
3. `/api/admin/batch` already monitors 429s and the quota reset ETA.

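For the synchronous chat path, the steps above could look roughly like the sketch below. It is illustrative only: `ModelCall`, `isQuotaError`, `callWithRetry`, and `chatWithFallback` are hypothetical names, a quota error is assumed to surface as an error object with `status === 429`, and the sketch assumes the fallback model also gets one retry, which the policy does not state.

```typescript
// Hypothetical sketch of "one retry with backoff, then switch model".
type ModelCall = (model: string) => Promise<string>;

// Assumption: quota/rate-limit failures carry a numeric `status` of 429.
function isQuotaError(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: number }).status === 429;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Try once; on a quota error wait `backoffMs` and try exactly one more time.
async function callWithRetry(call: ModelCall, model: string, backoffMs: number): Promise<string> {
  try {
    return await call(model);
  } catch (err) {
    if (!isQuotaError(err)) throw err;
    await sleep(backoffMs);
    return call(model);
  }
}

// Primary first, then fallback; if both are rate limited, raise the error shown to the user.
async function chatWithFallback(
  call: ModelCall,
  primary: string,
  fallback: string,
  backoffMs = 500,
): Promise<{ model_used: string; text: string }> {
  for (const model of [primary, fallback]) {
    try {
      return { model_used: model, text: await callWithRetry(call, model, backoffMs) };
    } catch (err) {
      if (!isQuotaError(err)) throw err; // non-quota errors are not masked by the fallback
    }
  }
  throw new Error("all chat models are rate limited");
}
```

Returning `model_used` alongside the text makes it straightforward to log which model actually served each chat call.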
**Exceptions:**

- Gemini is **banned** (policy). Do not re-enable it even if a new version looks attractive.
- The paid Anthropic API key lives ONLY in a separate environment variable (`ANTHROPIC_API_KEY_PAID`) and requires an explicit `--paid` flag.

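The paid-key exception amounts to a double opt-in: the flag must be passed and the dedicated variable must be set. A minimal sketch, assuming a hypothetical helper `resolvePaidAnthropicKey` that receives the environment and argument list explicitly (only `ANTHROPIC_API_KEY_PAID` and `--paid` come from this ADR):

```typescript
// Hypothetical guard: the paid key is returned only when --paid is passed explicitly.
function resolvePaidAnthropicKey(
  env: Record<string, string | undefined>,
  argv: string[],
): string {
  if (!argv.includes("--paid")) {
    throw new Error("paid Anthropic API requires the explicit --paid flag");
  }
  const key = env["ANTHROPIC_API_KEY_PAID"];
  if (!key) {
    throw new Error("ANTHROPIC_API_KEY_PAID is not set");
  }
  return key;
}
```

In a Node process this would typically be called as `resolvePaidAnthropicKey(process.env, process.argv.slice(2))`; passing both in as arguments keeps the guard easy to test.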
## Consequences

**Positive:**
- Investigation Bureau can run 99% of the time on OAuth quota (free for the project).
- Public synchronous chat stays at $0/req.
- Clear separation between "under quota" and "paid", which makes spend easy to monitor.

**Negative:**
- The OpenRouter free tier has rate limits and variable latency. Mitigation in W1 (retry + circuit breaker).
- Quota saturation on Sonnet OAuth when many workers ingest while the investigator runs in parallel. The daily investigator cron runs at 03-05 UTC, when ingest load is low.

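The circuit breaker named as the W1 mitigation is not specified in this ADR. As one possible shape, the sketch below opens after a number of consecutive failures and allows a probe request once a cooldown has passed. The class name, thresholds, and injectable clock are all assumptions made for illustration.

```typescript
// Hypothetical circuit breaker: opens after `maxFailures` consecutive failures,
// then allows a probe request once `cooldownMs` has elapsed.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly maxFailures: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(maxFailures: number, cooldownMs: number, now: () => number = () => Date.now()) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // True while closed, or once the cooldown has elapsed (half-open probe).
  canRequest(): boolean {
    if (this.openedAt === null) return true;
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  // A success closes the breaker and resets the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // Each failure counts toward the threshold; reaching it (re)opens the breaker.
  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.maxFailures) this.openedAt = this.now();
  }
}
```

A caller would check `canRequest()` before hitting the OpenRouter primary and go straight to the fallback model while the breaker is open.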
## Verification

- Sentry logs show `model_used` for every chat call.
- `/api/admin/batch` shows `quota_state` + `quota_resume_eta_minutes`.
- `investigation_jobs.outputs` records `model` for each turn.
- Budget alert at $150/month on the Anthropic API if traffic falls back to paid.

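How `quota_resume_eta_minutes` is computed is not described here. One plausible reading, given the 5h rolling window, is the time remaining until the window that started at the first request expires. The sketch below rests on that assumption; `quotaResumeEtaMinutes` and its parameters are hypothetical.

```typescript
// Assumption: the quota window lasts 5 hours, counted from `windowStartMs`.
const QUOTA_WINDOW_MS = 5 * 60 * 60 * 1000;

// Minutes until the quota window resets, rounded up, never negative.
function quotaResumeEtaMinutes(windowStartMs: number, nowMs: number): number {
  const remainingMs = windowStartMs + QUOTA_WINDOW_MS - nowMs;
  return Math.max(0, Math.ceil(remainingMs / 60000));
}
```

Rounding up avoids reporting an ETA of 0 while the quota is still exhausted.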
## References

- `feedback-no-gemini-ever.md` (memory)
- `user-plan-max-20x.md` (memory)
- `web/lib/chat/{index,openrouter,claude-code}.ts`