Commit graph

6 commits

Author SHA1 Message Date
Luiz Gustavo
eaf282c535 W2: rerank opt-in, analyze_image_region tool, RAG eval, graph cleanup, ADRs
Some checks failed
CI / Web — typecheck + lint + build (push) Failing after 40s
CI / Scripts — Python smoke (push) Failing after 3s
CI / Web — npm audit (push) Failing after 29s
CI / Retrieval — golden set (Recall@5 + MRR) (push) Failing after 3s
- TD#8 hybrid.ts: rerank_strategy {always|when_top_k_gt|never} + threshold
  (default skips rerank for top_k ≤ 15; chat tool uses threshold 10)
- O11 vision.ts + tools.ts: analyze_image_region tool — sharp-crops the
  bbox, claude CLI reads the temp PNG via Read tool, Sonnet vision answers
- TD#12 /graph: SigmaGraph replaces ForceGraphCanvas; react-force-graph-2d
  uninstalled (-37 transitive deps); force-graph-canvas.tsx deleted
- TD#27 messages/route.ts gatherContext slice sizes via CTX_* env vars
- TD#22 tests/rag/: golden.yaml (15 queries) + run.py (Recall@k + MRR +
  negative-pass rate) + baseline.json + CI job in .forgejo/workflows/ci.yml
- docs/adrs/: ADR-001..005 published from systems-atelier deliverables

Verified live on disclosure.top: top_k=5 path skips rerank (6.7s embed-only,
was 12-15s with rerank); rerank=always still available on demand.
First RAG baseline: Recall@5 = 0.2083, MRR = 0.25, Negative pass = 1.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:20:09 -03:00
Luiz Gustavo
55cac8a395 W0+W1+W1.2: security hardening, observability, autocomplete, glitchtip, forgejo CI
Some checks failed
CI / Web — typecheck + lint + build (push) Failing after 1m30s
CI / Scripts — Python smoke (push) Failing after 32s
CI / Web — npm audit (push) Failing after 37s
W0 — security hardening (5 fixes verified live on disclosure.top)
- middleware: gate /api/admin/* same as /admin/* (F1)
- imgproxy: tighten LOCAL_FILESYSTEM_ROOT from / to /var/lib/storage (F2)
- studio: real basic-auth label (bcrypt hash, middleware reference) (F3)
- relations: ENABLE ROW LEVEL SECURITY + public SELECT policy (F4)
- migration 0003: fold is_searchable + hybrid_search update into canonical (TD#2)

W1 — observability + resilience + autocomplete
- studio: HOSTNAME=0.0.0.0 so Next.js binds on loopback for healthcheck
- compose: PG_POOL_MAX=20, CLAUDE_CODE_OAUTH_TOKEN gated by separate env
- claude-code.ts: subprocess timeout configurable (CLAUDE_CODE_TIMEOUT_MS)
- openrouter.ts: retry with exponential backoff + Retry-After + in-memory
  circuit breaker (promotes FALLBACK after CB_THRESHOLD failures)
- lib/logger.ts: pino logger (NDJSON prod / pretty dev) + withRequest helper
- middleware: mints correlation_id, stamps x-correlation-id response header,
  emits structured http_request log per /api/* call
- messages/route.ts: switch to structured logger
- 60_meili_index.py: push documents + chunks into Meilisearch
- /api/search/autocomplete: parallel meili search (docs + chunks), 5-8ms p50
- search-autocomplete.tsx: debounced dropdown wired into search-panel

W1.2 — Glitchtip + Forgejo self-hosted
- compose: glitchtip-redis + glitchtip-web + glitchtip-worker (v4.2)
- compose: forgejo + forgejo-runner (server v9, runner v6) with group_add=988
- @sentry/nextjs SDK wired (instrumentation.ts + sentry.{client,server}.config.ts)
- /api/admin/throw smoke endpoint (gated by W0-F1 middleware)
- Synthetic event ingestion verified at glitchtip.disclosure.top
- forgejo.disclosure.top up, repo discadmin/disclosure-bureau created,
  runner registered (labels: ubuntu-latest, docker)
- .forgejo/workflows/ci.yml: typecheck + lint + build + npm audit + python
  syntax + compose validation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:18:42 -03:00
guto
c0c6652dd5 guard /admin/* by role + filter chat artifacts to cited chunks
middleware.ts now checks profiles.role on every /admin/* request and
returns a plain 404 (not a redirect) for anonymous users and any
authenticated user without role='admin'. The 404 wording matches a
non-existent route, so we don't leak the existence of an admin area.
gutomec@gmail.com promoted to admin in the live DB.

openrouter.ts chat now collects artifacts silently during the tool-call
loop instead of streaming them to the SSE. After the model writes its
final prose (and after the forced-synthesis pass if needed), we scan
the assembled text for [[doc-id/p007#cNNNN]] citations and emit ONLY the
artifacts referenced. Duplicates are deduped by chunk_id. The persisted
citations column on messages now stores the filtered set too, so old
sessions reload with the same focused card grid.

Before: every hybrid_search hit (up to 6 per call × 5 calls = 30+ citation
cards plus crop images) flooded the chat regardless of what the model
ended up using. After: only the chunks actually woven into the answer.
2026-05-18 17:41:35 -03:00
guto
9889308bf4 fix chat: force synthesis pass + fix ambiguous-column trigger
Two bugs combined to make the chat reply with only cards and no prose:

1. SQL trigger rollup_session_stats was failing with "column reference
   total_cost_usd is ambiguous" because the UPDATE on public.profiles had
   a FROM public.chat_sessions clause and both tables expose that column.
   Persistence of every user message died at this point — sessions were
   created in the DB but had message_count=0 forever. Applied SQL fix
   that qualifies columns with p./s. aliases (production DB updated;
   ALTER FUNCTION run live, not yet codified in a migration file).

2. The free-tier model (nemotron-3-super:free) spent all 5 tool-loop
   turns on hybrid_search calls and never wrote any prose, returning
   content_len=0. Added a forced-synthesis pass in openrouter.ts: when
   the loop exits with empty assembledText but the model did call tools,
   we send ONE final turn with tools omitted from the request payload
   and a user message instructing the model to answer in 3-8 sentences
   citing chunks. openrouterStreamCall now accepts a `withTools` opt
   so the synthesis call can disable tool calling entirely.

Verified end-to-end with the actual user query "O que os astronautas
viram? Quem foi que viu?" on /d/nasa-uap-d6-apollo-17-...:
- content_len: 0 → 947 chars (real synthesis citing Schmitt)
- artifacts: 44 preserved
- assistant message persisted with tool_calls + citations columns
2026-05-18 15:39:46 -03:00
guto
7d13f93393 ship: synthesize 158 entities, AG-UI artifacts, chat persistence, auth flow
Fase 3 onda 2 — entity synthesis at scale:
- scripts/synthesize/20_entity_summary.py: queries DB for entities with
  total_mentions ≥ threshold + top-K verbatim chunk snippets via
  entity_mentions JOIN, prompts Sonnet (Holmes-Watson voice, bilingual),
  writes narrative_summary EN+PT-BR + summary_status=synthesized.
  Ran on 187 candidates (mentions ≥ 20) → 158 OK · 1 err · 29 skipped (no
  snippets). Combined with anchor curation: 20 curated + 158 synthesized
  = 178 entities with real narrative (vs 0 a day ago).

Fase 4 — chat with typed artifacts + persistence:
- lib/chat/agui.ts: AG-UI v1 typed Artifact union (citation, crop_image,
  entity_card, evidence_card, hypothesis_card, case_card, navigation_offer)
  alongside the existing event types.
- lib/chat/tools.ts + openrouter.ts: hybrid_search emits up to 6
  citation + crop_image artifacts per query. Provider collects them and
  returns in done.artifacts so the route can persist.
- api/sessions/[id]/messages: persist artifacts to messages.citations.
- components/chat-bubble.tsx: ArtifactCard renders inline cards (citation,
  crop_image, entity_card, navigation_offer) for streamed and persisted
  messages. activeId now persisted in localStorage so navigation between
  pages keeps the same conversation. New sessions are lazy (only when user
  has zero). loadMessages hydrates tools + artifacts from server. CRUD UI:
  rename (✎) + archive (🗑) buttons per session in the list.

Home search:
- doc-list-filters: input now fires hybrid_search (rerank=0 for speed)
  in parallel with the local title filter; chunk hits render above the doc
  grid with snippet + score + classification.
- api/search/hybrid: accept ?rerank=0 to skip the cross-encoder (1.3s vs 60s).

Auth flow:
- infra: SMTP_HOST=mail.spacemail.com:587 + DMARC published; mail now lands
  in inbox. GOTRUE_MAILER_AUTOCONFIRM=false (real email verification).
- kong.yml: proxy /auth/callback on api.disclosure.top → web:3000 so PKCE
  email links don't 404 at the gateway.
- web/app/auth/callback: handle both ?code= (OAuth) and ?token=&type=
  (PKCE); redirect to the public site host before verifyOtp so the session
  cookie lands on the right domain.

Audit deliverables:
- .nirvana/outputs/disclosure-bureau/.../systems-atelier/: 5 docs (code
  analysis, tech debt, discovery brief, system arch, 5 ADRs) authored by
  sa-principal that produced this roadmap. Kept in-tree for traceability.
2026-05-18 03:52:59 -03:00
guto
19d0678e55 baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00