disclosure-bureau/CHANGELOG.md
Luiz Gustavo eaf282c535
W2: rerank opt-in, analyze_image_region tool, RAG eval, graph cleanup, ADRs
- TD#8 hybrid.ts: rerank_strategy {always|when_top_k_gt|never} + threshold
  (default skips rerank for top_k ≤ 15; chat tool uses threshold 10)
- O11 vision.ts + tools.ts: analyze_image_region tool — sharp-crops the
  bbox, claude CLI reads the temp PNG via Read tool, Sonnet vision answers
- TD#12 /graph: SigmaGraph replaces ForceGraphCanvas; react-force-graph-2d
  uninstalled (-37 transitive deps); force-graph-canvas.tsx deleted
- TD#27 messages/route.ts gatherContext slice sizes via CTX_* env vars
- TD#22 tests/rag/: golden.yaml (15 queries) + run.py (Recall@k + MRR +
  negative-pass rate) + baseline.json + CI job in .forgejo/workflows/ci.yml
- docs/adrs/: ADR-001..005 published from systems-atelier deliverables

Verified live on disclosure.top: top_k=5 path skips rerank (6.7s embed-only,
was 12-15s with rerank); rerank=always still available on demand.
First RAG baseline: Recall@5 = 0.2083, MRR = 0.25, Negative pass = 1.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:20:09 -03:00


# Changelog · Disclosure Bureau
All notable changes to this project go here. Newest on top.
## [Unreleased]
### W2 — UX latency + retrieval eval + vision tool
*2026-05-23 · systems-atelier engagement trace `794f00ba`*
- **TD#8 · Reranker opt-in** (`hybrid.ts`). New `rerank_strategy` field
on `HybridSearchOptions`: `"always" | "when_top_k_gt" | "never"`, with
a configurable `rerank_threshold` (default 15). Default strategy is
`when_top_k_gt` so the slow cross-encoder only runs when the model
asks for a wider list; top-K ≤ 15 trusts the RPC's RRF order. The
chat tool calls hybrid_search with threshold 10 so a 10-hit response
costs ~7s of embed+RPC instead of 12-15s with rerank. `/api/search/hybrid`
exposes the strategy via `?rerank=always|never|when_top_k_gt` plus
`?rerank_threshold=N`. Back-compat `?rerank=0` still means "never".
- **O11 · `analyze_image_region` chat tool** (`vision.ts`, `tools.ts`).
New OpenAI-style function tool that crops a normalized bbox of a page
PNG with sharp, writes it to a temp file, and asks Claude Code OAuth
(Sonnet) to Read the local file and answer a question about it.
Schema: `{doc_id, page, bbox{x,y,w,h}, question, context?}`. Emits a
`crop_image` artifact for the UI alongside the textual answer. Cost
budget: ~$0.005-0.02 per call, paid against the user's Max 20x
quota. Timeout configurable via `VISION_TIMEOUT_MS` (default 120s).
- **TD#12 · `react-force-graph-2d` removed**. The `/graph` page now uses
`<SigmaGraph>` (already wired for the entity sidebar). One graph
library is enough. `web/components/force-graph-canvas.tsx` deleted;
`npm uninstall` removed 37 transitive deps.
- **TD#27 · Context truncation per type configurable**
(`messages/route.ts`). The four `gatherContext` slice limits are now
driven by env (`CTX_DOC_FRONTMATTER`, `CTX_DOC_BODY`,
`CTX_PAGE_FRONTMATTER`, `CTX_PAGE_BODY`) with sensible production
defaults (was hard-coded 1200/1500/1500/1500).
- **TD#22 · Golden RAG eval** (`tests/rag/`). New harness:
`golden.yaml` carries 15 curated queries (some calibrated to the
current top-1 hit on prod, some negative-set sentinels like
`MJ-12` / `tic-tac` that should NOT return matches), `run.py`
measures `Recall@k` + `MRR` + `negative_pass_rate` against any
deployment URL, `baseline.json` is the gate threshold, `last_run.json`
is the working report. Default behaviour: fail the run when Recall@5
drops more than 0.05 below baseline. CI workflow runs against
`https://disclosure.top` on every push.
- First baseline (rerank=never): **Recall@5 = 0.2083, MRR = 0.25,
Negative pass = 1.0**. Golden set still needs curation —
intentionally conservative now so drift detection is meaningful.
- **ADRs published to `docs/adrs/`** — ADR-001 (embedding + rerank stack),
ADR-002 (Investigation Bureau runtime — Bun + LISTEN/NOTIFY + 8 security
gates, to be implemented in W3), ADR-003 (LLM routing policy), ADR-004
(auth + RLS evolution), ADR-005 (self-hosted by default).
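
The TD#8 decision rule above can be sketched as follows. This is a minimal illustration, not the code in `hybrid.ts`: only `rerank_strategy`, `rerank_threshold`, the three strategy values, and the `?rerank=0` back-compat spelling come from this entry; `RerankOptions`, `shouldRerank`, and `parseRerankQuery` are hypothetical names.

```typescript
// Sketch only: helper and type names are assumptions, not the hybrid.ts API.
type RerankStrategy = "always" | "when_top_k_gt" | "never";

interface RerankOptions {
  top_k: number;
  rerank_strategy?: RerankStrategy; // default per this entry: "when_top_k_gt"
  rerank_threshold?: number; // default per this entry: 15
}

// Decide whether the slow cross-encoder should run for this request.
function shouldRerank(opts: RerankOptions): boolean {
  const strategy = opts.rerank_strategy ?? "when_top_k_gt";
  const threshold = opts.rerank_threshold ?? 15;
  if (strategy === "always") return true;
  if (strategy === "never") return false;
  // "when_top_k_gt": rerank only when the caller asks for a wider list.
  return opts.top_k > threshold;
}

// Map raw `?rerank=` / `?rerank_threshold=` query values onto the options.
// Unknown or missing values fall through to the defaults in shouldRerank.
function parseRerankQuery(query: {
  rerank?: string;
  rerank_threshold?: string;
}): Pick<RerankOptions, "rerank_strategy" | "rerank_threshold"> {
  let strategy: RerankStrategy | undefined;
  if (query.rerank === "0" || query.rerank === "never") {
    strategy = "never"; // `?rerank=0` is the back-compat spelling
  } else if (query.rerank === "always" || query.rerank === "when_top_k_gt") {
    strategy = query.rerank;
  }
  const n = query.rerank_threshold ? Number(query.rerank_threshold) : NaN;
  const threshold = Number.isFinite(n) && n >= 0 ? n : undefined;
  return { rerank_strategy: strategy, rerank_threshold: threshold };
}
```

With these defaults a `top_k` of 5 or 15 skips the reranker and 16 triggers it; with threshold 10, as the chat tool is described as using, a 10-hit request also skips it.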
#### Verified on `disclosure.top` (2026-05-23T21:55Z):
- `/api/search/hybrid?q=Roswell&top_k=5` → HTTP 200 in 6.7s (embed-only,
rerank skipped per default strategy)
- `/api/search/hybrid?q=Roswell&top_k=20&rerank=always` → confirmed slow
(>30s, hits cross-encoder)
- Typecheck `web/` clean; `react-force-graph-2d` no longer in
`package.json`
- `tests/rag/run.py` against prod → 15 queries answered, baseline written
- 5 ADRs committed under `docs/adrs/`
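
For readers unfamiliar with the TD#22 metrics, here is one common way to compute them, written in TypeScript for consistency with the rest of the web code. It is not a port of `run.py`: the per-query recall definition, the "zero hits" rule for negative sentinels, and every name in the block (`GoldenQuery`, `evaluate`, `passesGate`) are assumptions made for illustration.

```typescript
// Sketch only: data shapes and definitions are assumptions, not run.py's.
interface GoldenQuery {
  query: string;
  expected: string[]; // relevant chunk ids; empty for negative sentinels
  negative?: boolean; // true when the query should return no matches
}

interface EvalReport {
  recallAtK: number;
  mrr: number;
  negativePassRate: number;
}

function mean(xs: number[]): number {
  return xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;
}

// `results` maps each query string to its ranked list of returned chunk ids.
function evaluate(
  golden: GoldenQuery[],
  results: Map<string, string[]>,
  k: number,
): EvalReport {
  const recalls: number[] = [];
  const reciprocalRanks: number[] = [];
  const negativePasses: number[] = [];
  for (const g of golden) {
    const hits = results.get(g.query) ?? [];
    if (g.negative) {
      // Assumed rule: a negative sentinel passes when nothing comes back.
      negativePasses.push(hits.length === 0 ? 1 : 0);
      continue;
    }
    // Recall@k: share of expected ids that appear in the top k hits.
    const topK = hits.slice(0, k);
    const found = g.expected.filter((id) => topK.includes(id)).length;
    recalls.push(g.expected.length === 0 ? 0 : found / g.expected.length);
    // Reciprocal rank of the first relevant hit, 0 when none is returned.
    const rank = hits.findIndex((id) => g.expected.includes(id));
    reciprocalRanks.push(rank === -1 ? 0 : 1 / (rank + 1));
  }
  return {
    recallAtK: mean(recalls),
    mrr: mean(reciprocalRanks),
    negativePassRate: mean(negativePasses),
  };
}

// Gate: fail when the metric drops more than `maxDrop` below baseline.
function passesGate(current: number, baseline: number, maxDrop = 0.05): boolean {
  return baseline - current <= maxDrop;
}
```

Under this gate a run at Recall@5 = 0.16 would still pass against the 0.2083 baseline (drop of about 0.048), while 0.15 would fail.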
### W1.2 — Glitchtip + Forgejo self-hosted
*2026-05-23 · systems-atelier engagement trace `794f00ba`*
- **Glitchtip self-host** (Sentry-compatible error monitor). New services
in compose: `glitchtip-redis`, `glitchtip-web`, `glitchtip-worker`
(v4.2, uWSGI on 8080). Database `glitchtip` carved out of
`disclosure-db` as a separate role/DB. Bootstrap done via Django
`manage.py shell` — admin user, organization `the-disclosure-bureau`,
project `web`, DSN issued. SDK wired: `@sentry/nextjs` + `instrumentation.ts`
+ `sentry.{client,server}.config.ts`. `/api/admin/throw` smoke endpoint
is admin-gated. Live at `https://glitchtip.disclosure.top` (TLS issued
by Let's Encrypt via Traefik). Synthetic event verified — POST
`/api/1/store/` → 200 + event id.
- **Forgejo self-host + Actions CI**. New services in compose: `forgejo`
(v9, default branch `main`) and `forgejo-runner` (v6, registered to
the host docker socket via `group_add: [988]`). Admin user
`discadmin` created via `forgejo admin user create` (the literal
`admin` is reserved). Runner bootstrap on first start: registers if
`.runner` absent, then `forgejo-runner daemon`. Repo
`discadmin/disclosure-bureau` created via API; this commit was the
first push and triggered the `W0+W1+W1.2: …` workflow at task 1.
- **`.forgejo/workflows/ci.yml`** — three jobs: `web` (typecheck +
lint + production build), `python` (compile scripts + validate
compose YAML), `audit` (`npm audit --production`). Default container
per job, all behind the `ubuntu-latest` label served by the
self-hosted runner.
#### Verified on the stack (2026-05-23T21:19Z):
- `glitchtip.disclosure.top` → HTTP 200, real Let's Encrypt cert,
Glitchtip CSP headers present.
- POST `/api/1/store/` → 200, event_id `cb17d723…` returned.
- `forgejo.disclosure.top` → HTTP 200, Forgejo welcome page.
- Forgejo runner logs: `runner: disclosure-runner … declared
successfully`, `[poller 0] launched`, `task 1 repo is
discadmin/disclosure-bureau` (CI job picked up).
- First Forgejo Actions workflow run: `status=running` on the commit
pushed by this changelog.
### W1 — Observability + resilience + Meili autocomplete
*2026-05-23 · systems-atelier engagement trace `794f00ba`*
- **Studio container fixed (carry-over from W0)** — root cause was Next.js
standalone binding to the container hostname only. The docker healthcheck
(`fetch 127.0.0.1:3000/api/profile`) looped on `ECONNREFUSED`, the service
never went healthy, and Traefik returned 404 because the upstream wasn't
responding. Fix: `HOSTNAME: 0.0.0.0` in the studio env. Studio now
`healthy`, basic auth from W0-F3 enforces correctly (no-auth → 401,
valid creds → 307), and Let's Encrypt issued a real cert for
`studio.disclosure.top` once the route started responding.
- **TD#10 · PG pool max** — pool size is now configurable via `PG_POOL_MAX`
in `.env` (was hard-coded 5); default raised to 20 for prod. Files: `docker-compose.yml`, `.env`.
- **W1-F8 · `CLAUDE_CODE_OAUTH_TOKEN` gated** — only injected into the `web`
service when explicitly set in `CLAUDE_CODE_OAUTH_TOKEN_FOR_WEB`. Default
empty since `CHAT_PROVIDER=openrouter` does not need it. Reduces blast
radius if web container is compromised. Files: `docker-compose.yml`, `.env`.
- **TD#30 · Subprocess timeout configurable** — `CLAUDE_CODE_TIMEOUT_MS`
env now controls the `claude -p` subprocess timeout (default 90s,
matches prior hard-coded value). Files: `web/lib/chat/claude-code.ts`.
- **TD#23 · OpenRouter retry + circuit breaker** — `fetchOpenRouter()`
wraps every call with: retry up to `OPENROUTER_RETRY_MAX` (default 2)
on 408 / 425 / 429 / 500 / 502 / 503 / 504 and network errors, with
exponential backoff and `Retry-After` honored; in-memory circuit
breaker trips when `PRIMARY` fails `CB_THRESHOLD` times (default 3)
within `CB_WINDOW_MS` (60s), promoting `FALLBACK` for `CB_COOLDOWN_MS`
(2 min). Both `sendOnce` and `openrouterStreamCall` go through it.
Files: `web/lib/chat/openrouter.ts`.
- **TD#6 · Structured logging with pino** — `web/lib/logger.ts` provides
a JSON logger (NDJSON in prod, pretty in dev) plus `withRequest()`
helper for correlation-id-bound child loggers. Edge runtime falls back
to a console adapter. Middleware now mints a `correlation_id` for
every request, stamps the response header (`x-correlation-id`), and
emits one structured `http_request` line per `/api/*` call with
method, path, status, and duration. `messages/route.ts` switched to
the new logger. Files: `web/lib/logger.ts`, `web/middleware.ts`,
`web/app/api/sessions/[id]/messages/route.ts`, `web/package.json`.
- **Meilisearch indexer + `/api/search/autocomplete` + UI** — the previously
idle Meili instance now backs typo-tolerant prefix search. Indexer
script `scripts/maintain/60_meili_index.py` ingests documents
(`canonical_title` + `collection`) and `is_searchable` chunks (`content_pt` +
`content_en` + meta). The new `/api/search/autocomplete?q=...` route
hits both indexes in parallel with a 2s abort and returns a merged
payload. `SearchAutocomplete` React component drops a debounced
dropdown under the `/search` input. Median latency in production:
**58ms**. Files: `scripts/maintain/60_meili_index.py`,
`web/app/api/search/autocomplete/route.ts`,
`web/components/search-autocomplete.tsx`,
`web/components/search-panel.tsx`.
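
The TD#23 retry policy above can be sketched as below. This is an illustration under stated assumptions, not the body of `fetchOpenRouter()`: the status list and the default of 2 retries come from the entry, while the 500 ms base delay, the 8 s cap, and the names `retryDelayMs`, `fetchWithRetry`, and `HttpResult` are hypothetical. Only the delta-seconds form of `Retry-After` is handled here; the HTTP-date form is ignored.

```typescript
// Sketch only: delays and helper names are assumptions, not openrouter.ts.
const RETRYABLE_STATUSES = new Set([408, 425, 429, 500, 502, 503, 504]);

function isRetryableStatus(status: number): boolean {
  return RETRYABLE_STATUSES.has(status);
}

// Delay before retrying after failed attempt `attempt` (0-based).
// A valid Retry-After (seconds) wins; otherwise exponential backoff with a cap.
function retryDelayMs(
  attempt: number,
  retryAfter: string | null,
  baseMs = 500,
  capMs = 8000,
): number {
  if (retryAfter !== null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  }
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Minimal response shape; a fetch `Response` satisfies it structurally.
interface HttpResult {
  status: number;
  headers: { get(name: string): string | null };
}

async function fetchWithRetry<R extends HttpResult>(
  doFetch: () => Promise<R>,
  maxRetries = 2, // OPENROUTER_RETRY_MAX default per this entry
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<R> {
  for (let attempt = 0; ; attempt++) {
    let delay: number;
    try {
      const res = await doFetch();
      // Return on success, on a non-retryable status, or when out of retries.
      if (!isRetryableStatus(res.status) || attempt >= maxRetries) return res;
      delay = retryDelayMs(attempt, res.headers.get("retry-after"));
    } catch (err) {
      // Network errors are retried too, until the budget is spent.
      if (attempt >= maxRetries) throw err;
      delay = retryDelayMs(attempt, null);
    }
    await sleep(delay);
  }
}
```

Injecting `sleep` keeps the helper testable without real timers.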
#### Verified on `disclosure.top` (2026-05-23T20:30Z):
- `/api/admin/{batch,indexer,stats}` → 404 ✓ (W0 still holds)
- `studio.disclosure.top` no-auth → 401 · `admin:<DASHBOARD_PASSWORD>` → 307 ✓
- Let's Encrypt cert issued for `studio.disclosure.top`
- Autocomplete `q=Roswell` → 8 chunks in 8ms; `q=Sandia` → 1 doc + 8 chunks
in 8ms; `q=1947` → 5 docs + 8 chunks in 6ms ✓
- `x-correlation-id` header present on `/api/search/hybrid` response
(e.g. `c48b7cc761dac172`) ✓
- 18 513 searchable chunks indexed into Meili ✓
- OpenRouter retry/breaker present (7 references in source) ✓
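
The circuit breaker half of TD#23 follows a standard pattern, sketched below. The defaults (3 failures, 60 s window, 2 min cooldown) and the `PRIMARY` / `FALLBACK` labels come from the entry above; the class name, its methods, and the choice to pass `now` explicitly are assumptions for illustration and may differ from `openrouter.ts`.

```typescript
// Sketch only: an in-memory breaker, not the implementation in openrouter.ts.
class CircuitBreaker {
  private failures: number[] = []; // timestamps (ms) of recent PRIMARY failures
  private openUntil = 0; // FALLBACK is used while now < openUntil
  private readonly threshold: number;
  private readonly windowMs: number;
  private readonly cooldownMs: number;

  constructor(threshold = 3, windowMs = 60_000, cooldownMs = 120_000) {
    this.threshold = threshold; // CB_THRESHOLD
    this.windowMs = windowMs; // CB_WINDOW_MS
    this.cooldownMs = cooldownMs; // CB_COOLDOWN_MS
  }

  // Record a PRIMARY failure; trip the breaker once `threshold` failures
  // fall inside the sliding window.
  recordPrimaryFailure(now: number): void {
    this.failures = this.failures.filter((t) => now - t < this.windowMs);
    this.failures.push(now);
    if (this.failures.length >= this.threshold) {
      this.openUntil = now + this.cooldownMs;
      this.failures = [];
    }
  }

  // Which model slot the next call should use.
  pick(now: number): "PRIMARY" | "FALLBACK" {
    return now < this.openUntil ? "FALLBACK" : "PRIMARY";
  }
}
```

In production code `now` would typically be `Date.now()`. Because the state lives in process memory, each server instance would trip independently.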
#### Deferred to W1.2 / W2 (need user-in-loop steps):
- **Glitchtip self-host** — needs DNS for `glitchtip.disclosure.top`,
initial signup-as-superuser, project DSN copied to .env. Logger and
middleware are already feeding the data; SDK wiring is one PR.
- **Forgejo Actions self-host CI** — Forgejo server + runner bootstrap,
initial admin account, repo migration / mirror. Recommend a separate
session because of the depth of setup.
### W0 — Hardening (security + reproducibility)
*2026-05-23 · systems-atelier engagement trace `794f00ba-7cb6-4b90-a48e-23ebd02d1f44`*
- **F1 · Auth gate on `/api/admin/*`** — middleware now matches `/api/admin`
too; non-admin (including anonymous) gets HTTP 404. Verified: `curl`
on `/api/admin/{batch,indexer,stats}` returns 404 publicly. Files:
`web/middleware.ts`.
- **F2 · Imgproxy filesystem root tightened** — `IMGPROXY_LOCAL_FILESYSTEM_ROOT`
moved from `/` (entire VPS root) to `/var/lib/storage` (Storage backend
mount only). Reduces blast radius of any future imgproxy CVE. Files:
`infra/disclosure-stack/docker-compose.yml`.
- **F3 · Studio basic auth label** — replaced the dead-end
`basicauth.usersfile=/dev/null` with a real bcrypt-hashed credential
(`DASHBOARD_USERNAME` / `DASHBOARD_PASSWORD` from `.env`) and wired the
middleware into the router via `disclosure-studio.middlewares=
disclosure-studio-auth@docker`. *Caveat:* the Studio container itself
has a pre-existing instability (restarts in a Next.js loop, status
`unhealthy`) so the front-end currently returns 404 from Traefik. When
Studio is stabilized (queued for W1), the basic auth will kick in. Files:
`infra/disclosure-stack/docker-compose.yml`.
- **F4 · RLS on `public.relations`** — `ENABLE ROW LEVEL SECURITY` + public
`SELECT` policy + `GRANT SELECT TO anon, authenticated`. Aligns with
every other public table. Files: `infra/supabase/migrations/0003_w0_hardening.sql`.
- **TD#2 · `is_searchable` folded into canonical migrations** — the column,
reclassification rules, partial index, and the updated `hybrid_search_chunks`
RPC (BM25 + dense, both filtered by `is_searchable`) are now in migration
`0003_w0_hardening.sql`. A clean bootstrap on a fresh VPS produces a
searchable database without any `scripts/maintain/47-48` post-hoc patches.
Files: `infra/supabase/migrations/0003_w0_hardening.sql`.
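
The F1 rule above (non-admins, including anonymous callers, get 404 on `/api/admin/*`) reduces to a small pure check. The sketch below is an assumption about its shape, not the contents of `web/middleware.ts`: `Role` and `adminGateStatus` are hypothetical names, and how the role is resolved from the session is out of scope.

```typescript
// Sketch only: names are hypothetical; role resolution is not shown.
type Role = "admin" | "user" | "anonymous";

// Returns the status to short-circuit with, or null to let the request through.
function adminGateStatus(pathname: string, role: Role): 404 | null {
  const isAdminPath =
    pathname === "/api/admin" || pathname.startsWith("/api/admin/");
  // 404 instead of 401/403, so the response does not confirm the route exists.
  return isAdminPath && role !== "admin" ? 404 : null;
}
```

Matching on the exact prefix plus a trailing slash avoids accidentally gating unrelated routes such as a hypothetical `/api/administer`.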
#### Verified on `disclosure.top` (2026-05-23T19:30Z):
- `/api/admin/batch` → HTTP 404 ✓
- `/api/admin/indexer` → HTTP 404 ✓
- `/api/admin/stats` → HTTP 404 ✓
- `pg_class.relrowsecurity` = `t` for chunks, documents, entities,
entity_mentions, **relations**
- `is_searchable` distribution: 18 513 searchable / 10 046 not-searchable
(35% of corpus deduplicated from results) ✓
- `/api/search/hybrid?q=Roswell` → HTTP 200, 10 hits, first `c0527`
- Studio: Traefik labels in place; container itself unhealthy (separate
issue, deferred to W1) ⚠
#### Notes for clean-install reproducibility:
- `0003_w0_hardening.sql` MUST be applied as `supabase_admin`, not
`postgres`, because public.chunks / .entities / .relations are owned by
`supabase_admin`. The migration file documents this in its header.