disclosure-bureau/CHANGELOG.md
Luiz Gustavo 55cac8a395
W0+W1+W1.2: security hardening, observability, autocomplete, glitchtip, forgejo CI
W0 — security hardening (5 fixes verified live on disclosure.top)
- middleware: gate /api/admin/* same as /admin/* (F1)
- imgproxy: tighten LOCAL_FILESYSTEM_ROOT from / to /var/lib/storage (F2)
- studio: real basic-auth label (bcrypt hash, middleware reference) (F3)
- relations: ENABLE ROW LEVEL SECURITY + public SELECT policy (F4)
- migration 0003: fold is_searchable + hybrid_search update into canonical (TD#2)

W1 — observability + resilience + autocomplete
- studio: HOSTNAME=0.0.0.0 so Next.js binds on all interfaces (incl. loopback) for healthcheck
- compose: PG_POOL_MAX=20, CLAUDE_CODE_OAUTH_TOKEN gated by separate env
- claude-code.ts: subprocess timeout configurable (CLAUDE_CODE_TIMEOUT_MS)
- openrouter.ts: retry with exponential backoff + Retry-After + in-memory
  circuit breaker (promotes FALLBACK after CB_THRESHOLD failures)
- lib/logger.ts: pino logger (NDJSON prod / pretty dev) + withRequest helper
- middleware: mints correlation_id, stamps x-correlation-id response header,
  emits structured http_request log per /api/* call
- messages/route.ts: switch to structured logger
- 60_meili_index.py: push documents + chunks into Meilisearch
- /api/search/autocomplete: parallel meili search (docs + chunks), 5-8ms p50
- search-autocomplete.tsx: debounced dropdown wired into search-panel

W1.2 — Glitchtip + Forgejo self-hosted
- compose: glitchtip-redis + glitchtip-web + glitchtip-worker (v4.2)
- compose: forgejo + forgejo-runner (server v9, runner v6) with group_add=988
- @sentry/nextjs SDK wired (instrumentation.ts + sentry.{client,server}.config.ts)
- /api/admin/throw smoke endpoint (gated by W0-F1 middleware)
- Synthetic event ingestion verified at glitchtip.disclosure.top
- forgejo.disclosure.top up, repo discadmin/disclosure-bureau created,
  runner registered (labels: ubuntu-latest, docker)
- .forgejo/workflows/ci.yml: typecheck + lint + build + npm audit + python
  syntax + compose validation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:18:42 -03:00


# Changelog · Disclosure Bureau
All notable changes to this project go here. Newest on top.
## [Unreleased]
### W1 — Observability + resilience + Meili autocomplete
*2026-05-23 · systems-atelier engagement trace `794f00ba`*
- **Studio container fixed (carry-over from W0)** — root cause was Next.js
standalone binding to the container hostname only. The docker healthcheck
(`fetch 127.0.0.1:3000/api/profile`) looped on `ECONNREFUSED`, the service
never went healthy, and Traefik returned 404 because the upstream wasn't
responding. Fix: `HOSTNAME: 0.0.0.0` in the studio env. Studio now
`healthy`, basic auth from W0-F3 enforces correctly (no-auth → 401,
valid creds → 307), and Let's Encrypt issued a real cert for
`studio.disclosure.top` once the route started responding.
- **TD#10 · PG pool max** — `PG_POOL_MAX` is now configurable via `.env`
(previously hard-coded to 5); the default is raised to 20 for prod. Files: `docker-compose.yml`, `.env`.
- **W1-F8 · `CLAUDE_CODE_OAUTH_TOKEN` gated** — only injected into the `web`
service when explicitly set in `CLAUDE_CODE_OAUTH_TOKEN_FOR_WEB`. Default
empty since `CHAT_PROVIDER=openrouter` does not need it. Reduces blast
radius if web container is compromised. Files: `docker-compose.yml`, `.env`.
- **TD#30 · Subprocess timeout configurable** — `CLAUDE_CODE_TIMEOUT_MS`
env now controls the `claude -p` subprocess timeout (default 90s,
matches prior hard-coded value). Files: `web/lib/chat/claude-code.ts`.
- **TD#23 · OpenRouter retry + circuit breaker** — `fetchOpenRouter()`
wraps every call with: retry up to `OPENROUTER_RETRY_MAX` (default 2)
on 408 / 425 / 429 / 500 / 502 / 503 / 504 and network errors, with
exponential backoff and `Retry-After` honored; in-memory circuit
breaker trips when `PRIMARY` fails `CB_THRESHOLD` times (default 3)
within `CB_WINDOW_MS` (60s), promoting `FALLBACK` for `CB_COOLDOWN_MS`
(2 min). Both `sendOnce` and `openrouterStreamCall` go through it.
Files: `web/lib/chat/openrouter.ts`.
- **TD#6 · Structured logging with pino** — `web/lib/logger.ts` provides
a JSON logger (NDJSON in prod, pretty in dev) plus `withRequest()`
helper for correlation-id-bound child loggers. Edge runtime falls back
to a console adapter. Middleware now mints a `correlation_id` for
every request, stamps the response header (`x-correlation-id`), and
emits one structured `http_request` line per `/api/*` call with
method, path, status, and duration. `messages/route.ts` switched to
the new logger. Files: `web/lib/logger.ts`, `web/middleware.ts`,
`web/app/api/sessions/[id]/messages/route.ts`, `web/package.json`.
- **Meilisearch indexer + `/api/search/autocomplete` + UI** — the previously
idle Meili instance now backs typo-tolerant prefix search. Indexer
script `scripts/maintain/60_meili_index.py` ingests documents
(canonical_title + collection) and is-searchable chunks (content_pt +
content_en + meta). The new `/api/search/autocomplete?q=...` route
hits both indexes in parallel with a 2s abort and returns a merged
payload. `SearchAutocomplete` React component drops a debounced
dropdown under the `/search` input. Median latency in production:
**5-8 ms**. Files: `scripts/maintain/60_meili_index.py`,
`web/app/api/search/autocomplete/route.ts`,
`web/components/search-autocomplete.tsx`,
`web/components/search-panel.tsx`.
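
The parallel lookup with a shared deadline described in the autocomplete entry above can be sketched as follows. This is a minimal illustration, not the contents of `route.ts`: the index names (`documents`, `chunks`), the per-index limits, and the `MeiliDeps` / `searchIndex` / `autocomplete` names are assumptions. Only the `POST /indexes/{uid}/search` endpoint shape comes from Meilisearch itself.

```typescript
// Sketch only: index names, limits, and helper names are assumptions, not the project's code.
interface SearchResponse {
  ok: boolean;
  json(): Promise<unknown>;
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<SearchResponse>;

type Hit = Record<string, unknown>;

interface MeiliDeps {
  fetchFn: FetchLike; // injected so the sketch can be exercised without a network
  baseUrl: string;    // hypothetical, e.g. "http://meilisearch:7700"
  apiKey: string;
  timeoutMs?: number; // the entry above mentions a 2 s abort
}

// POST /indexes/{uid}/search is Meilisearch's search endpoint; a non-2xx reply yields no hits.
async function searchIndex(
  deps: MeiliDeps, index: string, q: string, limit: number, signal: AbortSignal,
): Promise<Hit[]> {
  const res = await deps.fetchFn(`${deps.baseUrl}/indexes/${index}/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${deps.apiKey}` },
    body: JSON.stringify({ q, limit }),
    signal,
  });
  if (!res.ok) return [];
  const body = (await res.json()) as { hits?: Hit[] };
  return body.hits ?? [];
}

// Query both indexes concurrently; one shared AbortController enforces the deadline.
async function autocomplete(q: string, deps: MeiliDeps): Promise<{ docs: Hit[]; chunks: Hit[] }> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), deps.timeoutMs ?? 2000);
  try {
    const [docs, chunks] = await Promise.all([
      searchIndex(deps, "documents", q, 5, controller.signal),
      searchIndex(deps, "chunks", q, 8, controller.signal),
    ]);
    return { docs, chunks };
  } catch {
    // Timeout or network failure: degrade to an empty dropdown rather than an error.
    return { docs: [], chunks: [] };
  } finally {
    clearTimeout(timer);
  }
}
```

Sharing one `AbortController` between both requests means a slow index cancels the whole lookup at the deadline, which suits a dropdown where a late answer is worth less than no answer.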
#### Verified on `disclosure.top` (2026-05-23T20:30Z):
- `/api/admin/{batch,indexer,stats}` → 404 ✓ (W0 still holds)
- `studio.disclosure.top` no-auth → 401 · `admin:<DASHBOARD_PASSWORD>` → 307 ✓
- Let's Encrypt cert issued for `studio.disclosure.top`
- Autocomplete `q=Roswell` → 8 chunks in 8ms; `q=Sandia` → 1 doc + 8 chunks
in 8ms; `q=1947` → 5 docs + 8 chunks in 6ms ✓
- `x-correlation-id` header present on `/api/search/hybrid` response
(e.g. `c48b7cc761dac172`) ✓
- 18 513 searchable chunks indexed into Meili ✓
- OpenRouter retry/breaker present (7 references in source) ✓
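
For readers unfamiliar with the retry and circuit-breaker behaviour referenced above (TD#23), here is a minimal sketch of the idea. It is not the code in `web/lib/chat/openrouter.ts`: the `CircuitBreaker`, `retryDelayMs`, and `callWithRetry` names, the 250 ms backoff base, and the injected `Fetcher` are all assumptions. The status list and the threshold / window / cooldown values are taken from the W1 entry.

```typescript
// Sketch only. Class, function, and option names are hypothetical; the numeric
// values used in examples mirror the changelog (threshold 3, window 60 s, cooldown 2 min).
const RETRYABLE = new Set([408, 425, 429, 500, 502, 503, 504]);

interface BreakerConfig { threshold: number; windowMs: number; cooldownMs: number }

class CircuitBreaker {
  private failures: number[] = []; // timestamps of recent PRIMARY failures
  private openUntil = 0;
  private readonly cfg: BreakerConfig;
  private readonly now: () => number;

  constructor(cfg: BreakerConfig, now: () => number = () => Date.now()) {
    this.cfg = cfg;
    this.now = now;
  }

  // While open, callers should use FALLBACK instead of PRIMARY.
  isOpen(): boolean {
    return this.now() < this.openUntil;
  }

  recordFailure(): void {
    const t = this.now();
    this.failures = this.failures.filter((f) => t - f < this.cfg.windowMs);
    this.failures.push(t);
    if (this.failures.length >= this.cfg.threshold) {
      this.openUntil = t + this.cfg.cooldownMs;
      this.failures = [];
    }
  }
}

// Delay before the next attempt: Retry-After (delta-seconds form only) wins,
// otherwise exponential backoff from an assumed 250 ms base.
function retryDelayMs(attempt: number, retryAfter: string | null, baseMs = 250): number {
  const secs = retryAfter === null || retryAfter.trim() === "" ? NaN : Number(retryAfter);
  if (Number.isFinite(secs) && secs >= 0) return secs * 1000;
  return baseMs * 2 ** attempt;
}

interface MinimalResponse { status: number; headers: { get(name: string): string | null } }
type Fetcher = (model: string) => Promise<MinimalResponse>;

interface RetryOptions {
  primary: string;
  fallback: string;
  retryMax: number; // corresponds to OPENROUTER_RETRY_MAX
  sleep?: (ms: number) => Promise<void>;
}

async function callWithRetry(
  fetcher: Fetcher, breaker: CircuitBreaker, opts: RetryOptions,
): Promise<{ model: string; status: number }> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  let lastError: unknown = new Error("no attempt made");
  for (let attempt = 0; attempt <= opts.retryMax; attempt++) {
    const usePrimary = !breaker.isOpen();
    const model = usePrimary ? opts.primary : opts.fallback;
    let retryAfter: string | null = null;
    try {
      const res = await fetcher(model);
      if (!RETRYABLE.has(res.status)) return { model, status: res.status };
      retryAfter = res.headers.get("retry-after");
      lastError = new Error(`retryable HTTP ${res.status} from ${model}`);
    } catch (err) {
      lastError = err; // network errors are retried too
    }
    if (usePrimary) breaker.recordFailure();
    if (attempt < opts.retryMax) await sleep(retryDelayMs(attempt, retryAfter));
  }
  throw lastError;
}
```

With `retryMax: 2` and `threshold: 3`, a single call that fails three times on the primary model trips the breaker, so the next call goes straight to the fallback until the cooldown elapses. Injecting the clock and `sleep` keeps the sketch testable without real delays.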
#### Deferred to W1.2 / W2 (need user-in-loop steps):
- **Glitchtip self-host** — needs DNS for `glitchtip.disclosure.top`,
initial signup-as-superuser, project DSN copied to .env. Logger and
middleware are already feeding the data; SDK wiring is one PR.
- **Forgejo Actions self-host CI** — Forgejo server + runner bootstrap,
initial admin account, repo migration / mirror. Recommend a separate
session because of the depth of setup.
### W0 — Hardening (security + reproducibility)
*2026-05-23 · systems-atelier engagement trace `794f00ba-7cb6-4b90-a48e-23ebd02d1f44`*
- **F1 · Auth gate on `/api/admin/*`** — middleware now matches `/api/admin`
too; non-admin (including anonymous) gets HTTP 404. Verified: `curl`
on `/api/admin/{batch,indexer,stats}` returns 404 publicly. Files:
`web/middleware.ts`.
- **F2 · Imgproxy filesystem root tightened** — `IMGPROXY_LOCAL_FILESYSTEM_ROOT`
moved from `/` (entire VPS root) to `/var/lib/storage` (Storage backend
mount only). Reduces blast radius of any future imgproxy CVE. Files:
`infra/disclosure-stack/docker-compose.yml`.
- **F3 · Studio basic auth label** — replaced the dead-end
`basicauth.usersfile=/dev/null` with a real bcrypt-hashed credential
(`DASHBOARD_USERNAME` / `DASHBOARD_PASSWORD` from `.env`) and wired the
middleware into the router via `disclosure-studio.middlewares=
disclosure-studio-auth@docker`. *Caveat:* the Studio container itself
has a pre-existing instability (restarts in a Next.js loop, status
`unhealthy`) so the front-end currently returns 404 from Traefik. When
Studio is stabilized (queued for W1), basic auth will take effect. Files:
`infra/disclosure-stack/docker-compose.yml`.
- **F4 · RLS on `public.relations`** — `ENABLE ROW LEVEL SECURITY` + public
`SELECT` policy + `GRANT SELECT TO anon, authenticated`. Aligns with
every other public table. Files: `infra/supabase/migrations/0003_w0_hardening.sql`.
- **TD#2 · `is_searchable` folded into canonical migrations** — the column,
reclassification rules, partial index, and the updated `hybrid_search_chunks`
RPC (BM25 + dense, both filtered by `is_searchable`) are now in migration
`0003_w0_hardening.sql`. A clean bootstrap on a fresh VPS produces a
searchable database without any `scripts/maintain/47-48` post-hoc patches.
Files: `infra/supabase/migrations/0003_w0_hardening.sql`.
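
The F1 gate above can be illustrated with a small path check. This is a sketch under assumptions, not the logic in `web/middleware.ts`: the `Role` type and the `isAdminPath` / `gateStatus` names are hypothetical. The only behaviour taken from the entry is that `/admin` and `/api/admin` are treated alike and that non-admins, including anonymous callers, receive a 404.

```typescript
// Sketch only: function names and the Role type are hypothetical.
type Role = "admin" | "user" | null; // null = anonymous

const ADMIN_PREFIXES = ["/admin", "/api/admin"];

// Match the prefix itself or anything below it, but not look-alikes such as "/administrator".
function isAdminPath(pathname: string): boolean {
  return ADMIN_PREFIXES.some((p) => pathname === p || pathname.startsWith(`${p}/`));
}

// Returns the status to short-circuit with, or null to let the request continue.
function gateStatus(pathname: string, role: Role): number | null {
  return isAdminPath(pathname) && role !== "admin" ? 404 : null;
}
```

Answering 404 instead of 401 or 403 avoids confirming to unauthenticated callers that the admin routes exist.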
#### Verified on `disclosure.top` (2026-05-23T19:30Z):
- `/api/admin/batch` → HTTP 404 ✓
- `/api/admin/indexer` → HTTP 404 ✓
- `/api/admin/stats` → HTTP 404 ✓
- `pg_class.relrowsecurity` = `t` for chunks, documents, entities,
entity_mentions, **relations**
- `is_searchable` distribution: 18 513 searchable / 10 046 not-searchable
(35% of corpus deduplicated from results) ✓
- `/api/search/hybrid?q=Roswell` → HTTP 200, 10 hits, first `c0527`
- Studio: Traefik labels in place; container itself unhealthy (separate
issue, deferred to W1) ⚠
#### Notes for clean-install reproducibility:
- `0003_w0_hardening.sql` MUST be applied as `supabase_admin`, not
`postgres`, because public.chunks / .entities / .relations are owned by
`supabase_admin`. The migration file documents this in its header.