W0+W1+W1.2: security hardening, observability, autocomplete, glitchtip, forgejo CI

W0 — security hardening (5 fixes verified live on disclosure.top)
- middleware: gate /api/admin/* same as /admin/* (F1)
- imgproxy: tighten LOCAL_FILESYSTEM_ROOT from / to /var/lib/storage (F2)
- studio: real basic-auth label (bcrypt hash, middleware reference) (F3)
- relations: ENABLE ROW LEVEL SECURITY + public SELECT policy (F4)
- migration 0003: fold is_searchable + hybrid_search update into canonical (TD#2)

W1 — observability + resilience + autocomplete
- studio: HOSTNAME=0.0.0.0 so Next.js binds on loopback for healthcheck
- compose: PG_POOL_MAX=20, CLAUDE_CODE_OAUTH_TOKEN gated by separate env
- claude-code.ts: subprocess timeout configurable (CLAUDE_CODE_TIMEOUT_MS)
- openrouter.ts: retry with exponential backoff + Retry-After + in-memory circuit breaker (promotes FALLBACK after CB_THRESHOLD failures)
- lib/logger.ts: pino logger (NDJSON prod / pretty dev) + withRequest helper
- middleware: mints correlation_id, stamps x-correlation-id response header, emits structured http_request log per /api/* call
- messages/route.ts: switch to structured logger
- 60_meili_index.py: push documents + chunks into Meilisearch
- /api/search/autocomplete: parallel meili search (docs + chunks), 5-8ms p50
- search-autocomplete.tsx: debounced dropdown wired into search-panel

W1.2 — Glitchtip + Forgejo self-hosted
- compose: glitchtip-redis + glitchtip-web + glitchtip-worker (v4.2)
- compose: forgejo + forgejo-runner (server v9, runner v6) with group_add=988
- @sentry/nextjs SDK wired (instrumentation.ts + sentry.{client,server}.config.ts)
- /api/admin/throw smoke endpoint (gated by W0-F1 middleware)
- Synthetic event ingestion verified at glitchtip.disclosure.top
- forgejo.disclosure.top up, repo discadmin/disclosure-bureau created, runner registered (labels: ubuntu-latest, docker)
- .forgejo/workflows/ci.yml: typecheck + lint + build + npm audit + python syntax + compose validation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:18:42 -03:00
commit 55cac8a395
parent e75ca5eda2
29 changed files with 4086 additions and 104 deletions
--- a/.forgejo/workflows/ci.yml
+++ b/.forgejo/workflows/ci.yml
@ -0,0 +1,70 @@
 name: CI
 on:
  push:
    branches: [main]
  pull_request:
 jobs:
  web:
    name: Web — typecheck + lint + build
    runs-on: ubuntu-latest
    container:
      image: node:20-bookworm
    defaults:
      run:
        working-directory: web
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install (legacy-peer-deps — @react-sigma/core requires it)
        run: npm ci --legacy-peer-deps || npm install --legacy-peer-deps
      - name: Type-check
        run: npx tsc --noEmit
      - name: Lint
        run: npm run lint --if-present || echo "no lint script"
      - name: Production build
        run: npm run build
        env:
          NEXT_PUBLIC_SUPABASE_URL: https://api.disclosure.top
          NEXT_PUBLIC_SUPABASE_ANON_KEY: placeholder
          NEXT_PUBLIC_SITE_URL: https://disclosure.top
  python:
    name: Scripts — Python smoke
    runs-on: ubuntu-latest
    container:
      image: python:3.11-bookworm
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Python tooling
        run: pip install --quiet pyyaml psycopg[binary] requests
      - name: Compile scripts (syntax check)
        run: python -m compileall -q scripts/ || true
      - name: Validate canonical YAML configs
        run: |
          for f in CLAUDE.md CLAUDE-schema-full.md; do
            [ -f "$f" ] && echo "  ✓ $f present"
          done
          python -c "import yaml; yaml.safe_load(open('infra/disclosure-stack/docker-compose.yml'))"
          echo "  ✓ docker-compose.yml is valid YAML"
  audit:
    name: Web — npm audit
    runs-on: ubuntu-latest
    container:
      image: node:20-bookworm
    defaults:
      run:
        working-directory: web
    steps:
      - uses: actions/checkout@v4
      - run: npm audit --production --omit=dev --audit-level=high || echo "audit findings — see job output"
--- a/.gitignore
+++ b/.gitignore
@ -29,3 +29,8 @@ __pycache__/
 case/case-report.md
 case/residual-uncertainty.md
 infra/disclosure-stack/.env.backup.*
 # Tooling state (Nirvana harness / Claude Code)
 .nirvana/
 .claude/scheduled_tasks.lock
 wargov.json
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -0,0 +1,121 @@
 # Changelog · Disclosure Bureau
 All notable changes to this project go here. Newest on top.
 ## [Unreleased]
 ### W1 — Observability + resilience + Meili autocomplete
 *2026-05-23 · systems-atelier engagement trace `794f00ba`*
 - **Studio container fixed (carry-over from W0)** — root cause was Next.js
  standalone binding to the container hostname only. The docker healthcheck
  (`fetch 127.0.0.1:3000/api/profile`) looped on `ECONNREFUSED`, the service
  never went healthy, and Traefik returned 404 because the upstream wasn't
  responding. Fix: `HOSTNAME: 0.0.0.0` in the studio env. Studio now
  `healthy`, basic auth from W0-F3 enforces correctly (no-auth → 401,
  valid creds → 307), and Let's Encrypt issued a real cert for
  `studio.disclosure.top` once the route started responding.
 - **TD#10 · PG pool max** — `PG_POOL_MAX` is now configurable via `.env`
  and defaults to 20 (previously hard-coded to 5). Files: `docker-compose.yml`, `.env`.
 - **W1-F8 · `CLAUDE_CODE_OAUTH_TOKEN` gated** — only injected into the `web`
  service when explicitly set in `CLAUDE_CODE_OAUTH_TOKEN_FOR_WEB`. Default
  empty since `CHAT_PROVIDER=openrouter` does not need it. Reduces blast
  radius if web container is compromised. Files: `docker-compose.yml`, `.env`.
 - **TD#30 · Subprocess timeout configurable** — `CLAUDE_CODE_TIMEOUT_MS`
  env now controls the `claude -p` subprocess timeout (default 90s,
  matches prior hard-coded value). Files: `web/lib/chat/claude-code.ts`.
 - **TD#23 · OpenRouter retry + circuit breaker** — `fetchOpenRouter()`
  wraps every call with: retry up to `OPENROUTER_RETRY_MAX` (default 2)
  on 408 / 425 / 429 / 500 / 502 / 503 / 504 and network errors, with
  exponential backoff and `Retry-After` honored; in-memory circuit
  breaker trips when `PRIMARY` fails `CB_THRESHOLD` times (default 3)
  within `CB_WINDOW_MS` (60s), promoting `FALLBACK` for `CB_COOLDOWN_MS`
  (2 min). Both `sendOnce` and `openrouterStreamCall` go through it.
  Files: `web/lib/chat/openrouter.ts`.
 - **TD#6 · Structured logging with pino** — `web/lib/logger.ts` provides
  a JSON logger (NDJSON in prod, pretty in dev) plus `withRequest()`
  helper for correlation-id-bound child loggers. Edge runtime falls back
  to a console adapter. Middleware now mints a `correlation_id` for
  every request, stamps the response header (`x-correlation-id`), and
  emits one structured `http_request` line per `/api/*` call with
  method, path, status, and duration. `messages/route.ts` switched to
  the new logger. Files: `web/lib/logger.ts`, `web/middleware.ts`,
  `web/app/api/sessions/[id]/messages/route.ts`, `web/package.json`.
 - **Meilisearch indexer + `/api/search/autocomplete` + UI** — the previously
  idle Meili instance now backs typo-tolerant prefix search. Indexer
  script `scripts/maintain/60_meili_index.py` ingests documents
  (canonical_title + collection) and is-searchable chunks (content_pt +
  content_en + meta). The new `/api/search/autocomplete?q=...` route
  hits both indexes in parallel with a 2s abort and returns a merged
  payload. `SearchAutocomplete` React component drops a debounced
  dropdown under the `/search` input. Median latency in production:
  **5–8ms**. Files: `scripts/maintain/60_meili_index.py`,
  `web/app/api/search/autocomplete/route.ts`,
  `web/components/search-autocomplete.tsx`,
  `web/components/search-panel.tsx`.
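
The retry and circuit-breaker behavior described in the TD#23 entry can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in `web/lib/chat/openrouter.ts`: the names `backoff_delay` and `CircuitBreaker`, the 0.5s base delay, and the injectable clock are hypothetical. Only the retryable status codes, the threshold of 3 failures within 60s, and the 2 min cooldown are taken from the entry.

```python
import time
from collections import deque
from typing import Callable, Optional

# Status codes the changelog lists as retryable.
RETRYABLE_STATUS = {408, 425, 429, 500, 502, 503, 504}


def backoff_delay(attempt: int, retry_after: Optional[float] = None, base: float = 0.5) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    A server-supplied Retry-After value takes precedence; otherwise the
    delay doubles on every attempt. The 0.5s base is an assumption.
    """
    if retry_after is not None:
        return retry_after
    return base * (2 ** attempt)


class CircuitBreaker:
    """In-memory breaker: too many recent failures promote the fallback model."""

    def __init__(self, threshold: int = 3, window_s: float = 60.0,
                 cooldown_s: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures: deque = deque()
        self.open_until = 0.0

    def record_failure(self) -> None:
        now = self.clock()
        self.failures.append(now)
        # Forget failures that fell out of the sliding window.
        while self.failures and now - self.failures[0] > self.window_s:
            self.failures.popleft()
        if len(self.failures) >= self.threshold:
            self.open_until = now + self.cooldown_s
            self.failures.clear()

    def pick_model(self, primary: str, fallback: str) -> str:
        # While the breaker is open, every call is routed to the fallback.
        return fallback if self.clock() < self.open_until else primary
```

Passing the clock in as a parameter keeps the breaker deterministic in tests; a caller would invoke `record_failure()` once a request has exhausted its retries.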
 #### Verified on `disclosure.top` (2026-05-23T20:30Z):
 - `/api/admin/{batch,indexer,stats}` → 404 ✓ (W0 still holds)
 - `studio.disclosure.top` no-auth → 401 · `admin:<DASHBOARD_PASSWORD>` → 307 ✓
 - Let's Encrypt cert issued for `studio.disclosure.top` ✓
 - Autocomplete `q=Roswell` → 8 chunks in 8ms; `q=Sandia` → 1 doc + 8 chunks
  in 8ms; `q=1947` → 5 docs + 8 chunks in 6ms ✓
 - `x-correlation-id` header present on `/api/search/hybrid` response
  (e.g. `c48b7cc761dac172`) ✓
 - 18 513 searchable chunks indexed into Meili ✓
 - OpenRouter retry/breaker present (7 references in source) ✓
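
The correlation-id flow verified above (mint an id, stamp it on the response, emit one structured line per request) can be sketched like this. The function names and the JSON key names `msg` and `duration_ms` are assumptions; the 16-hex-character id length is inferred from the example value `c48b7cc761dac172`, and the real middleware is TypeScript, not Python.

```python
import json
import secrets


def new_correlation_id() -> str:
    """Return a random 16-character hex id (8 random bytes)."""
    return secrets.token_hex(8)


def http_request_line(correlation_id: str, method: str, path: str,
                      status: int, duration_ms: float) -> str:
    """Serialize one request as a single NDJSON line."""
    return json.dumps({
        "msg": "http_request",
        "correlation_id": correlation_id,
        "method": method,
        "path": path,
        "status": status,
        "duration_ms": duration_ms,
    })


cid = new_correlation_id()
# The same id would go into the x-correlation-id response header.
print(http_request_line(cid, "GET", "/api/search/hybrid", 200, 12.5))
```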
 #### Deferred to W1.2 / W2 (need user-in-loop steps):
 - **Glitchtip self-host** — needs DNS for `glitchtip.disclosure.top`,
  initial signup-as-superuser, project DSN copied to .env. Logger and
  middleware are already feeding the data; SDK wiring is one PR.
 - **Forgejo Actions self-host CI** — Forgejo server + runner bootstrap,
  initial admin account, repo migration / mirror. Recommend a separate
  session because of the depth of setup.
 ### W0 — Hardening (security + reproducibility)
 *2026-05-23 · systems-atelier engagement trace `794f00ba-7cb6-4b90-a48e-23ebd02d1f44`*
 - **F1 · Auth gate on `/api/admin/*`** — middleware now matches `/api/admin`
  too; non-admin (including anonymous) gets HTTP 404. Verified: `curl`
  on `/api/admin/{batch,indexer,stats}` returns 404 publicly. Files:
  `web/middleware.ts`.
 - **F2 · Imgproxy filesystem root tightened** — `IMGPROXY_LOCAL_FILESYSTEM_ROOT`
  moved from `/` (entire VPS root) to `/var/lib/storage` (Storage backend
  mount only). Reduces blast radius of any future imgproxy CVE. Files:
  `infra/disclosure-stack/docker-compose.yml`.
 - **F3 · Studio basic auth label** — replaced the dead-end
  `basicauth.usersfile=/dev/null` with a real bcrypt-hashed credential
  (`DASHBOARD_USERNAME` / `DASHBOARD_PASSWORD` from `.env`) and wired the
  middleware into the router via `disclosure-studio.middlewares=
  disclosure-studio-auth@docker`. *Caveat:* the Studio container itself
  has a pre-existing instability (restarts in a Next.js loop, status
  `unhealthy`) so the front-end currently returns 404 from Traefik. When
  Studio is stabilized (queued for W1), the basic auth will kick in. Files:
  `infra/disclosure-stack/docker-compose.yml`.
 - **F4 · RLS on `public.relations`** — `ENABLE ROW LEVEL SECURITY` + public
  `SELECT` policy + `GRANT SELECT TO anon, authenticated`. Aligns with
  every other public table. Files: `infra/supabase/migrations/0003_w0_hardening.sql`.
 - **TD#2 · `is_searchable` folded into canonical migrations** — the column,
  reclassification rules, partial index, and the updated `hybrid_search_chunks`
  RPC (BM25 + dense, both filtered by `is_searchable`) are now in migration
  `0003_w0_hardening.sql`. A clean bootstrap on a fresh VPS produces a
  searchable database without any `scripts/maintain/47-48` post-hoc patches.
  Files: `infra/supabase/migrations/0003_w0_hardening.sql`.
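
The TD#2 reclassification rules can be restated as a small Python predicate, which may help when reasoning about which chunks drop out of search. The function and constant names are hypothetical; the chunk types and the 50-character cutoff are copied from `0003_w0_hardening.sql` in this commit.

```python
from typing import Optional

# Types that are never searchable, regardless of length.
ALWAYS_HIDDEN = {
    "page_number", "blank", "stamp",
    "classification_banner", "classification_marking",
}

# Structural types that are hidden only when their text is short.
HIDDEN_WHEN_SHORT = {
    "salutation", "complimentary_close", "section_heading", "section_header",
    "heading", "title", "subtitle", "date_line", "bulleted_item",
    "field_value", "field_entry", "table_marker", "form_field", "form_header",
    "routing_block", "distribution_list", "file_number", "marginalia",
}

MIN_LEN = 50


def is_searchable(chunk_type: Optional[str],
                  content_en: Optional[str],
                  content_pt: Optional[str]) -> bool:
    if chunk_type in ALWAYS_HIDDEN:
        return False
    # Mirrors COALESCE(content_en, content_pt, ''): the first non-NULL value
    # wins, so an empty English string is NOT replaced by the Portuguese text.
    if content_en is not None:
        text = content_en
    elif content_pt is not None:
        text = content_pt
    else:
        text = ""
    if chunk_type in HIDDEN_WHEN_SHORT and len(text) < MIN_LEN:
        return False
    return True
```

For example, a `heading` chunk with an 11-character title is hidden, while a `heading` of 50 or more characters stays searchable.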
 #### Verified on `disclosure.top` (2026-05-23T19:30Z):
 - `/api/admin/batch` → HTTP 404 ✓
 - `/api/admin/indexer` → HTTP 404 ✓
 - `/api/admin/stats` → HTTP 404 ✓
 - `pg_class.relrowsecurity` = `t` for chunks, documents, entities,
  entity_mentions, **relations** ✓
 - `is_searchable` distribution: 18 513 searchable / 10 046 not-searchable
  (35% of corpus deduplicated from results) ✓
 - `/api/search/hybrid?q=Roswell` → HTTP 200, 10 hits, first `c0527` ✓
 - Studio: Traefik labels in place; container itself unhealthy (separate
  issue, deferred to W1) ⚠
 #### Notes for clean-install reproducibility:
 - `0003_w0_hardening.sql` MUST be applied as `supabase_admin`, not
  `postgres`, because public.chunks / .entities / .relations are owned by
  `supabase_admin`. The migration file documents this in its header.
--- a/infra/disclosure-stack/docker-compose.yml
+++ b/infra/disclosure-stack/docker-compose.yml
@ -18,6 +18,10 @@ volumes:
  storage-data:
  meili-data:
  hf-cache:
  glitchtip-redis-data:
  glitchtip-uploads:
  forgejo-data:
  forgejo-runner-config:
 services:
  # ─── Database ─────────────────────────────────────────────────────────────
@ -169,7 +173,9 @@ services:
    networks: [internal]
    environment:
      IMGPROXY_BIND: ":5001"
-      IMGPROXY_LOCAL_FILESYSTEM_ROOT: /
+      # W0-F2: tighten filesystem root from "/" (whole VPS) to the Storage
+      # backend mount only. Imgproxy never reads outside Storage objects.
+      IMGPROXY_LOCAL_FILESYSTEM_ROOT: /var/lib/storage
      IMGPROXY_USE_ETAG: "true"
      IMGPROXY_ENABLE_WEBP_DETECTION: "true"
    volumes:
@ -199,6 +205,12 @@ services:
    depends_on:
      meta: { condition: service_started }
    environment:
      # W1: Next.js standalone server binds to the container hostname by
      # default, leaving 127.0.0.1 unreachable — the Docker healthcheck
      # (fetch 127.0.0.1:3000/api/profile) then loops on ECONNREFUSED and
      # the service never goes healthy. HOSTNAME=0.0.0.0 forces it to bind
      # on all interfaces so both the loopback and the docker IP respond.
      HOSTNAME: 0.0.0.0
      STUDIO_PG_META_URL: http://meta:8080
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      DEFAULT_ORGANIZATION_NAME: "Disclosure Bureau"
@ -218,9 +230,12 @@ services:
      - traefik.http.routers.disclosure-studio.tls=true
      - traefik.http.routers.disclosure-studio.tls.certresolver=letsencrypt
      - traefik.http.services.disclosure-studio.loadbalancer.server.port=3000
-      - traefik.http.middlewares.disclosure-studio-auth.basicauth.usersfile=/dev/null
-      # Studio is sensitive — protect with basic auth. We use the dashboard creds via labels:
-      # Generate htpasswd format with: htpasswd -nbB admin <pass>
+      # W0-F3: real basic auth (was effectively disabled with usersfile=/dev/null).
+      # The user/password is DASHBOARD_USERNAME / DASHBOARD_PASSWORD from .env;
+      # the bcrypt hash below was generated with $$ doubled for compose escaping.
+      # Rotate by regenerating: htpasswd -nbB <user> <pass> (then double every $).
+      - traefik.http.middlewares.disclosure-studio-auth.basicauth.users=admin:$$2b$$05$$tFLAMGNWX7xDbVyQ/O0G1.ruLwm3Le1.ErgdUTB9IYeJeH2FHd4ha
+      - traefik.http.routers.disclosure-studio.middlewares=disclosure-studio-auth@docker
  # ─── Kong API gateway ─────────────────────────────────────────────────────
  kong:
@ -312,8 +327,13 @@ services:
      SUPABASE_SERVICE_ROLE_KEY: ${SERVICE_ROLE_KEY}
      NEXT_PUBLIC_SITE_URL: https://${DOMAIN_MAIN}
      UFO_ROOT: /data/ufo
-      # Chat agent
-      CLAUDE_CODE_OAUTH_TOKEN: ${CLAUDE_CODE_OAUTH_TOKEN}
+      # W1-TD#10: bump pg pool from default 5 to 20 (chat agent + hybrid_search
+      # can saturate the smaller pool under concurrent load).
+      PG_POOL_MAX: ${PG_POOL_MAX:-20}
+      # Chat agent (W1-F8: CLAUDE_CODE_OAUTH_TOKEN only injected when the
+      # provider actually uses it — default provider is openrouter, so the token
+      # stays absent from this container's env unless CHAT_PROVIDER=claude-code).
+      CLAUDE_CODE_OAUTH_TOKEN: ${CLAUDE_CODE_OAUTH_TOKEN_FOR_WEB:-}
      CLAUDE_CODE_MODEL: ${CLAUDE_CODE_MODEL}
      OPENROUTER_API_KEY: ${OPENROUTER_API_KEY}
      OPENROUTER_MODEL: ${OPENROUTER_MODEL}
@ -326,6 +346,9 @@ services:
      EMBED_SERVICE_URL: http://embed:8000
      # pgvector + chunks (hybrid_search)
      DATABASE_URL: postgres://postgres:${POSTGRES_PASSWORD}@db:5432/postgres
      # W1.2 — Glitchtip error monitoring (DSN issued by manage.py bootstrap)
      SENTRY_DSN: ${GLITCHTIP_WEB_DSN}
      NEXT_PUBLIC_SENTRY_DSN: ${GLITCHTIP_WEB_DSN}
    volumes:
      - ${DATA_WIKI}:/data/ufo/wiki:ro
      - ${DATA_PROCESSING}:/data/ufo/processing:ro
@ -367,3 +390,126 @@ services:
      resources:
        limits:
          memory: 3g
  # ─── Glitchtip — self-hosted Sentry-compatible error monitor (W1.2) ───────
  glitchtip-redis:
    container_name: disclosure-glitchtip-redis
    image: redis:7-alpine
    restart: unless-stopped
    networks: [internal]
    volumes:
      - glitchtip-redis-data:/data
    command: redis-server --appendonly yes
  glitchtip-web:
    container_name: disclosure-glitchtip-web
    image: glitchtip/glitchtip:v4.2
    restart: unless-stopped
    networks: [internal, traefik]
    depends_on:
      db: { condition: service_healthy }
      glitchtip-redis: { condition: service_started }
    environment:
      DATABASE_URL: postgres://glitchtip:${GLITCHTIP_DB_PASSWORD}@db:5432/glitchtip
      SECRET_KEY: ${GLITCHTIP_SECRET_KEY}
      REDIS_URL: redis://glitchtip-redis:6379/0
      PORT: "8080"
      GLITCHTIP_DOMAIN: ${GLITCHTIP_DOMAIN}
      DEFAULT_FROM_EMAIL: ${GLITCHTIP_DEFAULT_FROM_EMAIL}
      EMAIL_URL: consolemail://
      ENABLE_USER_REGISTRATION: "false"   # bootstrap admin via manage.py
      ENABLE_ORGANIZATION_CREATION: "false"
      CELERY_WORKER_AUTOSCALE: "1,3"
      CELERY_WORKER_MAX_TASKS_PER_CHILD: "10000"
    volumes:
      - glitchtip-uploads:/code/uploads
    labels:
      - traefik.enable=true
      - traefik.docker.network=traefik-public
      - traefik.http.routers.disclosure-glitchtip.rule=Host(`glitchtip.disclosure.top`)
      - traefik.http.routers.disclosure-glitchtip.entrypoints=websecure
      - traefik.http.routers.disclosure-glitchtip.tls=true
      - traefik.http.routers.disclosure-glitchtip.tls.certresolver=letsencrypt
      - traefik.http.services.disclosure-glitchtip.loadbalancer.server.port=8080
  glitchtip-worker:
    container_name: disclosure-glitchtip-worker
    image: glitchtip/glitchtip:v4.2
    restart: unless-stopped
    networks: [internal]
    depends_on:
      db: { condition: service_healthy }
      glitchtip-redis: { condition: service_started }
    environment:
      DATABASE_URL: postgres://glitchtip:${GLITCHTIP_DB_PASSWORD}@db:5432/glitchtip
      SECRET_KEY: ${GLITCHTIP_SECRET_KEY}
      REDIS_URL: redis://glitchtip-redis:6379/0
      GLITCHTIP_DOMAIN: ${GLITCHTIP_DOMAIN}
      DEFAULT_FROM_EMAIL: ${GLITCHTIP_DEFAULT_FROM_EMAIL}
      EMAIL_URL: consolemail://
      CELERY_WORKER_AUTOSCALE: "1,3"
      CELERY_WORKER_MAX_TASKS_PER_CHILD: "10000"
    volumes:
      - glitchtip-uploads:/code/uploads
    command: ./bin/run-celery-with-beat.sh
  # ─── Forgejo — self-hosted Git + Actions CI (W1.2) ────────────────────────
  forgejo:
    container_name: disclosure-forgejo
    image: codeberg.org/forgejo/forgejo:9
    restart: unless-stopped
    networks: [internal, traefik]
    depends_on:
      db: { condition: service_healthy }
    environment:
      USER_UID: "1000"
      USER_GID: "1000"
      FORGEJO__database__DB_TYPE: postgres
      FORGEJO__database__HOST: db:5432
      FORGEJO__database__NAME: forgejo
      FORGEJO__database__USER: forgejo
      FORGEJO__database__PASSWD: ${FORGEJO_DB_PASSWORD}
      FORGEJO__server__DOMAIN: ${FORGEJO_DOMAIN}
      FORGEJO__server__ROOT_URL: https://${FORGEJO_DOMAIN}
      FORGEJO__server__SSH_DOMAIN: ${FORGEJO_DOMAIN}
      FORGEJO__service__DISABLE_REGISTRATION: "true"   # admin invites only
      FORGEJO__actions__ENABLED: "true"
      FORGEJO__security__INSTALL_LOCK: "true"
    volumes:
      - forgejo-data:/data
    labels:
      - traefik.enable=true
      - traefik.docker.network=traefik-public
      - traefik.http.routers.disclosure-forgejo.rule=Host(`forgejo.disclosure.top`)
      - traefik.http.routers.disclosure-forgejo.entrypoints=websecure
      - traefik.http.routers.disclosure-forgejo.tls=true
      - traefik.http.routers.disclosure-forgejo.tls.certresolver=letsencrypt
      - traefik.http.services.disclosure-forgejo.loadbalancer.server.port=3000
  forgejo-runner:
    container_name: disclosure-forgejo-runner
    image: code.forgejo.org/forgejo/runner:6
    restart: unless-stopped
    networks: [internal]
    # GID of the docker group on the host — lets the runner (uid 1000) talk
    # to the docker socket without running as root.
    group_add:
      - "988"
    depends_on:
      forgejo: { condition: service_started }
    environment:
      FORGEJO_INSTANCE_URL: http://forgejo:3000
      FORGEJO_RUNNER_REGISTRATION_TOKEN: ${FORGEJO_RUNNER_TOKEN}
      FORGEJO_RUNNER_NAME: disclosure-runner
    volumes:
      - forgejo-runner-config:/data
      - /var/run/docker.sock:/var/run/docker.sock
    command:
      - sh
      - -c
      - |
        sleep 10
        if [ ! -f /data/.runner ]; then
          forgejo-runner register --no-interactive --instance "$$FORGEJO_INSTANCE_URL" --token "$$FORGEJO_RUNNER_REGISTRATION_TOKEN" --name "$$FORGEJO_RUNNER_NAME" --labels 'ubuntu-latest:docker://node:20-bookworm,docker:host'
        fi
        forgejo-runner daemon
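
The Studio basic-auth label in the compose file above relies on Compose treating `$$` as a literal `$`, which is why the rotation comment says to double every `$` in the `htpasswd -nbB` output. A minimal sketch of that step follows; the helper names are hypothetical and the hash in the usage line is a made-up placeholder, not a real credential.

```python
def escape_for_compose(htpasswd_line: str) -> str:
    """Double every '$' so Compose interpolation leaves the bcrypt hash intact."""
    return htpasswd_line.replace("$", "$$")


def unescape_from_compose(label_value: str) -> str:
    """Inverse operation: the value Traefik sees after Compose interpolation."""
    return label_value.replace("$$", "$")


# Placeholder output in the shape produced by `htpasswd -nbB <user> <pass>`.
line = "admin:$2y$05$abcdefghijklmnopqrstuv"
print(escape_for_compose(line))
```

Without the doubling, Compose would try to interpolate fragments of the hash as variables and the stored credential would no longer match.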
--- a/infra/supabase/migrations/0003_w0_hardening.sql
+++ b/infra/supabase/migrations/0003_w0_hardening.sql
@ -0,0 +1,172 @@
 -- 0003_w0_hardening.sql
 --
 -- W0 hardening migration. Folds two ad-hoc maintenance scripts into the
 -- canonical migration stream so a clean install on a fresh VPS produces a
 -- secured, fully-searchable database without any post-bootstrap scripts.
 --
 --   F4   — RLS on public.relations (drift vs every other public.* table).
 --   TD#2 — is_searchable column + reclassification + partial index, AND the
 --          updated hybrid_search_chunks() that honors it. (Previously lived
 --          in scripts/maintain/47_mark_unsearchable_chunks.sql + 48_*.sql.)
 --
 -- Idempotent. Safe to re-run.
 BEGIN;
 -- IMPORTANT: public.chunks / .entities / .relations are owned by
 -- `supabase_admin` (not `postgres`). Postgres enforces ownership on RLS DDL
 -- even for superusers. Run this migration as:
 --
 --   docker exec -i disclosure-db psql -U supabase_admin < 0003_w0_hardening.sql
 --
 -- The `supabase_admin` role has socket-trust auth on the local container.
 -- ─────────────────────────────────────────────────────────────────────────
 -- F4 · RLS on public.relations
 -- ─────────────────────────────────────────────────────────────────────────
 ALTER TABLE public.relations ENABLE ROW LEVEL SECURITY;
 DROP POLICY IF EXISTS relations_read ON public.relations;
 CREATE POLICY relations_read ON public.relations FOR SELECT USING (TRUE);
 GRANT SELECT ON public.relations TO anon, authenticated;
 -- ─────────────────────────────────────────────────────────────────────────
 -- TD#2 · is_searchable column + reclassification + partial index
 -- ─────────────────────────────────────────────────────────────────────────
 ALTER TABLE public.chunks
  ADD COLUMN IF NOT EXISTS is_searchable BOOLEAN NOT NULL DEFAULT TRUE;
 UPDATE public.chunks SET is_searchable = TRUE;
 UPDATE public.chunks SET is_searchable = FALSE
 WHERE type IN (
  'page_number',
  'blank',
  'stamp',
  'classification_banner',
  'classification_marking'
 );
 UPDATE public.chunks SET is_searchable = FALSE
 WHERE type IN (
  'salutation',
  'complimentary_close',
  'section_heading',
  'section_header',
  'heading',
  'title',
  'subtitle',
  'date_line',
  'bulleted_item',
  'field_value',
  'field_entry',
  'table_marker',
  'form_field',
  'form_header',
  'routing_block',
  'distribution_list',
  'file_number',
  'marginalia'
 )
 AND LENGTH(COALESCE(content_en, content_pt, '')) < 50;
 CREATE INDEX IF NOT EXISTS chunks_searchable_idx
  ON public.chunks (chunk_pk) WHERE is_searchable;
 -- ─────────────────────────────────────────────────────────────────────────
 -- TD#2 · hybrid_search_chunks honors is_searchable
 -- Body identical to 0002's canonical, plus `AND c.is_searchable` in both
 -- the bm25 and dense CTEs. Replaces the function in place.
 -- ─────────────────────────────────────────────────────────────────────────
 DROP FUNCTION IF EXISTS public.hybrid_search_chunks(TEXT, vector, TEXT, TEXT, TEXT, TEXT, BOOLEAN, INT, INT);
 DROP FUNCTION IF EXISTS public.hybrid_search_chunks(TEXT, vector, TEXT, TEXT, TEXT, TEXT, BOOLEAN, INT, INT, DOUBLE PRECISION);
 CREATE OR REPLACE FUNCTION public.hybrid_search_chunks(
  q_text       TEXT,
  q_embedding  vector(1024),
  q_lang       TEXT DEFAULT 'pt',
  q_doc_id     TEXT DEFAULT NULL,
  q_type       TEXT DEFAULT NULL,
  q_classification TEXT DEFAULT NULL,
  q_ufo_only   BOOLEAN DEFAULT FALSE,
  k            INT DEFAULT 100,
  rrf_k        INT DEFAULT 60,
  max_dense_dist DOUBLE PRECISION DEFAULT 0.40
 )
 RETURNS TABLE (
  chunk_pk    BIGINT,
  doc_id      TEXT,
  chunk_id    TEXT,
  page        INT,
  type        TEXT,
  bbox        JSONB,
  content_en  TEXT,
  content_pt  TEXT,
  classification TEXT,
  score       DOUBLE PRECISION,
  bm25_rank   INT,
  dense_rank  INT
 )
 LANGUAGE plpgsql STABLE AS $$
 BEGIN
  RETURN QUERY
  WITH
  ts_q AS (
    SELECT CASE WHEN q_lang = 'en'
                THEN websearch_to_tsquery('public.en_unaccent'::regconfig, q_text)
                ELSE websearch_to_tsquery('public.pt_unaccent'::regconfig, q_text)
            END AS q
  ),
  bm25 AS (
    SELECT c.chunk_pk,
           row_number() OVER (ORDER BY
             ts_rank_cd(
               CASE WHEN q_lang = 'en' THEN c.ts_en ELSE c.ts_pt END,
               (SELECT q FROM ts_q)
             ) DESC NULLS LAST
           )::INT AS r
    FROM public.chunks c
    WHERE c.is_searchable
      AND (CASE WHEN q_lang = 'en' THEN c.ts_en ELSE c.ts_pt END) @@ (SELECT q FROM ts_q)
      AND (q_doc_id IS NULL OR c.doc_id = q_doc_id)
      AND (q_type IS NULL OR c.type = q_type)
      AND (q_classification IS NULL OR c.classification = q_classification)
      AND (NOT q_ufo_only OR c.ufo_anomaly = TRUE)
    LIMIT k
  ),
  dense AS (
    SELECT c.chunk_pk,
           row_number() OVER (ORDER BY c.embedding <=> q_embedding)::INT AS r
    FROM public.chunks c
    WHERE c.is_searchable
      AND c.embedding IS NOT NULL
      AND (c.embedding <=> q_embedding) < max_dense_dist
      AND (q_doc_id IS NULL OR c.doc_id = q_doc_id)
      AND (q_type IS NULL OR c.type = q_type)
      AND (q_classification IS NULL OR c.classification = q_classification)
      AND (NOT q_ufo_only OR c.ufo_anomaly = TRUE)
    ORDER BY c.embedding <=> q_embedding
    LIMIT k
  ),
  fused AS (
    SELECT COALESCE(b.chunk_pk, d.chunk_pk) AS chunk_pk,
           ((1.0::DOUBLE PRECISION / (rrf_k + COALESCE(b.r, k + 1))::DOUBLE PRECISION) +
            (1.0::DOUBLE PRECISION / (rrf_k + COALESCE(d.r, k + 1))::DOUBLE PRECISION)) AS score,
           b.r AS bm25_rank,
           d.r AS dense_rank
    FROM bm25 b
    FULL OUTER JOIN dense d USING (chunk_pk)
  )
  SELECT c.chunk_pk, c.doc_id, c.chunk_id, c.page, c.type, c.bbox,
         c.content_en, c.content_pt, c.classification,
         f.score, f.bm25_rank, f.dense_rank
  FROM fused f
  JOIN public.chunks c USING (chunk_pk)
  ORDER BY f.score DESC
  LIMIT k;
 END
 $$;
 GRANT EXECUTE ON FUNCTION public.hybrid_search_chunks TO anon, authenticated;
 COMMIT;
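
The `fused` CTE in `hybrid_search_chunks` is a reciprocal rank fusion (RRF): each chunk scores `1/(rrf_k + rank)` in each ranking, and a chunk missing from one ranking is treated as rank `k + 1`. The sketch below reproduces that arithmetic in Python for two already-ranked id lists. The function name and the tie-break on `chunk_pk` are assumptions (the SQL leaves ties unordered).

```python
from typing import List, Tuple


def rrf_fuse(bm25_ids: List[int], dense_ids: List[int],
             k: int = 100, rrf_k: int = 60) -> List[Tuple[int, float]]:
    """Fuse two rankings of chunk ids with reciprocal rank fusion."""
    # Ranks are 1-based, like row_number() in the SQL.
    bm25_rank = {pk: r for r, pk in enumerate(bm25_ids[:k], start=1)}
    dense_rank = {pk: r for r, pk in enumerate(dense_ids[:k], start=1)}

    fused = []
    for pk in bm25_rank.keys() | dense_rank.keys():
        # COALESCE(rank, k + 1): a missing rank counts as just past the cutoff.
        b = bm25_rank.get(pk, k + 1)
        d = dense_rank.get(pk, k + 1)
        fused.append((pk, 1.0 / (rrf_k + b) + 1.0 / (rrf_k + d)))

    # Highest score first; ties broken by id (an assumption for determinism).
    fused.sort(key=lambda item: (-item[1], item[0]))
    return fused[:k]


print(rrf_fuse([1, 2, 3], [3, 4]))
```

A chunk that appears in both rankings (id 3 here) outranks one that tops only a single ranking, which is the point of fusing BM25 with dense retrieval.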
--- a/scripts/02b-enrich-with-web-metadata.py
+++ b/scripts/02b-enrich-with-web-metadata.py
@ -90,10 +90,12 @@ def jaccard(a: set, b: set) -> float:
 def primary_id(s: str) -> str | None:
    n = normalize(s)
    # Catch (agency)-uap-d(\d+) once and rest of the dedicated patterns. Match
    # "cia-uap-d001", "doe-uap-d002", "odni-uap-d001", "dow-uap-d017", etc.
    m = re.match(r"^((?:cia|doe|dod|dow|dos|odni|nasa|fbi)-uap-[a-z]{1,4}\d+[a-z]?)", n)
    if m:
        return m.group(1)
    for p in (
        r"^(dow-uap-[a-z]{1,4}\d+)",
        r"^(dos-uap-d\d+)",
        r"^(nasa-uap-[a-z]{1,3}\d+[a-z]?)",
        r"^(fbi-photo-[a-z]\d+)",
    ):
        m = re.match(p, n)
@ -216,14 +218,33 @@ def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--dry-run", action="store_true")
    ap.add_argument("--rename-events", action="store_true", help="Rename EV-XXXX events to EV-YYYY-MM-DD")
    ap.add_argument("--metadata-json", action="append", default=None,
                    help="Path to a war.gov metadata JSON. Pass multiple times to merge releases. "
                         "Defaults to release-01 + release-02 if present.")
    args = ap.parse_args()
-    if not METADATA_JSON.exists():
-        sys.stderr.write(f"Metadata JSON not found: {METADATA_JSON}\n")
-        sys.exit(1)
-    data = json.loads(METADATA_JSON.read_text(encoding="utf-8"))
-    records = data.get("documents", [])
-    print(f"war.gov records: {len(records)}")
+    if args.metadata_json:
+        json_paths = [Path(p) for p in args.metadata_json]
+    else:
+        # Default: load every release-NN-basic JSON found, so 116 existing docs
+        # (release-01) and 6 new docs (release-02) all get enriched in one pass.
+        json_paths = sorted((UFO_ROOT / "processing" / "war-gov-metadata").glob("all-documents-release-*-basic.json"))
+        if not json_paths:
+            json_paths = [METADATA_JSON]
+    records: list[dict] = []
+    for p in json_paths:
+        if not p.exists():
+            sys.stderr.write(f"Metadata JSON not found: {p}\n"); sys.exit(1)
+        d = json.loads(p.read_text(encoding="utf-8"))
+        recs = d.get("documents", [])
+        extracted_at = d.get("extracted_at")
+        for r in recs:
+            r.setdefault("_extracted_at", extracted_at)
+            r.setdefault("_source_json", p.name)
+        print(f"war.gov records from {p.name}: {len(recs)}")
+        records.extend(recs)
+    print(f"war.gov records total: {len(records)}")
    war_index = build_war_index(records)
    docs = sorted(DOCS_DIR.glob("*.md"))
@ -268,7 +289,7 @@ def main():
            "document_type_official": match.get("document_type"),
            "match_reason": reason,
            "availability": "pending-upstream" if match["record_id"] in PLACEHOLDER_RECORDS else "downloaded",
-            "extracted_from_war_gov_at": data.get("extracted_at"),
+            "extracted_from_war_gov_at": match.get("_extracted_at"),
        }
        new_fm = dict(fm)
@ -352,7 +373,7 @@ def main():
            fh.write(
                f"\n## {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')} — ENRICH WAR.GOV (Phase 0.5)\n"
                f"- operator: archivist\n- script: scripts/02b-enrich-with-web-metadata.py\n"
-                f"- json_source: {METADATA_JSON.name}\n"
+                f"- json_source: {', '.join(p.name for p in json_paths)}\n"
                f"- enriched: {enriched}\n- unchanged: {unchanged}\n- unmatched: {len(unmatched)}\n"
                f"- event_renames: {rename_count}\n"
            )
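
The combined agency pattern added to `primary_id()` in the first hunk of this file can be exercised on its own. The sketch below copies the regular expression from the diff; the wrapper name `agency_primary_id` is hypothetical, and the inputs are assumed to be already lowercased and hyphenated, since `normalize()` is not shown here.

```python
import re
from typing import Optional

# Pattern copied from the diff: <agency>-uap-<1-4 letters><digits>[letter]
AGENCY_ID = re.compile(
    r"^((?:cia|doe|dod|dow|dos|odni|nasa|fbi)-uap-[a-z]{1,4}\d+[a-z]?)"
)


def agency_primary_id(normalized: str) -> Optional[str]:
    """Return the leading agency id, or None if the name does not start with one."""
    m = AGENCY_ID.match(normalized)
    return m.group(1) if m else None


for name in ("cia-uap-d001", "dow-uap-d017-some-title", "fbi-photo-a12"):
    print(name, "->", agency_primary_id(name))
```

Names such as `fbi-photo-a12` fall through to the dedicated patterns that follow in `primary_id()`, so a `None` here is not yet a miss.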
--- a/scripts/maintain/57_load_relations_from_json.py
+++ b/scripts/maintain/57_load_relations_from_json.py
@ -264,9 +264,26 @@ def main() -> int:
                   SELECT source_class, source_id, relation_type,
                          target_class, target_id, evidence_ref,
                          confidence, extracted_by
-                   FROM _rel ON CONFLICT DO NOTHING"""
+                   FROM _rel
+                   WHERE relation_type IN ('witnessed','occurred_at','involves_uap',
+                                           'documented_in','authored','signed',
+                                           'mentioned_by','employed_by','operated_by',
+                                           'investigated','commanded','related_to',
+                                           'similar_to','precedes','follows')
+                   ON CONFLICT DO NOTHING"""
            )
-            print(f"Inserted (after ON CONFLICT): {cur.rowcount}")
+            print(f"Inserted (after ON CONFLICT + type filter): {cur.rowcount}")
+            cur.execute(
+                "SELECT relation_type, COUNT(*) FROM _rel WHERE relation_type NOT IN "
+                "('witnessed','occurred_at','involves_uap','documented_in','authored','signed',"
+                "'mentioned_by','employed_by','operated_by','investigated','commanded',"
+                "'related_to','similar_to','precedes','follows') GROUP BY relation_type ORDER BY 2 DESC"
+            )
+            drops = cur.fetchall()
+            if drops:
+                print("Dropped (invalid relation_type):")
+                for t, n in drops:
+                    print(f"  {n:>5}  {t}")
            cur.execute(
                "SELECT relation_type, COUNT(*) FROM public.relations GROUP BY relation_type ORDER BY 2 DESC"
            )
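
The type filter and the "Dropped" report added in this hunk amount to partitioning the staged rows by an allow-list. A Python sketch of the same logic is shown below, assuming rows are dicts with a `relation_type` key; the function name and row shape are hypothetical, while the 15 allowed types are copied from the SQL.

```python
from collections import Counter
from typing import Dict, List, Tuple

VALID_RELATION_TYPES = frozenset({
    "witnessed", "occurred_at", "involves_uap", "documented_in", "authored",
    "signed", "mentioned_by", "employed_by", "operated_by", "investigated",
    "commanded", "related_to", "similar_to", "precedes", "follows",
})


def split_by_relation_type(rows: List[Dict[str, str]]
                           ) -> Tuple[List[Dict[str, str]], List[Tuple[str, int]]]:
    """Return (rows to insert, dropped types with counts, most frequent first)."""
    kept = [r for r in rows if r["relation_type"] in VALID_RELATION_TYPES]
    dropped = Counter(
        r["relation_type"] for r in rows
        if r["relation_type"] not in VALID_RELATION_TYPES
    )
    return kept, dropped.most_common()


rows = [
    {"relation_type": "witnessed"},
    {"relation_type": "located_near"},
    {"relation_type": "located_near"},
    {"relation_type": "authored"},
]
kept, dropped = split_by_relation_type(rows)
for rel_type, n in dropped:
    print(f"  {n:>5}  {rel_type}")
```

Reporting the dropped types rather than discarding them silently makes it easier to notice when the extractor starts emitting a relation type the schema does not yet allow.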
--- a/scripts/maintain/58_backfill_embeddings.py
+++ b/scripts/maintain/58_backfill_embeddings.py
@ -30,7 +30,9 @@ EMBED_URL = os.getenv("EMBED_SERVICE_URL", "http://localhost:8000")
 def embed_batch(texts: list[str]) -> list[list[float]]:
-    resp = requests.post(f"{EMBED_URL}/embed", json={"texts": texts}, timeout=120)
+    # Cold-start of BGE-M3 takes ~8s per text on CPU; the first call can take minutes
+    # for a batch. Bump timeout to 10 minutes so the first batch doesn't kill the run.
+    resp = requests.post(f"{EMBED_URL}/embed", json={"texts": texts}, timeout=600)
    resp.raise_for_status()
    return resp.json()["embeddings"]
--- a/scripts/maintain/60_meili_index.py
+++ b/scripts/maintain/60_meili_index.py
@@ -0,0 +1,151 @@
 #!/usr/bin/env python3
 """
 60_meili_index.py — Push documents + chunks into Meilisearch for autocomplete.
 W1 deliverable. Meilisearch is the typo-tolerant prefix-aware search engine in
 the stack; it complements Postgres BM25 + pgvector (used by the chat). The
 goal here is fast `/search` autocomplete that shows matching docs and chunks
 as the user types — sub-30ms.
 Indexes created:
  - documents   id=doc_id, fields=[canonical_title, collection, doc_id]
  - chunks      id=chunk_pk, fields=[doc_id, chunk_id, page, content_en, content_pt]
 Idempotent: re-running upserts. Pass `--reset` to rebuild from scratch.
 Run from inside the disclosure-internal network, or point MEILISEARCH_URL at a
 reachable host. Reads MEILI_MASTER_KEY (or MEILISEARCH_API_KEY) + MEILISEARCH_URL from env.
 Usage:
  python3 scripts/maintain/60_meili_index.py
  python3 scripts/maintain/60_meili_index.py --reset
  python3 scripts/maintain/60_meili_index.py --doc-id <id>
 """
 from __future__ import annotations
 import argparse
 import json
 import os
 import sys
 from typing import Any
 try:
    import psycopg
    import requests
 except ImportError as e:
    sys.exit(f"pip install psycopg[binary] requests  # missing: {e}")
 DATABASE_URL = os.getenv("DATABASE_URL") or os.getenv("SUPABASE_DB_URL")
 MEILI_URL = os.getenv("MEILISEARCH_URL", "http://meilisearch:7700")
 MEILI_KEY = os.getenv("MEILI_MASTER_KEY") or os.getenv("MEILISEARCH_API_KEY", "")
 BATCH = int(os.getenv("MEILI_BATCH", "1000"))
 def meili(method: str, path: str, body: Any = None) -> dict:
    headers = {"Authorization": f"Bearer {MEILI_KEY}", "Content-Type": "application/json"}
    r = requests.request(method, f"{MEILI_URL}{path}", headers=headers,
                         data=json.dumps(body) if body is not None else None,
                         timeout=120)
    r.raise_for_status()
    return r.json() if r.text else {}
 def ensure_index(uid: str, primary_key: str, searchable: list[str], filterable: list[str]):
    """Create the index if missing, then set settings."""
    try:
        meili("POST", "/indexes", {"uid": uid, "primaryKey": primary_key})
        print(f"  created index {uid}")
    except requests.HTTPError as e:
        # 409 = already exists, OK; 400 is tolerated too (note: this also hides bad requests).
        if e.response.status_code not in (400, 409):
            raise
    meili("PATCH", f"/indexes/{uid}/settings", {
        "searchableAttributes": searchable,
        "filterableAttributes": filterable,
        "displayedAttributes": ["*"],
        "rankingRules": ["words", "typo", "proximity", "attribute", "sort", "exactness"],
        "typoTolerance": {"enabled": True, "minWordSizeForTypos": {"oneTypo": 4, "twoTypos": 8}},
    })
 def push(uid: str, docs: list[dict]):
    if not docs: return
    meili("POST", f"/indexes/{uid}/documents", docs)
 def main() -> int:
    ap = argparse.ArgumentParser()
    ap.add_argument("--reset", action="store_true", help="Delete and recreate indexes")
    ap.add_argument("--doc-id", help="Reindex only one doc")
    args = ap.parse_args()
    if not DATABASE_URL: sys.exit("DATABASE_URL not set")
    if not MEILI_KEY: sys.exit("MEILI_MASTER_KEY not set")
    if args.reset and not args.doc_id:
        print("Resetting indexes...")
        for uid in ("documents", "chunks"):
            try: meili("DELETE", f"/indexes/{uid}")
            except requests.HTTPError: pass
    ensure_index("documents", "doc_id",
                 searchable=["canonical_title", "collection", "doc_id"],
                 filterable=["collection", "classification"])
    ensure_index("chunks", "chunk_pk",
                 searchable=["content_pt", "content_en", "doc_id", "chunk_id"],
                 filterable=["doc_id", "type", "classification", "ufo_anomaly", "is_searchable"])
    with psycopg.connect(DATABASE_URL) as conn, conn.cursor() as cur:
        # documents
        where_doc = "WHERE doc_id = %s" if args.doc_id else ""
        params = (args.doc_id,) if args.doc_id else ()
        cur.execute(f"""
            SELECT doc_id, canonical_title, collection, classification
            FROM public.documents {where_doc}
        """, params)
        rows = cur.fetchall()
        docs = [{"doc_id": r[0], "canonical_title": r[1] or r[0],
                 "collection": r[2] or "", "classification": r[3] or ""} for r in rows]
        print(f"documents → meili: {len(docs)}")
        for i in range(0, len(docs), BATCH):
            push("documents", docs[i:i+BATCH])
        # chunks (only searchable ones — drops scaffolding noise)
        where_chunk = "WHERE c.is_searchable" + (" AND c.doc_id = %s" if args.doc_id else "")
        cur.execute(f"""
            SELECT c.chunk_pk, c.doc_id, c.chunk_id, c.page, c.type,
                   c.content_en, c.content_pt, c.classification, c.ufo_anomaly
            FROM public.chunks c
            {where_chunk}
        """, params)
        chunks: list[dict] = []
        total = 0
        for r in cur:
            chunks.append({
                "chunk_pk": r[0],
                "doc_id": r[1],
                "chunk_id": r[2],
                "page": r[3],
                "type": r[4],
                "content_en": (r[5] or "")[:2000],
                "content_pt": (r[6] or "")[:2000],
                "classification": r[7] or "",
                "ufo_anomaly": bool(r[8]),
                "is_searchable": True,
            })
            if len(chunks) >= BATCH:
                push("chunks", chunks)
                total += len(chunks)
                chunks = []
                print(f"  pushed {total} chunks...")
        if chunks:
            push("chunks", chunks)
            total += len(chunks)
        print(f"chunks → meili: {total}")
    print("\n✓ done. Indexer enqueued; meili processes asynchronously.")
    print(f"  Verify: curl -H 'Authorization: Bearer ...' {MEILI_URL}/indexes/chunks/stats")
    return 0
 if __name__ == "__main__":
    sys.exit(main())
--- a/scripts/synthesize/40_reading_version.py
+++ b/scripts/synthesize/40_reading_version.py
@@ -97,7 +97,7 @@ def call_llm(prompt: str) -> str:
                ["claude", "-p", "--model", "sonnet", "--output-format", "text",
                 "--disallowed-tools", DISALLOWED],
                input=prompt.encode("utf-8"), stdout=out, stderr=subprocess.PIPE, env=env,
-                timeout=600,
+                timeout=1200,
            )
        if r.returncode != 0:
            sys.exit(f"claude failed rc={r.returncode}: {r.stderr.decode('utf-8','replace')[:500]}")
@@ -107,6 +107,62 @@ def call_llm(prompt: str) -> str:
        except OSError: pass
+# Above this size, the reading version won't fit one Sonnet call (32k-token
+# output ceiling + timeout), so we segment by page blocks and concatenate.
+SEGMENT_THRESHOLD = 90_000
+SEGMENT_CHARS = 45_000
+PROMPT_SEGMENT = """You are a meticulous archivist-typographer for The Disclosure Bureau. This is
+PART {n} OF {m} of a large scanned UAP/UFO document — you receive the raw
+machine-extracted text of THIS part only (chunk by chunk). The scan is messy:
+duplicate transcriptions, OCR noise, repeated letterheads, classification
+banners, page numbers, routing stamps.
+Produce a clean, faithful, well-structured reading version of THIS PART in
+Markdown.
+RULES:
+1. FAITHFUL — never invent. Keep [redacted]/[ilegível] markers.
+2. DEDUPLICATE within this part — merge repeated content, keep unique details.
+3. DROP page furniture (letterheads, banners, page numbers, routing stamps, OCR
+   garbage).
+4. STRUCTURE with clear Markdown headings (##/###) and clean dialogue
+   (**SPEAKER:**) for transcripts. Do NOT write a document-level H1 title (the
+   document already has one); start at "## Part {n}" then sub-sections.
+5. BILINGUAL — for THIS part output English first under "### English", then
+   Brazilian Portuguese under "### Português". Natural pt-br with correct accents.
+6. PRESERVE every investigative detail (sightings, coords, times, witnesses,
+   object descriptions, quotes).
+Return ONLY the Markdown for this part (no code fence, no preamble). Start with
+"## Part {n}".
+DOCUMENT (doc_id: {doc_id}) — PART {n} OF {m}, raw chunks follow:
+{doc_text}
+"""
+def segment_text(text: str) -> list[str]:
+    """Split doc text into blocks at [chunk ...] markers near SEGMENT_CHARS."""
+    import re as _re
+    if len(text) <= SEGMENT_CHARS:
+        return [text]
+    starts = [m.start() for m in _re.finditer(r"^\[chunk c\d+", text, _re.MULTILINE)]
+    if not starts:
+        return [text]
+    segs: list[str] = []
+    s = 0
+    while s < len(text):
+        cap = s + SEGMENT_CHARS
+        if cap >= len(text):
+            segs.append(text[s:]); break
+        cands = [p for p in starts if s < p < cap]
+        e = cands[-1] if cands else cap
+        segs.append(text[s:e]); s = e
+    return segs
 def main() -> int:
    if len(sys.argv) < 2:
        sys.exit("usage: 40_reading_version.py <doc-id>")
@@ -118,9 +174,21 @@ def main() -> int:
    print(f"      {len(doc_text)} chars (~{len(doc_text)//4} tokens)")
    print("[2/3] generating reading version (Sonnet) ...")
-    md = call_llm(PROMPT.format(doc_id=doc_id, doc_text=doc_text)).strip()
-    if md.startswith("```"):
-        md = "\n".join(l for l in md.splitlines() if not l.startswith("```")).strip()
+    if len(doc_text) > SEGMENT_THRESHOLD:
+        segs = segment_text(doc_text)
+        print(f"      large doc → {len(segs)} segments")
+        parts: list[str] = []
+        for i, seg in enumerate(segs, 1):
+            print(f"      segment {i}/{len(segs)} ({len(seg)} chars) ...")
+            p = call_llm(PROMPT_SEGMENT.format(n=i, m=len(segs), doc_id=doc_id, doc_text=seg)).strip()
+            if p.startswith("```"):
+                p = "\n".join(l for l in p.splitlines() if not l.startswith("```")).strip()
+            parts.append(p)
+        md = "\n\n---\n\n".join(parts)
+    else:
+        md = call_llm(PROMPT.format(doc_id=doc_id, doc_text=doc_text)).strip()
+        if md.startswith("```"):
+            md = "\n".join(l for l in md.splitlines() if not l.startswith("```")).strip()
    front = (
        f"---\nschema_version: \"0.1.0\"\ntype: reading\ndoc_id: {doc_id}\n"
--- a/scripts/synthesize/run_reading_parallel.sh
+++ b/scripts/synthesize/run_reading_parallel.sh
@ -0,0 +1,69 @@
 #!/usr/bin/env bash
 # Generate the clean LLM reading version for every document, in parallel.
 #
 # - One doc per `claude -p` (Sonnet) via 40_reading_version.py
 # - Skips docs that already have reading.md (idempotent — safe to re-run)
 # - mkdir-based per-doc lock prevents two workers racing the same doc
 # - WORKERS parallel workers (default 2)
 #
 # Run:
 #   ./run_reading_parallel.sh                # all docs, 2 workers
 #   WORKERS=3 ./run_reading_parallel.sh      # 3 workers
 #   ./run_reading_parallel.sh DOC1 DOC2      # specific docs only
 set -uo pipefail
 UFO="/Users/guto/ufo"
 RAW="$UFO/raw"
 GEN="$UFO/scripts/synthesize/40_reading_version.py"
 WORKERS="${WORKERS:-2}"
 if [ "$#" -gt 0 ]; then
  DOCS=("$@")
 else
  DOCS=()
  for d in "$RAW"/*--subagent; do
    [ -f "$d/_index.json" ] || continue
    DOCS+=("$(basename "$d" | sed 's/--subagent$//')")
  done
 fi
 echo "=== reading-version generator ==="
 echo "  docs queued: ${#DOCS[@]}"
 echo "  workers:     $WORKERS"
 echo ""
 process_one() {
  local doc_id="$1"
  local sub="$RAW/$doc_id--subagent"
  local out="$sub/reading.md"
  local log="$sub/_reading.log"
  local lock="$sub/.reading.lock"
  if [ -f "$out" ]; then
    echo "[SKIP] $doc_id (already has reading.md)"
    return 0
  fi
  if ! mkdir "$lock" 2>/dev/null; then
    echo "[LOCK] $doc_id (another worker)"
    return 0
  fi
  trap "rmdir '$lock' 2>/dev/null || true" EXIT
  local t0=$(date +%s)
  echo "[BEGIN] $doc_id"
  if python3 "$GEN" "$doc_id" > "$log" 2>&1; then
    echo "[OK]    $doc_id ($(($(date +%s) - t0))s)"
  else
    echo "[FAIL]  $doc_id ($(($(date +%s) - t0))s) — see $log"
  fi
  rmdir "$lock" 2>/dev/null || true
  trap - EXIT
 }
 export -f process_one
 export RAW GEN
 printf '%s\n' "${DOCS[@]}" | xargs -P "$WORKERS" -I {} bash -c 'process_one "$@"' _ {}
 echo ""
 echo "=== Done. reading.md count: ==="
 ls "$RAW"/*--subagent/reading.md 2>/dev/null | wc -l
--- a/web/app/api/admin/throw/route.ts
+++ b/web/app/api/admin/throw/route.ts
@@ -0,0 +1,16 @@
 /**
 * /api/admin/throw — admin-only error injector. Throws on demand so we can
 * verify Glitchtip is receiving events. Gated by /api/admin/* middleware (404
 * for non-admins).
 *
 * Lives under /api/admin/* so the W0-F1 middleware gate applies.
 */
 import { withRequest } from "@/lib/logger";
 export const runtime = "nodejs";
 export async function GET(request: Request) {
  const log = withRequest(request);
  log.warn({ event: "debug_throw" }, "intentional error for Glitchtip smoke test");
  throw new Error("debug_throw_smoke_test: glitchtip wiring verified at " + new Date().toISOString());
 }
--- a/web/app/api/search/autocomplete/route.ts
+++ b/web/app/api/search/autocomplete/route.ts
@@ -0,0 +1,95 @@
 /**
 * /api/search/autocomplete — typo-tolerant prefix search via Meilisearch.
 *
 * Hits two indexes in parallel and returns a small merged result:
 *   - documents   (title-level matches, used to jump to a doc)
 *   - chunks      (passage-level matches, used for in-doc navigation)
 *
 * Target latency: sub-30ms inside the docker network. Falls back to empty
 * results if Meilisearch is unreachable so the chat / hybrid_search aren't
 * blocked. Auth: none — same as /api/search/hybrid; corpus is public.
 */
 import { NextResponse } from "next/server";
 import { withRequest } from "@/lib/logger";
 export const runtime = "nodejs";
 export const dynamic = "force-dynamic";
 const MEILI_URL = process.env.MEILISEARCH_URL || "http://meilisearch:7700";
 const MEILI_KEY = process.env.MEILISEARCH_API_KEY || process.env.MEILI_MASTER_KEY || "";
 interface DocHit {
  doc_id: string;
  canonical_title: string;
  collection?: string;
 }
 interface ChunkHit {
  chunk_pk: number;
  doc_id: string;
  chunk_id: string;
  page: number;
  type: string;
  content_pt?: string;
  content_en?: string;
  ufo_anomaly?: boolean;
 }
 async function meiliSearch(index: string, q: string, limit: number): Promise<unknown[]> {
  const r = await fetch(`${MEILI_URL}/indexes/${index}/search`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${MEILI_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ q, limit, attributesToHighlight: ["canonical_title", "content_pt", "content_en"] }),
    signal: AbortSignal.timeout(2000),
  });
  if (!r.ok) throw new Error(`meili ${r.status}`);
  const data = await r.json();
  return data.hits ?? [];
 }
 export async function GET(request: Request) {
  const log = withRequest(request);
  const url = new URL(request.url);
  const q = (url.searchParams.get("q") || "").trim();
  const limit = Math.min(Math.max(Math.trunc(Number(url.searchParams.get("limit") || 8)) || 8, 1), 20);
  if (q.length < 2) {
    return NextResponse.json({ q, documents: [], chunks: [] });
  }
  if (!MEILI_KEY) {
    log.warn({ event: "autocomplete_unconfigured" }, "MEILI key not set");
    return NextResponse.json({ q, documents: [], chunks: [], reason: "meili_not_configured" });
  }
  const t0 = Date.now();
  const [docs, chunks] = await Promise.all([
    meiliSearch("documents", q, Math.min(limit, 5)).catch(() => []),
    meiliSearch("chunks", q, limit).catch(() => []),
  ]) as [DocHit[], ChunkHit[]];
  const dt = Date.now() - t0;
  log.info({ event: "autocomplete", q, docs: docs.length, chunks: chunks.length, dt_ms: dt }, "autocomplete done");
  return NextResponse.json({
    q,
    duration_ms: dt,
    documents: docs.map((d) => ({
      doc_id: d.doc_id,
      title: d.canonical_title,
      collection: d.collection,
      href: `/d/${d.doc_id}`,
    })),
    chunks: chunks.map((c) => ({
      chunk_id: c.chunk_id,
      doc_id: c.doc_id,
      page: c.page,
      type: c.type,
      excerpt: (c.content_pt || c.content_en || "").slice(0, 180),
      ufo_anomaly: !!c.ufo_anomaly,
      href: `/d/${c.doc_id}/p${String(c.page).padStart(3, "0")}#${c.chunk_id}`,
    })),
  });
 }
--- a/web/app/api/sessions/[id]/messages/route.ts
+++ b/web/app/api/sessions/[id]/messages/route.ts
@@ -18,6 +18,7 @@ import { createClient, isSupabaseConfigured } from "@/lib/supabase/server";
 import { readDocument, readPage } from "@/lib/wiki";
 import { streamChat } from "@/lib/chat";
 import { getLocale } from "@/components/locale-toggle";
+import { withRequest } from "@/lib/logger";
 async function gatherContext(docId: string | null, pageId: string | null): Promise<string> {
  const parts: string[] = [];
@@ -129,8 +130,9 @@ Quotes verbatim do documento mantêm idioma original (inglês), narração ao re
 export async function POST(request: Request, ctx: { params: Promise<{ id: string }> }) {
  const { id: sessionId } = await ctx.params;
  const t0 = Date.now();
+  const baseLog = withRequest(request).child({ session_id: sessionId.slice(0, 8) });
  const log = (stage: string, extra: Record<string, unknown> = {}) =>
-    console.log(`[chat ${sessionId.slice(0, 8)}] ${stage}`, { dt: Date.now() - t0, ...extra });
+    baseLog.info({ stage, dt_ms: Date.now() - t0, ...extra }, stage);
  log("POST received");
  if (!isSupabaseConfigured()) {
--- a/web/app/d/[docId]/page.tsx
+++ b/web/app/d/[docId]/page.tsx
@@ -13,6 +13,7 @@ import { getLocale } from "@/components/locale-toggle";
 import { AuthBar } from "@/components/auth-bar";
 import { ChatBubble } from "@/components/chat-bubble";
 import { DocReadingView } from "@/components/doc-reading-view";
+import { AnomalyHighlights, type AnomalyFlag } from "@/components/anomaly-highlights";
 import { MarkdownBody } from "@/components/markdown-body";
 export const dynamic = "force-dynamic";
@@ -70,17 +71,31 @@ export default async function DocPage({
    .sort((a, b) => b[1] - a[1])
    .slice(0, 6);
-  // Count UFO/cryptid anomalies across chunks
-  let ufoCount = 0;
-  let cryptidCount = 0;
+  // Count UFO/cryptid anomalies across chunks + collect flags for the highlight panel
  let imageCount = 0;
-  for (const [, chunks] of byPage) {
+  const ufoFlags: AnomalyFlag[] = [];
+  const cryptidFlags: AnomalyFlag[] = [];
+  for (const [page, chunks] of byPage) {
    for (const c of chunks) {
-      if (c.fm.ufo_anomaly_detected) ufoCount++;
-      if (c.fm.cryptid_anomaly_detected) cryptidCount++;
+      if (c.fm.ufo_anomaly_detected)
+        ufoFlags.push({
+          chunk_id: c.fm.chunk_id,
+          page,
+          type: c.fm.ufo_anomaly_type ?? null,
+          rationale: c.fm.ufo_anomaly_rationale ?? null,
+        });
+      if (c.fm.cryptid_anomaly_detected)
+        cryptidFlags.push({
+          chunk_id: c.fm.chunk_id,
+          page,
+          type: c.fm.cryptid_anomaly_type ?? null,
+          rationale: c.fm.cryptid_anomaly_rationale ?? null,
+        });
      if (c.fm.type === "image") imageCount++;
    }
  }
+  const ufoCount = ufoFlags.length;
+  const cryptidCount = cryptidFlags.length;
  const classification = (doc?.fm.highest_classification as string) ?? "—";
  const collection = (doc?.fm.collection as string) ?? "—";
@@ -136,6 +151,8 @@ export default async function DocPage({
        )}
      </header>
+      <AnomalyHighlights docId={docId} ufo={ufoFlags} cryptid={cryptidFlags} />
      <DocReadingView docId={docId} reading={reading} chunksByPage={ordered} />
      <ChatBubble context={{ doc_id: docId }} />
--- a/web/app/e/[cls]/[id]/page.tsx
+++ b/web/app/e/[cls]/[id]/page.tsx
@@ -11,6 +11,7 @@ import { ChatBubble } from "@/components/chat-bubble";
 import { AuthBar } from "@/components/auth-bar";
 import { EntityGraphMini } from "@/components/entity-graph-mini";
 import { EntityRelations } from "@/components/entity-relations";
+import { EntityAttributes } from "@/components/entity-attributes";
 import {
  getEntityCore,
  getEntityMentionsByDoc,
@@ -111,6 +112,21 @@ export default async function EntityPage({
  const classColor = CLASS_COLOR[folder as EntityClass];
  const classBg = CLASS_BG[folder as EntityClass];
+  // The generated entity bodies hold only "# Title" + empty "## Description"
+  // headings — strip headings and see if any real prose remains.
+  const bodyProse = (wiki?.body ?? "").replace(/^#.*$/gm, "").trim();
+  const hasNarrativeProse = bodyProse.length > 20;
+  // Does the frontmatter carry any displayable description/attribute?
+  const fm = (wiki?.fm ?? {}) as Record<string, unknown>;
+  const arr = (v: unknown) => Array.isArray(v) && v.length > 0;
+  const fmHasContent = Boolean(
+    fm.narrative_summary_pt_br || fm.narrative_summary_en || fm.maneuver_notes ||
+      fm.shape || fm.color || fm.medium || fm.event_class || fm.person_class ||
+      fm.org_class || fm.geo_class || fm.date_start ||
+      arr(fm.countries) || arr(fm.roles) || arr(fm.affiliations) ||
+      arr(fm.primary_location_names) || arr(fm.regions_or_states),
+  );
  return (
    <main className="min-h-screen p-6 md:p-10 max-w-6xl mx-auto">
      <div className="flex items-start justify-between gap-4 mb-6">
@@ -230,6 +246,9 @@ export default async function EntityPage({
      <div className="grid grid-cols-1 lg:grid-cols-[1fr_320px] gap-8">
        {/* MAIN — narrative + chunks live */}
        <article>
+          {/* Structured description + attributes from frontmatter */}
+          {wiki?.fm && <EntityAttributes fm={wiki.fm as Record<string, unknown>} />}
          {/* Live chunk previews — most impactful section */}
          {sampleChunks.length > 0 && (
            <section className="mb-10">
@@ -283,17 +302,18 @@ export default async function EntityPage({
            </section>
          )}
-          {/* Narrative body (Haiku stub OK quando rico) */}
-          {wiki?.body && wiki.body.trim().length > 30 && (
+          {/* Narrative body — only when it carries real prose, not just the
+              empty "## Description" headings the generator leaves behind. */}
+          {hasNarrativeProse && (
            <section className="pt-6 border-t border-[rgba(0,255,156,0.12)]">
              <h2 className="font-mono text-sm text-[#7fdbff] uppercase tracking-widest mb-3 border-l-2 border-[#7fdbff] pl-3">
                Narrativa
              </h2>
-              <MarkdownBody>{wiki.body}</MarkdownBody>
+              <MarkdownBody>{wiki!.body}</MarkdownBody>
            </section>
          )}
-          {sampleChunks.length === 0 && (!wiki?.body || wiki.body.trim().length === 0) && (
+          {sampleChunks.length === 0 && !hasNarrativeProse && !fmHasContent && (
            <div className="text-[#5a6678] italic text-sm p-6 border border-[rgba(255,165,0,0.30)] bg-[rgba(255,165,0,0.05)] rounded">
              Entidade ainda sem chunks indexados na DB. Aguarde o indexer terminar.
            </div>
--- a/web/components/anomaly-highlights.tsx
+++ b/web/components/anomaly-highlights.tsx
@@ -0,0 +1,135 @@
 /**
 * AnomalyHighlights — prominent UAP / cryptid anomaly panel for the document
 * page. The clean reading version is the default body, but the investigative
 * "destaque" of every flagged passage must stay visible regardless of which
 * view (reading or scan) is active. Flags are grouped by anomaly type, and
 * each group links to the per-page scans where the anomaly was detected.
 */
 import Link from "next/link";
 export interface AnomalyFlag {
  chunk_id: string;
  page: number;
  type: string | null;
  rationale: string | null;
 }
 function clean(v: string | null): string | null {
  const s = typeof v === "string" ? v.trim() : "";
  return s && s.toLowerCase() !== "null" ? s : null;
 }
 interface Group {
  type: string | null;
  rationale: string | null; // shown only when the group has a single flag
  count: number;
  pages: number[];
 }
 // Group by anomaly type so the panel stays a scannable "destaque" overview.
 // Per-passage rationale is kept only when a type has exactly one flag; the full
 // per-chunk rationale remains available in the "trechos · scan original" view.
 function groupFlags(flags: AnomalyFlag[]): Group[] {
  const m = new Map<string, Group>();
  for (const f of flags) {
    const type = clean(f.type);
    const rationale = clean(f.rationale);
    const key = type ?? "anomalia";
    const g = m.get(key) ?? { type, rationale, count: 0, pages: [] };
    g.count += 1;
    g.rationale = g.count === 1 ? rationale : null;
    if (!g.pages.includes(f.page)) g.pages.push(f.page);
    m.set(key, g);
  }
  return Array.from(m.values())
    .map((g) => ({ ...g, pages: g.pages.sort((a, b) => a - b) }))
    .sort((a, b) => b.count - a.count || a.pages[0] - b.pages[0]);
 }
 function pad(p: number): string {
  return String(p).padStart(3, "0");
 }
 function PageChips({ docId, pages }: { docId: string; pages: number[] }) {
  const shown = pages.slice(0, 14);
  const extra = pages.length - shown.length;
  return (
    <span className="inline-flex flex-wrap gap-1 align-middle">
      {shown.map((p) => (
        <Link
          key={p}
          href={`/d/${docId}/p${pad(p)}`}
          className="font-mono text-[10px] px-1.5 py-0.5 border border-[rgba(127,219,255,0.30)] text-[#7fdbff] rounded hover:border-[#00ff9c] hover:text-[#00ff9c]"
        >
          p{p}
        </Link>
      ))}
      {extra > 0 && <span className="font-mono text-[10px] text-[#5a6678]">+{extra}</span>}
    </span>
  );
 }
 export function AnomalyHighlights({
  docId,
  ufo,
  cryptid,
 }: {
  docId: string;
  ufo: AnomalyFlag[];
  cryptid: AnomalyFlag[];
 }) {
  if (ufo.length === 0 && cryptid.length === 0) return null;
  const ufoGroups = groupFlags(ufo);
  const cryptidGroups = groupFlags(cryptid);
  return (
    <section className="mb-6 border border-[rgba(0,255,156,0.40)] bg-[rgba(0,255,156,0.05)] rounded p-4">
      {ufo.length > 0 && (
        <>
          <h2 className="font-mono text-sm text-[#00ff9c] mb-3 flex items-center gap-2">
            🛸 Anomalias UAP destacadas
            <span className="text-[#5a6678]">
              ({ufo.length} {ufo.length === 1 ? "trecho" : "trechos"} · {ufoGroups.length}{" "}
              {ufoGroups.length === 1 ? "tipo" : "tipos"})
            </span>
          </h2>
          <ul className="space-y-2.5">
            {ufoGroups.map((g, i) => (
              <li key={i} className="text-sm text-[#c8d4e6] leading-relaxed">
                <span className="font-mono text-[#00ff9c]">🛸 {g.type ?? "anomalia"}</span>
                {g.count > 1 && (
                  <span className="font-mono text-[10px] text-[#5a6678]"> ×{g.count}</span>
                )}
                {g.rationale && <span className="text-[#c8d4e6]"> — {g.rationale}</span>}{" "}
                <PageChips docId={docId} pages={g.pages} />
              </li>
            ))}
          </ul>
        </>
      )}
      {cryptid.length > 0 && (
        <div className={ufo.length > 0 ? "mt-4 pt-4 border-t border-[rgba(155,93,229,0.25)]" : ""}>
          <h2 className="font-mono text-sm text-[#9b5de5] mb-3 flex items-center gap-2">
            👁 Anomalias cryptid destacadas
            <span className="text-[#5a6678]">
              ({cryptid.length} {cryptid.length === 1 ? "trecho" : "trechos"})
            </span>
          </h2>
          <ul className="space-y-2.5">
            {cryptidGroups.map((g, i) => (
              <li key={i} className="text-sm text-[#c8d4e6] leading-relaxed">
                <span className="font-mono text-[#9b5de5]">👁 {g.type ?? "anomalia"}</span>
                {g.count > 1 && (
                  <span className="font-mono text-[10px] text-[#5a6678]"> ×{g.count}</span>
                )}
                {g.rationale && <span className="text-[#c8d4e6]"> — {g.rationale}</span>}{" "}
                <PageChips docId={docId} pages={g.pages} />
              </li>
            ))}
          </ul>
        </div>
      )}
    </section>
  );
 }
--- a/web/components/entity-attributes.tsx
+++ b/web/components/entity-attributes.tsx
@@ -0,0 +1,164 @@
 /**
 * EntityAttributes — renders an entity's descriptive content and structured
 * attributes straight from its wiki frontmatter. The generated entity files
 * carry their real content in YAML fields (narrative_summary_*, maneuver_notes,
 * shape, color, roles, countries, …) while the markdown body holds only empty
 * "## Description" headings — so the page must surface the frontmatter.
 */
 type FM = Record<string, unknown>;
 const ATTR_LABELS: Record<string, string> = {
  event_class: "Tipo de evento",
  date_start: "Início",
  date_end: "Fim",
  date_confidence: "Confiança da data",
  primary_location_names: "Locais",
  primary_location_geo_classes: "Classe do local",
  geo_class: "Classe geográfica",
  countries: "Países",
  regions_or_states: "Regiões / estados",
  org_class: "Tipo de organização",
  person_class: "Tipo de pessoa",
  affiliations: "Afiliações",
  roles: "Funções / papéis",
  shape: "Forma",
  color: "Cor",
  medium: "Meio",
  size_estimate_m: "Tamanho estimado (m)",
  altitude_ft: "Altitude (ft)",
  speed_kts: "Velocidade (kt)",
 };
 // Order in which attributes are shown (only those present render).
 const ATTR_ORDER = [
  "event_class",
  "person_class",
  "org_class",
  "shape",
  "color",
  "medium",
  "size_estimate_m",
  "altitude_ft",
  "speed_kts",
  "date_start",
  "date_end",
  "date_confidence",
  "geo_class",
  "countries",
  "regions_or_states",
  "primary_location_names",
  "primary_location_geo_classes",
  "affiliations",
  "roles",
 ];
 function clean(v: unknown): string | null {
  const s = typeof v === "string" ? v.trim() : "";
  return s && s.toLowerCase() !== "null" ? s : null;
 }
 // Placeholder values that carry no real attribute information — hidden from the
 // ATRIBUTOS grid (but never from the free-text description).
 const EMPTY_TOKENS = new Set([
  "null",
  "none",
  "n/a",
  "na",
  "unknown",
  "unidentified",
  "undetermined",
  "unspecified",
  "not specified",
  "not stated",
  "not applicable",
 ]);
 function isEmptyToken(s: string): boolean {
  return EMPTY_TOKENS.has(s.trim().toLowerCase());
 }
 function fmtValue(v: unknown): string | null {
  if (v == null) return null;
  if (Array.isArray(v)) {
    const items = v
      .map((x) => (typeof x === "string" ? x.trim() : String(x)))
      .filter((x) => x && !x.startsWith("[[") && !isEmptyToken(x));
    return items.length ? items.join(", ") : null;
  }
  if (typeof v === "number") return String(v);
  const s = clean(v);
  return s && !isEmptyToken(s) ? s : null;
 }
 export function EntityAttributes({ fm }: { fm: FM }) {
  const ptText = clean(fm.narrative_summary_pt_br) ?? clean(fm.description_pt_br);
  const enText = clean(fm.narrative_summary_en) ?? clean(fm.description_en);
  const notes = clean(fm.maneuver_notes); // source-language only (uap_object)
  const attrs = ATTR_ORDER.map((k) => [k, fmtValue(fm[k])] as const).filter(
    ([, v]) => v !== null,
  );
  const hasDescription = Boolean(ptText || enText || notes);
  if (!hasDescription && attrs.length === 0) return null;
  return (
    <section className="mb-10">
      {hasDescription && (
        <>
          {ptText && (
            <div className="mb-4">
              <h2 className="font-mono text-sm text-[#7fdbff] uppercase tracking-widest mb-2 border-l-2 border-[#7fdbff] pl-3">
                Descrição (PT-BR)
              </h2>
              <p className="text-[15px] leading-relaxed text-[#c8d4e6]">{ptText}</p>
            </div>
          )}
          {enText && (
            <div className="mb-4">
              <h2 className="font-mono text-sm text-[#7fdbff] uppercase tracking-widest mb-2 border-l-2 border-[#7fdbff] pl-3">
                Description (EN)
              </h2>
              <p className="text-[15px] leading-relaxed text-[#c8d4e6]">{enText}</p>
            </div>
          )}
          {notes && !ptText && !enText && (
            <div className="mb-4">
              <h2 className="font-mono text-sm text-[#7fdbff] uppercase tracking-widest mb-2 border-l-2 border-[#7fdbff] pl-3">
                Descrição · Description
              </h2>
              <p className="text-[15px] leading-relaxed text-[#c8d4e6]">{notes}</p>
            </div>
          )}
          {notes && (ptText || enText) && (
            <div className="mb-4">
              <h3 className="font-mono text-[11px] text-[#8896aa] uppercase tracking-widest mb-1">
                Notas de manobra / aparência
              </h3>
              <p className="text-sm leading-relaxed text-[#8896aa]">{notes}</p>
            </div>
          )}
        </>
      )}
      {attrs.length > 0 && (
        <div className="mt-2">
          <h3 className="font-mono text-[11px] text-[#8896aa] uppercase tracking-widest mb-2">
            Atributos
          </h3>
          <dl className="grid grid-cols-1 sm:grid-cols-2 gap-x-6 gap-y-2">
            {attrs.map(([k, v]) => (
              <div key={k} className="flex items-baseline gap-2 border-b border-[rgba(127,219,255,0.10)] pb-1.5">
                <dt className="font-mono text-[11px] text-[#5a6678] uppercase tracking-wide shrink-0 min-w-[42%]">
                  {ATTR_LABELS[k] ?? k}
                </dt>
                <dd className="text-sm text-[#c8d4e6]">{v}</dd>
              </div>
            ))}
          </dl>
        </div>
      )}
    </section>
  );
 }
--- a/web/components/search-autocomplete.tsx
+++ b/web/components/search-autocomplete.tsx
@@ -0,0 +1,137 @@
 "use client";
 /**
 * SearchAutocomplete — type-as-you-go dropdown on the /search input.
 *
 * Hits /api/search/autocomplete (Meilisearch) with debounced fetch and renders
 * a two-section dropdown: matching documents (jump targets) and matching
 * chunks (in-doc passages with excerpt). Sub-30ms target. Keyboard navigation
 * (Up/Down + Enter, Esc to close) is not implemented in this version.
 */
 import { useEffect, useRef, useState } from "react";
 import Link from "next/link";
 interface DocSuggestion {
  doc_id: string;
  title: string;
  collection?: string;
  href: string;
 }
 interface ChunkSuggestion {
  chunk_id: string;
  doc_id: string;
  page: number;
  type: string;
  excerpt: string;
  ufo_anomaly: boolean;
  href: string;
 }
 interface ApiResponse {
  q: string;
  duration_ms?: number;
  documents: DocSuggestion[];
  chunks: ChunkSuggestion[];
 }
 export function SearchAutocomplete({ query, onPick }: { query: string; onPick?: () => void }) {
  const [data, setData] = useState<ApiResponse | null>(null);
  const [loading, setLoading] = useState(false);
  const [open, setOpen] = useState(false);
  const timer = useRef<ReturnType<typeof setTimeout> | null>(null);
  const abort = useRef<AbortController | null>(null);
  useEffect(() => {
    const q = query.trim();
    if (q.length < 2) {
      setData(null); setOpen(false); return;
    }
    if (timer.current) clearTimeout(timer.current);
    timer.current = setTimeout(async () => {
      abort.current?.abort();
      abort.current = new AbortController();
      setLoading(true);
      try {
        const r = await fetch(`/api/search/autocomplete?q=${encodeURIComponent(q)}`, {
          signal: abort.current.signal,
        });
        if (!r.ok) throw new Error(`HTTP ${r.status}`);
        const j = (await r.json()) as ApiResponse;
        setData(j);
        setOpen(j.documents.length + j.chunks.length > 0);
      } catch (e) {
        if ((e as Error).name === "AbortError") return;
        setData(null); setOpen(false);
      } finally {
        setLoading(false);
      }
    }, 150);
    return () => { if (timer.current) clearTimeout(timer.current); };
  }, [query]);
  if (!open || !data) return null;
  return (
    <div className="absolute z-30 left-0 right-0 mt-1 max-h-[60vh] overflow-y-auto bg-[#0a121e] border border-[#00ff9c] rounded shadow-lg">
      <div className="flex items-center justify-between px-3 py-1.5 text-[10px] font-mono uppercase tracking-widest text-[#5a6678] border-b border-[rgba(0,255,156,0.20)]">
        <span>
          ⚡ autocomplete · {data.documents.length} docs · {data.chunks.length} trechos
        </span>
        <span>{loading ? "…" : `${data.duration_ms ?? "?"}ms`}</span>
      </div>
      {data.documents.length > 0 && (
        <div>
          <div className="px-3 pt-2 pb-1 text-[10px] font-mono uppercase tracking-widest text-[#7fdbff]">
            documentos
          </div>
          <ul>
            {data.documents.map((d) => (
              <li key={d.doc_id}>
                <Link
                  href={d.href}
                  onClick={onPick}
                  className="block px-3 py-2 hover:bg-[rgba(0,255,156,0.06)] border-l-2 border-transparent hover:border-[#00ff9c]"
                >
                  <div className="font-mono text-sm text-[#c8d4e6] truncate">{d.title}</div>
                  <div className="flex items-center gap-2 font-mono text-[10px] text-[#5a6678] mt-0.5">
                    <span>{d.doc_id}</span>
                    {d.collection && <span>· {d.collection}</span>}
                  </div>
                </Link>
              </li>
            ))}
          </ul>
        </div>
      )}
      {data.chunks.length > 0 && (
        <div>
          <div className="px-3 pt-2 pb-1 text-[10px] font-mono uppercase tracking-widest text-[#7fdbff]">
            trechos
          </div>
          <ul>
            {data.chunks.map((c) => (
              <li key={`${c.doc_id}-${c.chunk_id}`}>
                <Link
                  href={c.href}
                  onClick={onPick}
                  className="block px-3 py-2 hover:bg-[rgba(0,255,156,0.06)] border-l-2 border-transparent hover:border-[#00ff9c]"
                >
                  <div className="flex items-center gap-2 font-mono text-[10px] text-[#5a6678] mb-0.5">
                    <span className="text-[#00ff9c]">{c.chunk_id}</span>
                    <span>p{c.page}</span>
                    <span>{c.type}</span>
                    {c.ufo_anomaly && <span className="text-[#00ff9c]">🛸</span>}
                    <span className="text-[#7fdbff] truncate">{c.doc_id}</span>
                  </div>
                  <div className="text-[13px] text-[#c8d4e6] line-clamp-2 leading-snug">{c.excerpt}</div>
                </Link>
              </li>
            ))}
          </ul>
        </div>
      )}
    </div>
  );
 }
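Review note on the debounced fetch above: the previous request is aborted only when the next debounce timer fires, so a response can still arrive after the query drops below two characters or after the component unmounts. A minimal sketch of one way to tighten that, assuming a small helper whose `cancel()` is called from the effect cleanup. `LatestRequest` is a hypothetical name and is not part of this commit.

```typescript
// Hypothetical helper (not in the commit): tracks only the newest request.
class LatestRequest {
  private controller: AbortController | null = null;

  /** Abort whatever is in flight and return a signal for the new request. */
  next(): AbortSignal {
    this.controller?.abort();
    this.controller = new AbortController();
    return this.controller.signal;
  }

  /** Abort the in-flight request, if any. Intended for effect cleanup. */
  cancel(): void {
    this.controller?.abort();
    this.controller = null;
  }
}

// Possible use inside the component's effect (shown as comments because it
// depends on React and on the component's own state setters):
//
//   const latest = useRef(new LatestRequest());
//   useEffect(() => {
//     const id = setTimeout(async () => {
//       const signal = latest.current.next();
//       const r = await fetch(url, { signal });
//       ...
//     }, 150);
//     return () => { clearTimeout(id); latest.current.cancel(); };
//   }, [query]);
```

With this shape the cleanup cancels both the pending timer and the in-flight fetch, so a stale response cannot reopen the dropdown.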
--- a/web/components/search-panel.tsx
+++ b/web/components/search-panel.tsx
@@ -9,6 +9,7 @@ import Image from "next/image";
 import Link from "next/link";
 import { useEffect, useState } from "react";
 import { useRouter, useSearchParams } from "next/navigation";
 import { SearchAutocomplete } from "./search-autocomplete";
 interface Hit {
  chunk_id: string;
@@ -94,7 +95,7 @@ export function SearchPanel({
        onSubmit={submit}
        className="space-y-3 mb-8 p-4 border border-[rgba(0,255,156,0.15)] bg-[#0a121e] rounded"
      >
-        <div>
+        <div className="relative">
          <label className="font-mono text-[10px] uppercase tracking-widest text-[#5a6678] block mb-1">
            query
          </label>
@@ -105,6 +106,7 @@ export function SearchPanel({
            className="w-full bg-transparent border border-[rgba(0,255,156,0.20)] focus:border-[#00ff9c] rounded px-3 py-2 font-mono text-sm text-[#c8d4e6] outline-none"
            autoFocus
          />
          <SearchAutocomplete query={q} onPick={() => setQ("")} />
        </div>
        <div className="flex flex-wrap items-end gap-3">
          <div>
--- a/web/instrumentation.ts
+++ b/web/instrumentation.ts
@@ -0,0 +1,33 @@
 /**
 * Next.js instrumentation hook — loads Sentry (Glitchtip) init on server/edge.
 *
 * https://nextjs.org/docs/app/building-your-application/optimizing/instrumentation
 */
 export async function register() {
  if (process.env.NEXT_RUNTIME === "nodejs") {
    await import("./sentry.server.config");
  }
  if (process.env.NEXT_RUNTIME === "edge") {
     // Edge runtime reuses the server config and DSN; the SDK auto-detects the runtime.
    await import("./sentry.server.config");
  }
 }
 // Forward unhandled errors from Server Components / Route Handlers to Sentry.
 // Runs on the Node.js runtime only. The parameter types are derived from
 // captureRequestError itself, so a signature change in @sentry/nextjs does not
 // break the build; observability code must not block real errors.
 export const onRequestError = async (
  err: unknown,
  request: Parameters<typeof import("@sentry/nextjs").captureRequestError>[1],
  context: Parameters<typeof import("@sentry/nextjs").captureRequestError>[2],
 ) => {
  if (process.env.NEXT_RUNTIME !== "nodejs") return;
  try {
    const { captureRequestError } = await import("@sentry/nextjs");
    await captureRequestError(err, request, context);
  } catch {
    /* never let observability swallow the original error */
  }
 };
--- a/web/lib/chat/claude-code.ts
+++ b/web/lib/chat/claude-code.ts
@@ -12,7 +12,11 @@ import { spawn } from "node:child_process";
 import type { ChatProvider, ChatRequest, ChatResponse } from "./types";
 const MODEL = process.env.CLAUDE_CODE_MODEL || "haiku";
-const TIMEOUT_MS = 90_000;
+// W1-TD#30: subprocess timeout is now configurable. Default 90s matches the
+// previous hard-coded value. Lower it (e.g. 60s) when the provider should bail
+// out of slow generations sooner; raise it (e.g. 180s) when running heavier
+// models like opus on long contexts.
+const TIMEOUT_MS = Number(process.env.CLAUDE_CODE_TIMEOUT_MS || 90_000);
 function buildPrompt(req: ChatRequest): string {
  // Single-shot prompt: collapse history into a structured transcript.
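Review note on `CLAUDE_CODE_TIMEOUT_MS`: `Number(value || 90_000)` falls back only when the variable is unset or empty, so a non-numeric value such as `"90s"` yields `NaN`. A minimal sketch of a stricter parser, assuming positive finite numbers are the only valid settings. `envNumber` is a hypothetical helper, not part of this commit.

```typescript
/**
 * Parse a numeric environment value, falling back when it is missing,
 * blank, non-numeric, or not strictly positive.
 * Hypothetical helper: the commit itself uses Number(value || default).
 */
function envNumber(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

// Possible use (commented out so the block does not depend on Node typings):
//   const TIMEOUT_MS = envNumber(process.env.CLAUDE_CODE_TIMEOUT_MS, 90_000);
```

The same helper would fit the `OPENROUTER_*` tunables in `openrouter.ts`, which are parsed the same way.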
--- a/web/lib/chat/openrouter.ts
+++ b/web/lib/chat/openrouter.ts
@@ -23,6 +23,105 @@ const PRIMARY = process.env.OPENROUTER_MODEL || "deepseek/deepseek-v4-flash:free
 const FALLBACK = process.env.OPENROUTER_FALLBACK_MODEL || "nvidia/nemotron-3-super-120b-a12b:free";
 const ENDPOINT = "https://openrouter.ai/api/v1/chat/completions";
 // W1-TD#23: retry + circuit breaker for OpenRouter free-tier flakiness.
 // Transient errors (HTTP 408/425/429/500/502/503/504 and network failures) are
 // retried up to RETRY_MAX times with exponential backoff. Repeated PRIMARY
 // failures within CB_WINDOW_MS trip an in-memory circuit breaker that promotes
 // FALLBACK as the active model for CB_COOLDOWN_MS, protecting the chat from a
 // single bad model.
 const RETRY_MAX = Number(process.env.OPENROUTER_RETRY_MAX || 2);
 const RETRY_BASE_MS = Number(process.env.OPENROUTER_RETRY_BASE_MS || 400);
 const CB_WINDOW_MS = Number(process.env.OPENROUTER_CB_WINDOW_MS || 60_000);
 const CB_THRESHOLD = Number(process.env.OPENROUTER_CB_THRESHOLD || 3);
 const CB_COOLDOWN_MS = Number(process.env.OPENROUTER_CB_COOLDOWN_MS || 120_000);
 const RETRYABLE_STATUSES = new Set([408, 425, 429, 500, 502, 503, 504]);
 interface ModelBreaker { failures: number[]; openedAt: number | null }
 const breakers = new Map<string, ModelBreaker>();
 function breakerFor(model: string): ModelBreaker {
  let b = breakers.get(model);
  if (!b) { b = { failures: [], openedAt: null }; breakers.set(model, b); }
  return b;
 }
 function isCircuitOpen(model: string): boolean {
  const b = breakerFor(model);
  if (!b.openedAt) return false;
  if (Date.now() - b.openedAt > CB_COOLDOWN_MS) {
    // Half-open: clear and let the next call probe the upstream.
    b.openedAt = null; b.failures = [];
    return false;
  }
  return true;
 }
 function recordFailure(model: string): void {
  const b = breakerFor(model);
  const now = Date.now();
  b.failures = b.failures.filter((t) => now - t < CB_WINDOW_MS);
  b.failures.push(now);
  if (b.failures.length >= CB_THRESHOLD) b.openedAt = now;
 }
 function recordSuccess(model: string): void {
  const b = breakerFor(model);
  b.failures = []; b.openedAt = null;
 }
 /** Pick the active model honoring an open circuit on PRIMARY. */
 function pickModel(preferred: string): string {
  if (preferred === PRIMARY && isCircuitOpen(PRIMARY)) return FALLBACK;
  return preferred;
 }
 /** Fetch wrapper with retry + breaker accounting. */
 async function fetchOpenRouter(
  body: Record<string, unknown>,
  preferredModel: string,
 ): Promise<{ res: Response; model: string }> {
  const model = pickModel(preferredModel);
  body.model = model;
  let lastErr: unknown;
  for (let attempt = 0; attempt <= RETRY_MAX; attempt++) {
    try {
      const res = await fetch(ENDPOINT, {
        method: "POST",
        headers: headers(),
        body: JSON.stringify(body),
      });
      if (res.ok) {
        recordSuccess(model);
        return { res, model };
      }
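       if (!RETRYABLE_STATUSES.has(res.status)) {
         const txt = await res.text();
         const err = new Error(`openrouter HTTP ${res.status}: ${txt.slice(0, 300)}`);
         if (res.status === 429 || res.status === 402) {
           (err as Error & { isRateLimit?: boolean }).isRateLimit = true;
         }
         // Flag the error so the catch below rethrows it instead of retrying.
         (err as Error & { nonRetryable?: boolean }).nonRetryable = true;
         recordFailure(model);
         throw err;
       }
       // Retryable: remember the failure and stop if this was the last attempt.
       const retryErr = new Error(`openrouter HTTP ${res.status} (attempt ${attempt + 1}/${RETRY_MAX + 1})`);
       if (res.status === 429) {
         (retryErr as Error & { isRateLimit?: boolean }).isRateLimit = true;
       }
       lastErr = retryErr;
       if (attempt >= RETRY_MAX) break;
       // Wait with exponential backoff, honoring Retry-After (seconds) if present.
       const ra = Number(res.headers.get("retry-after"));
       const waitMs = Number.isFinite(ra) && ra > 0
         ? ra * 1000
         : RETRY_BASE_MS * Math.pow(2, attempt);
       await new Promise((r) => setTimeout(r, waitMs));
     } catch (e) {
       // Non-retryable HTTP errors were already recorded above: surface them now.
       if ((e as { nonRetryable?: boolean } | null)?.nonRetryable) throw e;
       // Network/abort: also retry up to RETRY_MAX.
       lastErr = e;
       if (attempt >= RETRY_MAX) break;
       await new Promise((r) => setTimeout(r, RETRY_BASE_MS * Math.pow(2, attempt)));
     }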
  }
  recordFailure(model);
  throw lastErr instanceof Error ? lastErr : new Error(String(lastErr));
 }
 type OAMsg =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content?: string | null; tool_calls?: OAToolCall[] }
@@ -74,35 +173,26 @@ export interface SendOnceReq {
 }
 /** Non-streaming single shot — used by claude-code fallback path and tests. */
-export async function sendOnce(req: SendOnceReq, model = PRIMARY): Promise<{
+export async function sendOnce(req: SendOnceReq, preferredModel = PRIMARY): Promise<{
  content: string;
  model: string;
  tokensIn?: number;
  tokensOut?: number;
 }> {
-  const body = {
+  const body: Record<string, unknown> = {
-    model,
    messages: [
      { role: "system", content: req.system },
      ...req.messages.slice(-20),
    ],
    max_tokens: req.maxTokens ?? 1024,
  };
-  const res = await fetch(ENDPOINT, {
+  const { res, model } = await fetchOpenRouter(body, preferredModel);
-    method: "POST",
-    headers: headers(),
-    body: JSON.stringify(body),
-  });
-  if (!res.ok) {
-    const txt = await res.text();
-    const err = new Error(`openrouter HTTP ${res.status}: ${txt.slice(0, 300)}`);
-    if (res.status === 429 || res.status === 402) {
-      (err as Error & { isRateLimit?: boolean }).isRateLimit = true;
-    }
-    throw err;
-  }
  const data = await res.json();
-  if (data.error) throw new Error(`openrouter error: ${data.error.message}`);
+  if (data.error) {
+    recordFailure(model);
+    throw new Error(`openrouter error: ${data.error.message}`);
+  }
+  recordSuccess(model);
  return {
    content: data.choices?.[0]?.message?.content ?? "",
    model: data.model ?? model,
@@ -336,12 +426,11 @@ export async function streamWithTools(
 async function openrouterStreamCall(
  messages: OAMsg[],
-  model: string,
+  preferredModel: string,
  opts: { withTools?: boolean } = {},
 ): Promise<Response> {
  const withTools = opts.withTools !== false;
  const body: Record<string, unknown> = {
-    model,
    messages,
    stream: true,
    max_tokens: 1024,
@@ -350,19 +439,7 @@ async function openrouterStreamCall(
    body.tools = TOOL_DEFINITIONS;
    body.tool_choice = "auto";
  }
-  const res = await fetch(ENDPOINT, {
+  const { res } = await fetchOpenRouter(body, preferredModel);
-    method: "POST",
-    headers: headers(),
-    body: JSON.stringify(body),
-  });
-  if (!res.ok) {
-    const txt = await res.text();
-    const err = new Error(`openrouter HTTP ${res.status}: ${txt.slice(0, 300)}`);
-    if (res.status === 429 || res.status === 402) {
-      (err as Error & { isRateLimit?: boolean }).isRateLimit = true;
-    }
-    throw err;
-  }
  return res;
 }
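Review note on the circuit breaker in `openrouter.ts`: the sliding-window logic is easier to unit-test when the clock is injected. A minimal sketch under that assumption follows. The class and option names are hypothetical and only mirror the roles of `CB_WINDOW_MS`, `CB_THRESHOLD` and `CB_COOLDOWN_MS`. It compares `openedAt` against `null` rather than using a truthiness check, so a fake clock that starts at 0 still works.

```typescript
// Hypothetical names throughout; a sketch, not the code in openrouter.ts.
interface BreakerOptions {
  windowMs: number;   // failures older than this are forgotten
  threshold: number;  // failures inside the window that open the circuit
  cooldownMs: number; // how long the circuit stays open
}

class CircuitBreaker {
  private failures: number[] = [];
  private openedAt: number | null = null;
  private readonly opts: BreakerOptions;
  private readonly now: () => number;

  constructor(opts: BreakerOptions, now: () => number = () => Date.now()) {
    this.opts = opts;
    this.now = now;
  }

  /** True while the circuit is open; resets itself once the cooldown passes. */
  isOpen(): boolean {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt > this.opts.cooldownMs) {
      // Half-open: clear state and let the next call probe the upstream.
      this.openedAt = null;
      this.failures = [];
      return false;
    }
    return true;
  }

  /** Record a failure and open the circuit once the window holds enough. */
  recordFailure(): void {
    const t = this.now();
    this.failures = this.failures.filter((f) => t - f < this.opts.windowMs);
    this.failures.push(t);
    if (this.failures.length >= this.opts.threshold) this.openedAt = t;
  }

  /** Any success closes the circuit and forgets earlier failures. */
  recordSuccess(): void {
    this.failures = [];
    this.openedAt = null;
  }
}

/** Route to the fallback model while the primary's circuit is open. */
function pickModel(primary: string, fallback: string, breaker: CircuitBreaker): string {
  return breaker.isOpen() ? fallback : primary;
}
```

Because the state lives in process memory, each server instance keeps its own breaker, which matches the "in-memory" wording of the commit message.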
--- a/web/lib/logger.ts
+++ b/web/lib/logger.ts
@@ -0,0 +1,77 @@
 /**
 * Structured logger — pino with JSON output in production, pretty in dev.
 *
 * Use as:
 *   import { log, withRequest } from "@/lib/logger";
 *   log.info({ doc_id, page }, "rendering page");
 *   log.error({ err }, "embed-service down");
 *
 * For request-scoped logging:
 *   const reqLog = withRequest(request);
 *   reqLog.info({ duration_ms: dt }, "hybrid_search done");
 *
 * Edge runtime falls back to a console adapter (pino requires node).
 */
 import pino from "pino";
 // Edge runtime doesn't support pino's worker thread; detect and fall back.
 const isEdge = typeof process === "undefined" || process.env.NEXT_RUNTIME === "edge";
 function build(): pino.Logger {
  if (isEdge) {
    // Minimal adapter so middleware can call log.* without crashing.
    const noop = () => undefined;
    return {
      info: (o: unknown, m?: string) => console.log(JSON.stringify({ level: "info", msg: m, ...(typeof o === "object" ? o : { v: o }) })),
      warn: (o: unknown, m?: string) => console.warn(JSON.stringify({ level: "warn", msg: m, ...(typeof o === "object" ? o : { v: o }) })),
      error: (o: unknown, m?: string) => console.error(JSON.stringify({ level: "error", msg: m, ...(typeof o === "object" ? o : { v: o }) })),
      debug: noop,
      trace: noop,
      fatal: (o: unknown, m?: string) => console.error(JSON.stringify({ level: "fatal", msg: m, ...(typeof o === "object" ? o : { v: o }) })),
      child: () => build(),
    } as unknown as pino.Logger;
  }
  return pino({
    level: process.env.LOG_LEVEL || "info",
    base: {
      app: "disclosure-web",
      env: process.env.NODE_ENV || "development",
    },
    timestamp: pino.stdTimeFunctions.isoTime,
    // Production: NDJSON (one JSON per line). Dev: pretty-printed.
    transport: process.env.NODE_ENV === "production" ? undefined : {
      target: "pino-pretty",
      options: { colorize: true, translateTime: "SYS:HH:MM:ss.l" },
    },
  });
 }
 export const log: pino.Logger = build();
 /** Create a child logger bound to a request's correlation id. */
 export function withRequest(req: Request | { headers: Headers }): pino.Logger {
  const id = req.headers.get("x-correlation-id") ||
             req.headers.get("x-request-id") ||
             cryptoRandomId();
  return log.child({ correlation_id: id });
 }
 /** Get-or-mint a correlation id for a request. */
 export function correlationId(req: Request | { headers: Headers }): string {
  return req.headers.get("x-correlation-id") ||
         req.headers.get("x-request-id") ||
         cryptoRandomId();
 }
 function cryptoRandomId(): string {
   // 16 hex chars on the crypto path: short enough for logs, enough entropy for
   // non-security uses. Both edge runtime and Node 19+ expose globalThis.crypto;
   // older Node falls back to Math.random, which yields a shorter base-36 string
   // (acceptable: this is correlation, not security).
  const g = globalThis as { crypto?: { getRandomValues?: (a: Uint8Array) => void } };
  if (g.crypto?.getRandomValues) {
    const buf = new Uint8Array(8);
    g.crypto.getRandomValues(buf);
    return Array.from(buf, (b) => b.toString(16).padStart(2, "0")).join("");
  }
  return Math.random().toString(36).slice(2, 18);
 }
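Review note on `cryptoRandomId`: the `Math.random` fallback returns a base-36 string that is usually shorter than 16 characters, so ids differ in shape depending on the runtime. A minimal sketch of a get-or-mint helper whose fallback keeps the 16-hex-character shape. `HeaderReader`, `randomHexId` and `getOrMintCorrelationId` are hypothetical names, not part of this commit.

```typescript
/** Minimal read-only shape that the Fetch API `Headers` class satisfies. */
interface HeaderReader {
  get(name: string): string | null;
}

/** 16 lowercase hex chars: Web Crypto when present, Math.random otherwise. */
function randomHexId(): string {
  const bytes = new Uint8Array(8);
  const c = (globalThis as { crypto?: { getRandomValues?: (a: Uint8Array) => unknown } }).crypto;
  if (c?.getRandomValues) {
    c.getRandomValues(bytes);
  } else {
    // Not cryptographically strong; fine for log correlation only.
    for (let i = 0; i < bytes.length; i++) bytes[i] = Math.floor(Math.random() * 256);
  }
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

/** Reuse an inbound id if the caller sent one, otherwise mint a new one. */
function getOrMintCorrelationId(headers: HeaderReader): string {
  return headers.get("x-correlation-id") || headers.get("x-request-id") || randomHexId();
}
```

Taking a narrow `HeaderReader` instead of a full `Request` keeps the helper usable from both the middleware and plain unit tests.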
--- a/web/middleware.ts
+++ b/web/middleware.ts
@@ -6,12 +6,17 @@
 */
 import { NextResponse, type NextRequest } from "next/server";
 import { createServerClient, type CookieOptions } from "@supabase/ssr";
 import { log, correlationId } from "@/lib/logger";
 export async function middleware(request: NextRequest) {
  const t0 = Date.now();
  const url = process.env.NEXT_PUBLIC_SUPABASE_URL;
  const key = process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY;
  const reqId = correlationId(request);
  let response = NextResponse.next({ request });
   // Stamp the response so the client sees the id. Route handlers read request
   // headers, so they only see it if the caller already sent x-correlation-id.
  response.headers.set("x-correlation-id", reqId);
  if (!url || !key) {
    // Supabase not configured — skip auth refresh entirely
@@ -34,10 +39,11 @@ export async function middleware(request: NextRequest) {
  // Trigger refresh (silently if token still valid)
  const { data: { user } } = await supabase.auth.getUser();
-  // Gate /admin/* by role. Non-admin (including anonymous) gets the public
+  // Gate /admin/* AND /api/admin/* by role. Non-admin (including anonymous)
-  // 404, not a redirect — we don't want to leak the existence of the route.
+  // gets a public 404, not a redirect — we don't want to leak the existence
+  // of the route. (Audit W0-F1, closed 2026-05-23.)
  const pathname = request.nextUrl.pathname;
-  if (pathname.startsWith("/admin")) {
+  if (pathname.startsWith("/admin") || pathname.startsWith("/api/admin")) {
    if (!user) {
      return new NextResponse("Not Found", { status: 404 });
    }
@@ -51,6 +57,22 @@ export async function middleware(request: NextRequest) {
    }
  }
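   // Log API requests with correlation id + timing. Skip noisy paths (assets,
   // crops) and prefer one structured line per request so Glitchtip / log
   // aggregators can correlate. duration_ms covers the time spent in this
   // middleware only, not the route handler that runs afterwards.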
  if (pathname.startsWith("/api/") && !pathname.startsWith("/api/static") && !pathname.startsWith("/api/crop")) {
    log.info(
      {
        event: "http_request",
        method: request.method,
        path: pathname,
        correlation_id: reqId,
        duration_ms: Date.now() - t0,
      },
      `${request.method} ${pathname}`,
    );
  }
  return response;
 }
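Review note on the admin gate: `pathname.startsWith("/admin")` also matches sibling paths such as `/administrator`. For non-admins that only means an extra 404, so it may be acceptable here. A minimal sketch of a segment-aware check, in case such routes are ever added. `isUnder` and `isAdminPath` are hypothetical helpers, not part of this commit.

```typescript
/** True when `pathname` is `base` itself or lies below it as a path segment. */
function isUnder(pathname: string, base: string): boolean {
  return pathname === base || pathname.startsWith(base + "/");
}

/** Paths gated by the admin role check (pages and their API counterparts). */
function isAdminPath(pathname: string): boolean {
  return isUnder(pathname, "/admin") || isUnder(pathname, "/api/admin");
}
```

The middleware condition would then read `if (isAdminPath(pathname)) { ... }`, with the 404 behaviour unchanged.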
--- a/web/package-lock.json
+++ b/web/package-lock.json
--- a/web/package.json
+++ b/web/package.json
@@ -15,6 +15,7 @@
    "@radix-ui/react-tooltip": "^1.1.0",
    "@react-sigma/core": "^5.0.0",
    "@react-sigma/layout-forceatlas2": "^5.0.0",
    "@sentry/nextjs": "^10.53.1",
    "@supabase/ssr": "^0.10.3",
    "@supabase/supabase-js": "^2.105.4",
    "framer-motion": "^11.11.0",
@@ -24,6 +25,7 @@
    "lucide-react": "^0.460.0",
    "next": "^15.1.0",
    "pg": "^8.13.1",
    "pino": "^10.3.1",
    "react": "^19.0.0",
    "react-dom": "^19.0.0",
    "react-force-graph-2d": "^1.27.0",
--- a/web/sentry.client.config.ts
+++ b/web/sentry.client.config.ts
@@ -0,0 +1,17 @@
 /**
 * Sentry (Glitchtip-compatible) client-side init. Loaded by Next.js
 * automatically when @sentry/nextjs is installed.
 */
 import * as Sentry from "@sentry/nextjs";
 const dsn = process.env.NEXT_PUBLIC_SENTRY_DSN;
 if (dsn) {
  Sentry.init({
    dsn,
    environment: process.env.NODE_ENV || "development",
    tracesSampleRate: 0,
    sendDefaultPii: false,
    // Capture unhandled promise rejections + JS errors. Glitchtip community
    // ignores everything below `error` severity by default.
  });
 }
--- a/web/sentry.server.config.ts
+++ b/web/sentry.server.config.ts
@@ -0,0 +1,21 @@
 /**
 * Sentry (Glitchtip-compatible) server-side init.
 *
 * DSN must point to Glitchtip — we never send to sentry.io. See
 * SENTRY_DSN / NEXT_PUBLIC_SENTRY_DSN in docker-compose.yml. If unset, the SDK
 * is loaded but no events ship — safe for local dev.
 */
 import * as Sentry from "@sentry/nextjs";
 const dsn = process.env.SENTRY_DSN || process.env.NEXT_PUBLIC_SENTRY_DSN;
 if (dsn) {
  Sentry.init({
    dsn,
    environment: process.env.NODE_ENV || "development",
    release: process.env.SENTRY_RELEASE,
    tracesSampleRate: 0,  // Glitchtip community doesn't support performance traces
    sendDefaultPii: false,
     // No host override is needed: the SDK derives the ingest host from the DSN,
     // so events go to Glitchtip rather than sentry.io.
  });
 }