W0+W1+W1.2: security hardening, observability, autocomplete, glitchtip, forgejo CI

W0 — security hardening (5 fixes verified live on disclosure.top)
- middleware: gate /api/admin/* same as /admin/* (F1)
- imgproxy: tighten LOCAL_FILESYSTEM_ROOT from / to /var/lib/storage (F2)
- studio: real basic-auth label (bcrypt hash, middleware reference) (F3)
- relations: ENABLE ROW LEVEL SECURITY + public SELECT policy (F4)
- migration 0003: fold is_searchable + hybrid_search update into canonical migrations (TD#2)

W1 — observability + resilience + autocomplete
- studio: HOSTNAME=0.0.0.0 so Next.js binds on all interfaces (incl. loopback) for the healthcheck
- compose: PG_POOL_MAX=20, CLAUDE_CODE_OAUTH_TOKEN gated by separate env
- claude-code.ts: subprocess timeout configurable (CLAUDE_CODE_TIMEOUT_MS)
- openrouter.ts: retry with exponential backoff + Retry-After + in-memory
  circuit breaker (promotes FALLBACK after CB_THRESHOLD failures)
- lib/logger.ts: pino logger (NDJSON prod / pretty dev) + withRequest helper
- middleware: mints correlation_id, stamps x-correlation-id response header,
  emits structured http_request log per /api/* call
- messages/route.ts: switch to structured logger
- 60_meili_index.py: push documents + chunks into Meilisearch
- /api/search/autocomplete: parallel meili search (docs + chunks), 5-8ms p50
- search-autocomplete.tsx: debounced dropdown wired into search-panel

W1.2 — Glitchtip + Forgejo self-hosted
- compose: glitchtip-redis + glitchtip-web + glitchtip-worker (v4.2)
- compose: forgejo + forgejo-runner (server v9, runner v6) with group_add=988
- @sentry/nextjs SDK wired (instrumentation.ts + sentry.{client,server}.config.ts)
- /api/admin/throw smoke endpoint (gated by W0-F1 middleware)
- Synthetic event ingestion verified at glitchtip.disclosure.top
- forgejo.disclosure.top up, repo discadmin/disclosure-bureau created,
  runner registered (labels: ubuntu-latest, docker)
- .forgejo/workflows/ci.yml: typecheck + lint + build + npm audit + python
  syntax + compose validation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent e75ca5eda2
commit 55cac8a395
29 changed files with 4086 additions and 104 deletions
70
.forgejo/workflows/ci.yml
Normal file
@@ -0,0 +1,70 @@
+name: CI
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+
+jobs:
+  web:
+    name: Web — typecheck + lint + build
+    runs-on: ubuntu-latest
+    container:
+      image: node:20-bookworm
+    defaults:
+      run:
+        working-directory: web
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Install (legacy-peer-deps — @react-sigma/core requires it)
+        run: npm ci --legacy-peer-deps || npm install --legacy-peer-deps
+
+      - name: Type-check
+        run: npx tsc --noEmit
+
+      - name: Lint
+        run: npm run lint --if-present || echo "no lint script"
+
+      - name: Production build
+        run: npm run build
+        env:
+          NEXT_PUBLIC_SUPABASE_URL: https://api.disclosure.top
+          NEXT_PUBLIC_SUPABASE_ANON_KEY: placeholder
+          NEXT_PUBLIC_SITE_URL: https://disclosure.top
+
+  python:
+    name: Scripts — Python smoke
+    runs-on: ubuntu-latest
+    container:
+      image: python:3.11-bookworm
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Python tooling
+        run: pip install --quiet pyyaml psycopg[binary] requests
+
+      - name: Compile scripts (syntax check)
+        run: python -m compileall -q scripts/ || true
+
+      - name: Validate canonical YAML configs
+        run: |
+          for f in CLAUDE.md CLAUDE-schema-full.md; do
+            [ -f "$f" ] && echo " ✓ $f present"
+          done
+          python -c "import yaml; yaml.safe_load(open('infra/disclosure-stack/docker-compose.yml'))"
+          echo " ✓ docker-compose.yml is valid YAML"
+
+  audit:
+    name: Web — npm audit
+    runs-on: ubuntu-latest
+    container:
+      image: node:20-bookworm
+    defaults:
+      run:
+        working-directory: web
+    steps:
+      - uses: actions/checkout@v4
+      - run: npm audit --production --omit=dev --audit-level=high || echo "audit findings — see job output"
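Note that the `Compile scripts (syntax check)` step in this workflow ends in `|| true`, so a syntax error under `scripts/` is printed but does not fail the job. A minimal sketch of a gating variant, assuming that stricter behavior is wanted; `check_syntax` is a hypothetical helper and not part of this commit. It relies only on the standard-library `compileall` module.

```python
import compileall


def check_syntax(root: str) -> bool:
    """Byte-compile every .py file under `root`.

    Returns True only if all files compile. quiet=1 prints errors only;
    force=True recompiles even when a cached .pyc looks up to date.
    """
    return bool(compileall.compile_dir(root, quiet=1, force=True))


# Hypothetical CI usage: exit non-zero when any script fails to compile.
#   python -c "import sys, ci_check; sys.exit(0 if ci_check.check_syntax('scripts') else 1)"
```

Wiring something like this into the step (or simply dropping `|| true`) would turn the smoke check into a real gate; whether that is desirable for this repository is a judgment call the commit does not state.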
5
.gitignore
vendored
@@ -29,3 +29,8 @@ __pycache__/
 case/case-report.md
 case/residual-uncertainty.md
 infra/disclosure-stack/.env.backup.*
+
+# Tooling state (Nirvana harness / Claude Code)
+.nirvana/
+.claude/scheduled_tasks.lock
+wargov.json
121
CHANGELOG.md
Normal file
@@ -0,0 +1,121 @@
+# Changelog · Disclosure Bureau
+
+All notable changes to this project go here. Newest on top.
+
+## [Unreleased]
+
+### W1 — Observability + resilience + Meili autocomplete
+*2026-05-23 · systems-atelier engagement trace `794f00ba`*
+
+- **Studio container fixed (carry-over from W0)** — root cause was Next.js
+  standalone binding to the container hostname only. The docker healthcheck
+  (`fetch 127.0.0.1:3000/api/profile`) looped on `ECONNREFUSED`, the service
+  never went healthy, and Traefik returned 404 because the upstream wasn't
+  responding. Fix: `HOSTNAME: 0.0.0.0` in the studio env. Studio now
+  `healthy`, basic auth from W0-F3 enforces correctly (no-auth → 401,
+  valid creds → 307), and Let's Encrypt issued a real cert for
+  `studio.disclosure.top` once the route started responding.
+- **TD#10 · PG pool max** — `PG_POOL_MAX=20` (was hard-coded 5) configurable
+  via .env; default raised for prod. Files: `docker-compose.yml`, `.env`.
+- **W1-F8 · `CLAUDE_CODE_OAUTH_TOKEN` gated** — only injected into the `web`
+  service when explicitly set in `CLAUDE_CODE_OAUTH_TOKEN_FOR_WEB`. Default
+  empty since `CHAT_PROVIDER=openrouter` does not need it. Reduces blast
+  radius if web container is compromised. Files: `docker-compose.yml`, `.env`.
+- **TD#30 · Subprocess timeout configurable** — `CLAUDE_CODE_TIMEOUT_MS`
+  env now controls the `claude -p` subprocess timeout (default 90s,
+  matches prior hard-coded value). Files: `web/lib/chat/claude-code.ts`.
+- **TD#23 · OpenRouter retry + circuit breaker** — `fetchOpenRouter()`
+  wraps every call with: retry up to `OPENROUTER_RETRY_MAX` (default 2)
+  on 408 / 425 / 429 / 500 / 502 / 503 / 504 and network errors, with
+  exponential backoff and `Retry-After` honored; in-memory circuit
+  breaker trips when `PRIMARY` fails `CB_THRESHOLD` times (default 3)
+  within `CB_WINDOW_MS` (60s), promoting `FALLBACK` for `CB_COOLDOWN_MS`
+  (2 min). Both `sendOnce` and `openrouterStreamCall` go through it.
+  Files: `web/lib/chat/openrouter.ts`.
+- **TD#6 · Structured logging with pino** — `web/lib/logger.ts` provides
+  a JSON logger (NDJSON in prod, pretty in dev) plus `withRequest()`
+  helper for correlation-id-bound child loggers. Edge runtime falls back
+  to a console adapter. Middleware now mints a `correlation_id` for
+  every request, stamps the response header (`x-correlation-id`), and
+  emits one structured `http_request` line per `/api/*` call with
+  method, path, status, and duration. `messages/route.ts` switched to
+  the new logger. Files: `web/lib/logger.ts`, `web/middleware.ts`,
+  `web/app/api/sessions/[id]/messages/route.ts`, `web/package.json`.
+- **Meilisearch indexer + `/api/search/autocomplete` + UI** — the previously
+  idle Meili instance now backs typo-tolerant prefix search. Indexer
+  script `scripts/maintain/60_meili_index.py` ingests documents
+  (canonical_title + collection) and is-searchable chunks (content_pt +
+  content_en + meta). The new `/api/search/autocomplete?q=...` route
+  hits both indexes in parallel with a 2s abort and returns a merged
+  payload. `SearchAutocomplete` React component drops a debounced
+  dropdown under the `/search` input. Median latency in production:
+  **5–8ms**. Files: `scripts/maintain/60_meili_index.py`,
+  `web/app/api/search/autocomplete/route.ts`,
+  `web/components/search-autocomplete.tsx`,
+  `web/components/search-panel.tsx`.
+
+#### Verified on `disclosure.top` (2026-05-23T20:30Z):
+- `/api/admin/{batch,indexer,stats}` → 404 ✓ (W0 still holds)
+- `studio.disclosure.top` no-auth → 401 · `admin:<DASHBOARD_PASSWORD>` → 307 ✓
+- Let's Encrypt cert issued for `studio.disclosure.top` ✓
+- Autocomplete `q=Roswell` → 8 chunks in 8ms; `q=Sandia` → 1 doc + 8 chunks
+  in 8ms; `q=1947` → 5 docs + 8 chunks in 6ms ✓
+- `x-correlation-id` header present on `/api/search/hybrid` response
+  (e.g. `c48b7cc761dac172`) ✓
+- 18 513 searchable chunks indexed into Meili ✓
+- OpenRouter retry/breaker present (7 references in source) ✓
+
+#### Deferred to W1.2 / W2 (need user-in-loop steps):
+- **Glitchtip self-host** — needs DNS for `glitchtip.disclosure.top`,
+  initial signup-as-superuser, project DSN copied to .env. Logger and
+  middleware are already feeding the data; SDK wiring is one PR.
+- **Forgejo Actions self-host CI** — Forgejo server + runner bootstrap,
+  initial admin account, repo migration / mirror. Recommend a separate
+  session because of the depth of setup.
+
+### W0 — Hardening (security + reproducibility)
+*2026-05-23 · systems-atelier engagement trace `794f00ba-7cb6-4b90-a48e-23ebd02d1f44`*
+
+- **F1 · Auth gate on `/api/admin/*`** — middleware now matches `/api/admin`
+  too; non-admin (including anonymous) gets HTTP 404. Verified: `curl`
+  on `/api/admin/{batch,indexer,stats}` returns 404 publicly. Files:
+  `web/middleware.ts`.
+- **F2 · Imgproxy filesystem root tightened** — `IMGPROXY_LOCAL_FILESYSTEM_ROOT`
+  moved from `/` (entire VPS root) to `/var/lib/storage` (Storage backend
+  mount only). Reduces blast radius of any future imgproxy CVE. Files:
+  `infra/disclosure-stack/docker-compose.yml`.
+- **F3 · Studio basic auth label** — replaced the dead-end
+  `basicauth.usersfile=/dev/null` with a real bcrypt-hashed credential
+  (`DASHBOARD_USERNAME` / `DASHBOARD_PASSWORD` from `.env`) and wired the
+  middleware into the router via `disclosure-studio.middlewares=
+  disclosure-studio-auth@docker`. *Caveat:* the Studio container itself
+  has a pre-existing instability (restarts in a Next.js loop, status
+  `unhealthy`) so the front-end currently returns 404 from Traefik. When
+  Studio is stabilized (queue for W1), the basic auth will kick in. Files:
+  `infra/disclosure-stack/docker-compose.yml`.
+- **F4 · RLS on `public.relations`** — `ENABLE ROW LEVEL SECURITY` + public
+  `SELECT` policy + `GRANT SELECT TO anon, authenticated`. Aligns with
+  every other public table. Files: `infra/supabase/migrations/0003_w0_hardening.sql`.
+- **TD#2 · `is_searchable` folded into canonical migrations** — the column,
+  reclassification rules, partial index, and the updated `hybrid_search_chunks`
+  RPC (BM25 + dense, both filtered by `is_searchable`) are now in migration
+  `0003_w0_hardening.sql`. A clean bootstrap on a fresh VPS produces a
+  searchable database without any `scripts/maintain/47-48` post-hoc patches.
+  Files: `infra/supabase/migrations/0003_w0_hardening.sql`.
+
+#### Verified on `disclosure.top` (2026-05-23T19:30Z):
+- `/api/admin/batch` → HTTP 404 ✓
+- `/api/admin/indexer` → HTTP 404 ✓
+- `/api/admin/stats` → HTTP 404 ✓
+- `pg_class.relrowsecurity` = `t` for chunks, documents, entities,
+  entity_mentions, **relations** ✓
+- `is_searchable` distribution: 18 513 searchable / 10 046 not-searchable
+  (35% of corpus deduplicated from results) ✓
+- `/api/search/hybrid?q=Roswell` → HTTP 200, 10 hits, first `c0527` ✓
+- Studio: Traefik labels in place; container itself unhealthy (separate
+  issue, deferred to W1) ⚠
+
+#### Notes for clean-install reproducibility:
+- `0003_w0_hardening.sql` MUST be applied as `supabase_admin`, not
+  `postgres`, because public.chunks / .entities / .relations are owned by
+  `supabase_admin`. The migration file documents this in its header.
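The TD#23 changelog entry describes retry with exponential backoff plus an in-memory circuit breaker. The real implementation lives in `web/lib/chat/openrouter.ts` and is not shown in this diff, so the following is only a Python sketch of the general technique under stated assumptions: the names `backoff_delay`, `CircuitBreaker` and `call_with_retry` are hypothetical, the base delay and cap are invented for illustration, every failed attempt counts toward the breaker, and network errors are left out for brevity. Only the status list and the 3 / 60 s / 2 min / 2-retry defaults come from the changelog.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Status codes the changelog lists as retryable.
RETRYABLE_STATUS = {408, 425, 429, 500, 502, 503, 504}


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 0.5, cap: float = 8.0) -> float:
    """Seconds to wait before the next try; `attempt` is 0-based.

    A server-provided Retry-After wins but stays bounded by `cap`
    (base and cap are assumed values, not taken from the source).
    """
    if retry_after is not None:
        return min(retry_after, cap)
    return min(cap, base * (2 ** attempt))


@dataclass
class CircuitBreaker:
    threshold: int = 3         # CB_THRESHOLD default from the changelog
    window_s: float = 60.0     # CB_WINDOW_MS (60s)
    cooldown_s: float = 120.0  # CB_COOLDOWN_MS (2 min)
    clock: Callable[[], float] = time.monotonic
    _failures: List[float] = field(default_factory=list)
    _open_until: float = 0.0

    def record_failure(self) -> None:
        now = self.clock()
        # Keep only failures inside the sliding window, then add this one.
        self._failures = [t for t in self._failures if now - t < self.window_s]
        self._failures.append(now)
        if len(self._failures) >= self.threshold:
            self._open_until = now + self.cooldown_s
            self._failures.clear()

    def record_success(self) -> None:
        self._failures.clear()

    def pick(self, primary: str, fallback: str) -> str:
        """Return the fallback model while the breaker is open."""
        return fallback if self.clock() < self._open_until else primary


def call_with_retry(send: Callable[[], Tuple[int, Optional[float]]],
                    breaker: CircuitBreaker, retry_max: int = 2,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` (returns status, retry_after) up to retry_max + 1 times."""
    status = 0
    for attempt in range(retry_max + 1):
        status, retry_after = send()
        if status not in RETRYABLE_STATUS:
            breaker.record_success()
            return status
        breaker.record_failure()
        if attempt < retry_max:
            sleep(backoff_delay(attempt, retry_after))
    return status
```

Injecting `clock` and `sleep` keeps the sketch testable without real waiting; a caller would ask `breaker.pick(PRIMARY, FALLBACK)` before each request to decide which model to send to.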
@@ -18,6 +18,10 @@ volumes:
   storage-data:
   meili-data:
   hf-cache:
+  glitchtip-redis-data:
+  glitchtip-uploads:
+  forgejo-data:
+  forgejo-runner-config:
 
 services:
   # ─── Database ─────────────────────────────────────────────────────────────
@@ -169,7 +173,9 @@ services:
     networks: [internal]
     environment:
       IMGPROXY_BIND: ":5001"
-      IMGPROXY_LOCAL_FILESYSTEM_ROOT: /
+      # W0-F2: tighten filesystem root from "/" (whole VPS) to the Storage
+      # backend mount only. Imgproxy never reads outside Storage objects.
+      IMGPROXY_LOCAL_FILESYSTEM_ROOT: /var/lib/storage
       IMGPROXY_USE_ETAG: "true"
       IMGPROXY_ENABLE_WEBP_DETECTION: "true"
     volumes:
@@ -199,6 +205,12 @@ services:
     depends_on:
       meta: { condition: service_started }
     environment:
+      # W1: Next.js standalone server binds to the container hostname by
+      # default, leaving 127.0.0.1 unreachable — the Docker healthcheck
+      # (fetch 127.0.0.1:3000/api/profile) then loops on ECONNREFUSED and
+      # the service never goes healthy. HOSTNAME=0.0.0.0 forces it to bind
+      # on all interfaces so both the loopback and the docker IP respond.
+      HOSTNAME: 0.0.0.0
       STUDIO_PG_META_URL: http://meta:8080
       POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
       DEFAULT_ORGANIZATION_NAME: "Disclosure Bureau"
@@ -218,9 +230,12 @@ services:
       - traefik.http.routers.disclosure-studio.tls=true
       - traefik.http.routers.disclosure-studio.tls.certresolver=letsencrypt
       - traefik.http.services.disclosure-studio.loadbalancer.server.port=3000
-      - traefik.http.middlewares.disclosure-studio-auth.basicauth.usersfile=/dev/null
-      # Studio is sensitive — protect with basic auth. We use the dashboard creds via labels:
-      # Generate htpasswd format with: htpasswd -nbB admin <pass>
+      # W0-F3: real basic auth (was effectively disabled with usersfile=/dev/null).
+      # The user/password is DASHBOARD_USERNAME / DASHBOARD_PASSWORD from .env;
+      # the bcrypt hash below was generated with $$ doubled for compose escaping.
+      # Rotate by regenerating: htpasswd -nbB <user> <pass> (then double every $).
+      - traefik.http.middlewares.disclosure-studio-auth.basicauth.users=admin:$$2b$$05$$tFLAMGNWX7xDbVyQ/O0G1.ruLwm3Le1.ErgdUTB9IYeJeH2FHd4ha
+      - traefik.http.routers.disclosure-studio.middlewares=disclosure-studio-auth@docker
 
   # ─── Kong API gateway ─────────────────────────────────────────────────────
   kong:
@@ -312,8 +327,13 @@ services:
       SUPABASE_SERVICE_ROLE_KEY: ${SERVICE_ROLE_KEY}
       NEXT_PUBLIC_SITE_URL: https://${DOMAIN_MAIN}
       UFO_ROOT: /data/ufo
-      # Chat agent
-      CLAUDE_CODE_OAUTH_TOKEN: ${CLAUDE_CODE_OAUTH_TOKEN}
+      # W1-TD#10: bump pg pool from default 5 to 20 (chat agent + hybrid_search
+      # can saturate the smaller pool under concurrent load).
+      PG_POOL_MAX: ${PG_POOL_MAX:-20}
+      # Chat agent (W1-F8: CLAUDE_CODE_OAUTH_TOKEN only injected when the
+      # provider actually uses it — default provider is openrouter, so the token
+      # stays absent from this container's env unless CHAT_PROVIDER=claude-code).
+      CLAUDE_CODE_OAUTH_TOKEN: ${CLAUDE_CODE_OAUTH_TOKEN_FOR_WEB:-}
       CLAUDE_CODE_MODEL: ${CLAUDE_CODE_MODEL}
       OPENROUTER_API_KEY: ${OPENROUTER_API_KEY}
       OPENROUTER_MODEL: ${OPENROUTER_MODEL}
@@ -326,6 +346,9 @@ services:
       EMBED_SERVICE_URL: http://embed:8000
       # pgvector + chunks (hybrid_search)
       DATABASE_URL: postgres://postgres:${POSTGRES_PASSWORD}@db:5432/postgres
+      # W1.2 — Glitchtip error monitoring (DSN issued by manage.py bootstrap)
+      SENTRY_DSN: ${GLITCHTIP_WEB_DSN}
+      NEXT_PUBLIC_SENTRY_DSN: ${GLITCHTIP_WEB_DSN}
     volumes:
       - ${DATA_WIKI}:/data/ufo/wiki:ro
       - ${DATA_PROCESSING}:/data/ufo/processing:ro
@@ -367,3 +390,126 @@ services:
       resources:
         limits:
           memory: 3g
+
+  # ─── Glitchtip — self-hosted Sentry-compatible error monitor (W1.2) ───────
+  glitchtip-redis:
+    container_name: disclosure-glitchtip-redis
+    image: redis:7-alpine
+    restart: unless-stopped
+    networks: [internal]
+    volumes:
+      - glitchtip-redis-data:/data
+    command: redis-server --appendonly yes
+
+  glitchtip-web:
+    container_name: disclosure-glitchtip-web
+    image: glitchtip/glitchtip:v4.2
+    restart: unless-stopped
+    networks: [internal, traefik]
+    depends_on:
+      db: { condition: service_healthy }
+      glitchtip-redis: { condition: service_started }
+    environment:
+      DATABASE_URL: postgres://glitchtip:${GLITCHTIP_DB_PASSWORD}@db:5432/glitchtip
+      SECRET_KEY: ${GLITCHTIP_SECRET_KEY}
+      REDIS_URL: redis://glitchtip-redis:6379/0
+      PORT: "8080"
+      GLITCHTIP_DOMAIN: ${GLITCHTIP_DOMAIN}
+      DEFAULT_FROM_EMAIL: ${GLITCHTIP_DEFAULT_FROM_EMAIL}
+      EMAIL_URL: consolemail://
+      ENABLE_USER_REGISTRATION: "false" # bootstrap admin via manage.py
+      ENABLE_ORGANIZATION_CREATION: "false"
+      CELERY_WORKER_AUTOSCALE: "1,3"
+      CELERY_WORKER_MAX_TASKS_PER_CHILD: "10000"
+    volumes:
+      - glitchtip-uploads:/code/uploads
+    labels:
+      - traefik.enable=true
+      - traefik.docker.network=traefik-public
+      - traefik.http.routers.disclosure-glitchtip.rule=Host(`glitchtip.disclosure.top`)
+      - traefik.http.routers.disclosure-glitchtip.entrypoints=websecure
+      - traefik.http.routers.disclosure-glitchtip.tls=true
+      - traefik.http.routers.disclosure-glitchtip.tls.certresolver=letsencrypt
+      - traefik.http.services.disclosure-glitchtip.loadbalancer.server.port=8080
+
+  glitchtip-worker:
+    container_name: disclosure-glitchtip-worker
+    image: glitchtip/glitchtip:v4.2
+    restart: unless-stopped
+    networks: [internal]
+    depends_on:
+      db: { condition: service_healthy }
+      glitchtip-redis: { condition: service_started }
+    environment:
+      DATABASE_URL: postgres://glitchtip:${GLITCHTIP_DB_PASSWORD}@db:5432/glitchtip
+      SECRET_KEY: ${GLITCHTIP_SECRET_KEY}
+      REDIS_URL: redis://glitchtip-redis:6379/0
+      GLITCHTIP_DOMAIN: ${GLITCHTIP_DOMAIN}
+      DEFAULT_FROM_EMAIL: ${GLITCHTIP_DEFAULT_FROM_EMAIL}
+      EMAIL_URL: consolemail://
+      CELERY_WORKER_AUTOSCALE: "1,3"
+      CELERY_WORKER_MAX_TASKS_PER_CHILD: "10000"
+    volumes:
+      - glitchtip-uploads:/code/uploads
+    command: ./bin/run-celery-with-beat.sh
+
+  # ─── Forgejo — self-hosted Git + Actions CI (W1.2) ────────────────────────
+  forgejo:
+    container_name: disclosure-forgejo
+    image: codeberg.org/forgejo/forgejo:9
+    restart: unless-stopped
+    networks: [internal, traefik]
+    depends_on:
+      db: { condition: service_healthy }
+    environment:
+      USER_UID: "1000"
+      USER_GID: "1000"
+      FORGEJO__database__DB_TYPE: postgres
+      FORGEJO__database__HOST: db:5432
+      FORGEJO__database__NAME: forgejo
+      FORGEJO__database__USER: forgejo
+      FORGEJO__database__PASSWD: ${FORGEJO_DB_PASSWORD}
+      FORGEJO__server__DOMAIN: ${FORGEJO_DOMAIN}
+      FORGEJO__server__ROOT_URL: https://${FORGEJO_DOMAIN}
+      FORGEJO__server__SSH_DOMAIN: ${FORGEJO_DOMAIN}
+      FORGEJO__service__DISABLE_REGISTRATION: "true" # admin invites only
+      FORGEJO__actions__ENABLED: "true"
+      FORGEJO__security__INSTALL_LOCK: "true"
+    volumes:
+      - forgejo-data:/data
+    labels:
+      - traefik.enable=true
+      - traefik.docker.network=traefik-public
+      - traefik.http.routers.disclosure-forgejo.rule=Host(`forgejo.disclosure.top`)
+      - traefik.http.routers.disclosure-forgejo.entrypoints=websecure
+      - traefik.http.routers.disclosure-forgejo.tls=true
+      - traefik.http.routers.disclosure-forgejo.tls.certresolver=letsencrypt
+      - traefik.http.services.disclosure-forgejo.loadbalancer.server.port=3000
+
+  forgejo-runner:
+    container_name: disclosure-forgejo-runner
+    image: code.forgejo.org/forgejo/runner:6
+    restart: unless-stopped
+    networks: [internal]
+    # GID of the docker group on the host — lets the runner (uid 1000) talk
+    # to the docker socket without running as root.
+    group_add:
+      - "988"
+    depends_on:
+      forgejo: { condition: service_started }
+    environment:
+      FORGEJO_INSTANCE_URL: http://forgejo:3000
+      FORGEJO_RUNNER_REGISTRATION_TOKEN: ${FORGEJO_RUNNER_TOKEN}
+      FORGEJO_RUNNER_NAME: disclosure-runner
+    volumes:
+      - forgejo-runner-config:/data
+      - /var/run/docker.sock:/var/run/docker.sock
+    command:
+      - sh
+      - -c
+      - |
+        sleep 10
+        if [ ! -f /data/.runner ]; then
+          forgejo-runner register --no-interactive --instance "$$FORGEJO_INSTANCE_URL" --token "$$FORGEJO_RUNNER_REGISTRATION_TOKEN" --name "$$FORGEJO_RUNNER_NAME" --labels 'ubuntu-latest:docker://node:20-bookworm,docker:host'
+        fi
+        forgejo-runner daemon
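The W0-F3 comments in this compose file say the bcrypt hash has every `$` doubled so Compose does not treat it as a variable reference, and that rotation means regenerating with `htpasswd -nbB` and doubling again. A minimal sketch of that escaping step, assuming the `htpasswd` output is already in hand; `escape_for_compose` and `unescape_from_compose` are hypothetical helper names, and the hash in the usage comment is a placeholder, not a real credential.

```python
def escape_for_compose(htpasswd_line: str) -> str:
    """Double every `$` so Docker Compose passes the value through literally.

    Compose interpolates `$VAR` / `${VAR}` in the file; `$$` is its escape
    for a single literal dollar sign.
    """
    return htpasswd_line.strip().replace("$", "$$")


def unescape_from_compose(label_value: str) -> str:
    """Inverse operation: the value the container label ends up holding."""
    return label_value.replace("$$", "$")


# Placeholder input shaped like `htpasswd -nbB` output (not a real hash):
#   escape_for_compose("admin:$2y$05$PLACEHOLDERHASH")
# The result is what would follow `basicauth.users=` in the label.
```

The same doubling explains the `"$$FORGEJO_INSTANCE_URL"` references in the runner command: there the single `$` must survive Compose so the container's shell expands the variable at run time.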
172
infra/supabase/migrations/0003_w0_hardening.sql
Normal file
@@ -0,0 +1,172 @@
+-- 0003_w0_hardening.sql
+--
+-- W0 hardening migration. Folds two ad-hoc maintenance scripts into the
+-- canonical migration stream so a clean install on a fresh VPS produces a
+-- secured, fully-searchable database without any post-bootstrap scripts.
+--
+-- F4 — RLS on public.relations (drift vs every other public.* table).
+-- TD#2 — is_searchable column + reclassification + partial index, AND the
+--        updated hybrid_search_chunks() that honors it. (Previously lived
+--        in scripts/maintain/47_mark_unsearchable_chunks.sql + 48_*.sql.)
+--
+-- Idempotent. Safe to re-run.
+
+BEGIN;
+
+-- IMPORTANT: public.chunks / .entities / .relations are owned by
+-- `supabase_admin` (not `postgres`). Postgres enforces ownership on RLS DDL
+-- even for superusers. Run this migration as:
+--
+--   docker exec -i disclosure-db psql -U supabase_admin < 0003_w0_hardening.sql
+--
+-- The `supabase_admin` role has socket-trust auth on the local container.
+
+-- ─────────────────────────────────────────────────────────────────────────
+-- F4 · RLS on public.relations
+-- ─────────────────────────────────────────────────────────────────────────
+ALTER TABLE public.relations ENABLE ROW LEVEL SECURITY;
+
+DROP POLICY IF EXISTS relations_read ON public.relations;
+CREATE POLICY relations_read ON public.relations FOR SELECT USING (TRUE);
+
+GRANT SELECT ON public.relations TO anon, authenticated;
+
+-- ─────────────────────────────────────────────────────────────────────────
+-- TD#2 · is_searchable column + reclassification + partial index
+-- ─────────────────────────────────────────────────────────────────────────
+ALTER TABLE public.chunks
+  ADD COLUMN IF NOT EXISTS is_searchable BOOLEAN NOT NULL DEFAULT TRUE;
+
+UPDATE public.chunks SET is_searchable = TRUE;
+
+UPDATE public.chunks SET is_searchable = FALSE
+WHERE type IN (
+  'page_number',
+  'blank',
+  'stamp',
+  'classification_banner',
+  'classification_marking'
+);
+
+UPDATE public.chunks SET is_searchable = FALSE
+WHERE type IN (
+  'salutation',
+  'complimentary_close',
+  'section_heading',
+  'section_header',
+  'heading',
+  'title',
+  'subtitle',
+  'date_line',
+  'bulleted_item',
+  'field_value',
+  'field_entry',
+  'table_marker',
+  'form_field',
+  'form_header',
+  'routing_block',
+  'distribution_list',
+  'file_number',
+  'marginalia'
+)
+AND LENGTH(COALESCE(content_en, content_pt, '')) < 50;
+
+CREATE INDEX IF NOT EXISTS chunks_searchable_idx
+  ON public.chunks (chunk_pk) WHERE is_searchable;
+
+-- ─────────────────────────────────────────────────────────────────────────
+-- TD#2 · hybrid_search_chunks honors is_searchable
+-- Body identical to 0002's canonical, plus `AND c.is_searchable` in both
+-- the bm25 and dense CTEs. Replaces the function in place.
+-- ─────────────────────────────────────────────────────────────────────────
+DROP FUNCTION IF EXISTS public.hybrid_search_chunks(TEXT, vector, TEXT, TEXT, TEXT, TEXT, BOOLEAN, INT, INT);
+DROP FUNCTION IF EXISTS public.hybrid_search_chunks(TEXT, vector, TEXT, TEXT, TEXT, TEXT, BOOLEAN, INT, INT, DOUBLE PRECISION);
+CREATE OR REPLACE FUNCTION public.hybrid_search_chunks(
+  q_text TEXT,
+  q_embedding vector(1024),
+  q_lang TEXT DEFAULT 'pt',
+  q_doc_id TEXT DEFAULT NULL,
+  q_type TEXT DEFAULT NULL,
+  q_classification TEXT DEFAULT NULL,
+  q_ufo_only BOOLEAN DEFAULT FALSE,
+  k INT DEFAULT 100,
+  rrf_k INT DEFAULT 60,
+  max_dense_dist DOUBLE PRECISION DEFAULT 0.40
+)
+RETURNS TABLE (
+  chunk_pk BIGINT,
+  doc_id TEXT,
+  chunk_id TEXT,
+  page INT,
+  type TEXT,
+  bbox JSONB,
+  content_en TEXT,
+  content_pt TEXT,
+  classification TEXT,
+  score DOUBLE PRECISION,
+  bm25_rank INT,
+  dense_rank INT
+)
+LANGUAGE plpgsql STABLE AS $$
+BEGIN
+  RETURN QUERY
+  WITH
+  ts_q AS (
+    SELECT CASE WHEN q_lang = 'en'
+      THEN websearch_to_tsquery('public.en_unaccent'::regconfig, q_text)
+      ELSE websearch_to_tsquery('public.pt_unaccent'::regconfig, q_text)
+    END AS q
+  ),
+  bm25 AS (
+    SELECT c.chunk_pk,
+           row_number() OVER (ORDER BY
+             ts_rank_cd(
+               CASE WHEN q_lang = 'en' THEN c.ts_en ELSE c.ts_pt END,
+               (SELECT q FROM ts_q)
+             ) DESC NULLS LAST
+           )::INT AS r
+    FROM public.chunks c
+    WHERE c.is_searchable
+      AND (CASE WHEN q_lang = 'en' THEN c.ts_en ELSE c.ts_pt END) @@ (SELECT q FROM ts_q)
+      AND (q_doc_id IS NULL OR c.doc_id = q_doc_id)
+      AND (q_type IS NULL OR c.type = q_type)
+      AND (q_classification IS NULL OR c.classification = q_classification)
+      AND (NOT q_ufo_only OR c.ufo_anomaly = TRUE)
+    LIMIT k
+  ),
+  dense AS (
+    SELECT c.chunk_pk,
+           row_number() OVER (ORDER BY c.embedding <=> q_embedding)::INT AS r
+    FROM public.chunks c
+    WHERE c.is_searchable
+      AND c.embedding IS NOT NULL
+      AND (c.embedding <=> q_embedding) < max_dense_dist
+      AND (q_doc_id IS NULL OR c.doc_id = q_doc_id)
+      AND (q_type IS NULL OR c.type = q_type)
+      AND (q_classification IS NULL OR c.classification = q_classification)
+      AND (NOT q_ufo_only OR c.ufo_anomaly = TRUE)
+    ORDER BY c.embedding <=> q_embedding
+    LIMIT k
+  ),
+  fused AS (
+    SELECT COALESCE(b.chunk_pk, d.chunk_pk) AS chunk_pk,
+           ((1.0::DOUBLE PRECISION / (rrf_k + COALESCE(b.r, k + 1))::DOUBLE PRECISION) +
+            (1.0::DOUBLE PRECISION / (rrf_k + COALESCE(d.r, k + 1))::DOUBLE PRECISION)) AS score,
+           b.r AS bm25_rank,
+           d.r AS dense_rank
+    FROM bm25 b
+    FULL OUTER JOIN dense d USING (chunk_pk)
+  )
+  SELECT c.chunk_pk, c.doc_id, c.chunk_id, c.page, c.type, c.bbox,
+         c.content_en, c.content_pt, c.classification,
+         f.score, f.bm25_rank, f.dense_rank
+  FROM fused f
+  JOIN public.chunks c USING (chunk_pk)
+  ORDER BY f.score DESC
+  LIMIT k;
+END
+$$;
+
+GRANT EXECUTE ON FUNCTION public.hybrid_search_chunks TO anon, authenticated;
+
+COMMIT;
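The `fused` CTE in `hybrid_search_chunks` combines the BM25 and dense rankings with reciprocal rank fusion: each list contributes `1 / (rrf_k + rank)`, and a chunk missing from one list is scored as if it ranked `k + 1` there. A small Python sketch of that arithmetic, assuming the two inputs are already ordered best-first; `rrf_fuse` is a hypothetical name, and the tie-break on id is an addition for determinism that the SQL does not have.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def rrf_fuse(bm25: Sequence[Hashable], dense: Sequence[Hashable],
             k: int = 100, rrf_k: int = 60) -> List[Tuple[Hashable, float]]:
    """Fuse two best-first id lists the way the `fused` CTE does.

    Defaults mirror the SQL function's `k` and `rrf_k` parameters.
    """
    # Ranks are 1-based, matching row_number() in the SQL.
    bm25_rank: Dict[Hashable, int] = {c: r for r, c in enumerate(bm25[:k], start=1)}
    dense_rank: Dict[Hashable, int] = {c: r for r, c in enumerate(dense[:k], start=1)}
    missing = k + 1  # COALESCE(rank, k + 1) for ids absent from one list

    scored = []
    for cid in bm25_rank.keys() | dense_rank.keys():  # FULL OUTER JOIN
        score = (1.0 / (rrf_k + bm25_rank.get(cid, missing))
                 + 1.0 / (rrf_k + dense_rank.get(cid, missing)))
        scored.append((cid, score))

    # Highest score first; sorting ties by id is an assumption of this sketch.
    scored.sort(key=lambda item: (-item[1], str(item[0])))
    return scored[:k]
```

For example, with `bm25 = ["a", "b"]` and `dense = ["b", "c"]`, chunk `b` appears in both lists and scores `1/62 + 1/61`, which beats `a` (`1/61 + 1/161`) and `c` (`1/161 + 1/62`).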
@ -90,10 +90,12 @@ def jaccard(a: set, b: set) -> float:
|
||||||
|
|
||||||
def primary_id(s: str) -> str | None:
|
def primary_id(s: str) -> str | None:
|
||||||
n = normalize(s)
|
n = normalize(s)
|
||||||
|
# Try the generic (agency)-uap-[a-z]{1,4}\d+ form once, before the dedicated patterns. Matches
|
||||||
|
# "cia-uap-d001", "doe-uap-d002", "odni-uap-d001", "dow-uap-d017", etc.
|
||||||
|
m = re.match(r"^((?:cia|doe|dod|dow|dos|odni|nasa|fbi)-uap-[a-z]{1,4}\d+[a-z]?)", n)
|
||||||
|
if m:
|
||||||
|
return m.group(1)
|
||||||
for p in (
|
for p in (
|
||||||
r"^(dow-uap-[a-z]{1,4}\d+)",
|
|
||||||
r"^(dos-uap-d\d+)",
|
|
||||||
r"^(nasa-uap-[a-z]{1,3}\d+[a-z]?)",
|
|
||||||
r"^(fbi-photo-[a-z]\d+)",
|
r"^(fbi-photo-[a-z]\d+)",
|
||||||
):
|
):
|
||||||
m = re.match(p, n)
|
m = re.match(p, n)
|
||||||
|
|
@ -216,14 +218,33 @@ def main():
|
||||||
ap = argparse.ArgumentParser()
|
ap = argparse.ArgumentParser()
|
||||||
ap.add_argument("--dry-run", action="store_true")
|
ap.add_argument("--dry-run", action="store_true")
|
||||||
ap.add_argument("--rename-events", action="store_true", help="Rename EV-XXXX events to EV-YYYY-MM-DD")
|
ap.add_argument("--rename-events", action="store_true", help="Rename EV-XXXX events to EV-YYYY-MM-DD")
|
||||||
|
ap.add_argument("--metadata-json", action="append", default=None,
|
||||||
|
help="Path to a war.gov metadata JSON. Pass multiple times to merge releases. "
|
||||||
|
"Defaults to release-01 + release-02 if present.")
|
||||||
args = ap.parse_args()
|
args = ap.parse_args()
|
||||||
|
|
||||||
if not METADATA_JSON.exists():
|
if args.metadata_json:
|
||||||
sys.stderr.write(f"Metadata JSON not found: {METADATA_JSON}\n")
|
json_paths = [Path(p) for p in args.metadata_json]
|
||||||
sys.exit(1)
|
else:
|
||||||
data = json.loads(METADATA_JSON.read_text(encoding="utf-8"))
|
# Default: load every release-NN-basic JSON found, so 116 existing docs
|
||||||
records = data.get("documents", [])
|
# (release-01) and 6 new docs (release-02) all get enriched in one pass.
|
||||||
print(f"war.gov records: {len(records)}")
|
json_paths = sorted((UFO_ROOT / "processing" / "war-gov-metadata").glob("all-documents-release-*-basic.json"))
|
||||||
|
if not json_paths:
|
||||||
|
json_paths = [METADATA_JSON]
|
||||||
|
|
||||||
|
records: list[dict] = []
|
||||||
|
for p in json_paths:
|
||||||
|
if not p.exists():
|
||||||
|
sys.stderr.write(f"Metadata JSON not found: {p}\n"); sys.exit(1)
|
||||||
|
d = json.loads(p.read_text(encoding="utf-8"))
|
||||||
|
recs = d.get("documents", [])
|
||||||
|
extracted_at = d.get("extracted_at")
|
||||||
|
for r in recs:
|
||||||
|
r.setdefault("_extracted_at", extracted_at)
|
||||||
|
r.setdefault("_source_json", p.name)
|
||||||
|
print(f"war.gov records from {p.name}: {len(recs)}")
|
||||||
|
records.extend(recs)
|
||||||
|
print(f"war.gov records total: {len(records)}")
|
||||||
|
|
||||||
war_index = build_war_index(records)
|
war_index = build_war_index(records)
|
||||||
docs = sorted(DOCS_DIR.glob("*.md"))
|
docs = sorted(DOCS_DIR.glob("*.md"))
|
||||||
|
|
@ -268,7 +289,7 @@ def main():
|
||||||
"document_type_official": match.get("document_type"),
|
"document_type_official": match.get("document_type"),
|
||||||
"match_reason": reason,
|
"match_reason": reason,
|
||||||
"availability": "pending-upstream" if match["record_id"] in PLACEHOLDER_RECORDS else "downloaded",
|
"availability": "pending-upstream" if match["record_id"] in PLACEHOLDER_RECORDS else "downloaded",
|
||||||
"extracted_from_war_gov_at": data.get("extracted_at"),
|
"extracted_from_war_gov_at": match.get("_extracted_at"),
|
||||||
}
|
}
|
||||||
|
|
||||||
new_fm = dict(fm)
|
new_fm = dict(fm)
|
||||||
|
|
@ -352,7 +373,7 @@ def main():
|
||||||
fh.write(
|
fh.write(
|
||||||
f"\n## {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')} — ENRICH WAR.GOV (Phase 0.5)\n"
|
f"\n## {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')} — ENRICH WAR.GOV (Phase 0.5)\n"
|
||||||
f"- operator: archivist\n- script: scripts/02b-enrich-with-web-metadata.py\n"
|
f"- operator: archivist\n- script: scripts/02b-enrich-with-web-metadata.py\n"
|
||||||
f"- json_source: {METADATA_JSON.name}\n"
|
f"- json_source: {', '.join(p.name for p in json_paths)}\n"
|
||||||
f"- enriched: {enriched}\n- unchanged: {unchanged}\n- unmatched: {len(unmatched)}\n"
|
f"- enriched: {enriched}\n- unchanged: {unchanged}\n- unmatched: {len(unmatched)}\n"
|
||||||
f"- event_renames: {rename_count}\n"
|
f"- event_renames: {rename_count}\n"
|
||||||
)
|
)
|
||||||
|
|
|
||||||
|
|
@ -264,9 +264,26 @@ def main() -> int:
|
||||||
SELECT source_class, source_id, relation_type,
|
SELECT source_class, source_id, relation_type,
|
||||||
target_class, target_id, evidence_ref,
|
target_class, target_id, evidence_ref,
|
||||||
confidence, extracted_by
|
confidence, extracted_by
|
||||||
FROM _rel ON CONFLICT DO NOTHING"""
|
FROM _rel
|
||||||
|
WHERE relation_type IN ('witnessed','occurred_at','involves_uap',
|
||||||
|
'documented_in','authored','signed',
|
||||||
|
'mentioned_by','employed_by','operated_by',
|
||||||
|
'investigated','commanded','related_to',
|
||||||
|
'similar_to','precedes','follows')
|
||||||
|
ON CONFLICT DO NOTHING"""
|
||||||
)
|
)
|
||||||
print(f"Inserted (after ON CONFLICT): {cur.rowcount}")
|
print(f"Inserted (after ON CONFLICT + type filter): {cur.rowcount}")
|
||||||
|
cur.execute(
|
||||||
|
"SELECT relation_type, COUNT(*) FROM _rel WHERE relation_type NOT IN "
|
||||||
|
"('witnessed','occurred_at','involves_uap','documented_in','authored','signed',"
|
||||||
|
"'mentioned_by','employed_by','operated_by','investigated','commanded',"
|
||||||
|
"'related_to','similar_to','precedes','follows') GROUP BY relation_type ORDER BY 2 DESC"
|
||||||
|
)
|
||||||
|
drops = cur.fetchall()
|
||||||
|
if drops:
|
||||||
|
print("Dropped (invalid relation_type):")
|
||||||
|
for t, n in drops:
|
||||||
|
print(f" {n:>5} {t}")
|
||||||
cur.execute(
|
cur.execute(
|
||||||
"SELECT relation_type, COUNT(*) FROM public.relations GROUP BY relation_type ORDER BY 2 DESC"
|
"SELECT relation_type, COUNT(*) FROM public.relations GROUP BY relation_type ORDER BY 2 DESC"
|
||||||
)
|
)
|
||||||
|
|
|
||||||
|
|
@ -30,7 +30,9 @@ EMBED_URL = os.getenv("EMBED_SERVICE_URL", "http://localhost:8000")
|
||||||
|
|
||||||
|
|
||||||
def embed_batch(texts: list[str]) -> list[list[float]]:
|
def embed_batch(texts: list[str]) -> list[list[float]]:
|
||||||
resp = requests.post(f"{EMBED_URL}/embed", json={"texts": texts}, timeout=120)
|
# Cold-start of BGE-M3 takes ~8s per text on CPU; the first call can take several minutes
|
||||||
|
# for a batch. Bump timeout to 10 minutes so the first batch doesn't kill the run.
|
||||||
|
resp = requests.post(f"{EMBED_URL}/embed", json={"texts": texts}, timeout=600)
|
||||||
resp.raise_for_status()
|
resp.raise_for_status()
|
||||||
return resp.json()["embeddings"]
|
return resp.json()["embeddings"]
|
||||||
|
|
||||||
|
|
|
||||||
151
scripts/maintain/60_meili_index.py
Normal file
|
|
@ -0,0 +1,151 @@
|
||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
60_meili_index.py — Push documents + chunks into Meilisearch for autocomplete.
|
||||||
|
|
||||||
|
W1 deliverable. Meilisearch is the typo-tolerant prefix-aware search engine in
|
||||||
|
the stack; it complements Postgres BM25 + pgvector (used by the chat). The
|
||||||
|
goal here is fast `/search` autocomplete that shows matching docs and chunks
|
||||||
|
as the user types — sub-30ms.
|
||||||
|
|
||||||
|
Indexes created:
|
||||||
|
- documents id=doc_id, fields=[canonical_title, collection, doc_id]
|
||||||
|
- chunks id=chunk_pk, fields=[doc_id, chunk_id, page, content_en, content_pt]
|
||||||
|
|
||||||
|
Idempotent: re-running upserts. Pass `--reset` to rebuild from scratch.
|
||||||
|
|
||||||
|
Run from inside the disclosure-internal network OR with --meili-url override.
|
||||||
|
The default reads MEILI_MASTER_KEY + MEILISEARCH_URL from env.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python3 scripts/maintain/60_meili_index.py
|
||||||
|
python3 scripts/maintain/60_meili_index.py --reset
|
||||||
|
python3 scripts/maintain/60_meili_index.py --doc-id <id>
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
try:
|
||||||
|
import psycopg
|
||||||
|
import requests
|
||||||
|
except ImportError as e:
|
||||||
|
sys.exit(f"pip install psycopg[binary] requests # missing: {e}")
|
||||||
|
|
||||||
|
DATABASE_URL = os.getenv("DATABASE_URL") or os.getenv("SUPABASE_DB_URL")
|
||||||
|
MEILI_URL = os.getenv("MEILISEARCH_URL", "http://meilisearch:7700")
|
||||||
|
MEILI_KEY = os.getenv("MEILI_MASTER_KEY") or os.getenv("MEILISEARCH_API_KEY", "")
|
||||||
|
BATCH = int(os.getenv("MEILI_BATCH", "1000"))
|
||||||
|
|
||||||
|
|
||||||
|
def meili(method: str, path: str, body: Any = None) -> dict:
|
||||||
|
headers = {"Authorization": f"Bearer {MEILI_KEY}", "Content-Type": "application/json"}
|
||||||
|
r = requests.request(method, f"{MEILI_URL}{path}", headers=headers,
|
||||||
|
data=json.dumps(body) if body is not None else None,
|
||||||
|
timeout=120)
|
||||||
|
r.raise_for_status()
|
||||||
|
return r.json() if r.text else {}
|
||||||
|
|
||||||
|
|
||||||
|
def ensure_index(uid: str, primary_key: str, searchable: list[str], filterable: list[str]):
|
||||||
|
"""Create the index if missing, then set settings."""
|
||||||
|
try:
|
||||||
|
meili("POST", "/indexes", {"uid": uid, "primaryKey": primary_key})
|
||||||
|
print(f" created index {uid}")
|
||||||
|
except requests.HTTPError as e:
|
||||||
|
# 409 = already exists, OK; 400 is tolerated as well.
|
||||||
|
if e.response.status_code not in (400, 409):
|
||||||
|
raise
|
||||||
|
meili("PATCH", f"/indexes/{uid}/settings", {
|
||||||
|
"searchableAttributes": searchable,
|
||||||
|
"filterableAttributes": filterable,
|
||||||
|
"displayedAttributes": ["*"],
|
||||||
|
"rankingRules": ["words", "typo", "proximity", "attribute", "sort", "exactness"],
|
||||||
|
"typoTolerance": {"enabled": True, "minWordSizeForTypos": {"oneTypo": 4, "twoTypos": 8}},
|
||||||
|
})
|
||||||
|
|
||||||
|
|
||||||
|
def push(uid: str, docs: list[dict]):
|
||||||
|
if not docs: return
|
||||||
|
meili("POST", f"/indexes/{uid}/documents", docs)
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> int:
|
||||||
|
ap = argparse.ArgumentParser()
|
||||||
|
ap.add_argument("--reset", action="store_true", help="Delete and recreate indexes")
|
||||||
|
ap.add_argument("--doc-id", help="Reindex only one doc")
|
||||||
|
args = ap.parse_args()
|
||||||
|
|
||||||
|
if not DATABASE_URL: sys.exit("DATABASE_URL not set")
|
||||||
|
if not MEILI_KEY: sys.exit("MEILI_MASTER_KEY not set")
|
||||||
|
|
||||||
|
if args.reset and not args.doc_id:
|
||||||
|
print("Resetting indexes...")
|
||||||
|
for uid in ("documents", "chunks"):
|
||||||
|
try: meili("DELETE", f"/indexes/{uid}")
|
||||||
|
except requests.HTTPError: pass
|
||||||
|
|
||||||
|
ensure_index("documents", "doc_id",
|
||||||
|
searchable=["canonical_title", "collection", "doc_id"],
|
||||||
|
filterable=["collection", "classification"])
|
||||||
|
ensure_index("chunks", "chunk_pk",
|
||||||
|
searchable=["content_pt", "content_en", "doc_id", "chunk_id"],
|
||||||
|
filterable=["doc_id", "type", "classification", "ufo_anomaly", "is_searchable"])
|
||||||
|
|
||||||
|
with psycopg.connect(DATABASE_URL) as conn, conn.cursor() as cur:
|
||||||
|
# documents
|
||||||
|
where_doc = "WHERE doc_id = %s" if args.doc_id else ""
|
||||||
|
params = (args.doc_id,) if args.doc_id else ()
|
||||||
|
cur.execute(f"""
|
||||||
|
SELECT doc_id, canonical_title, collection, classification
|
||||||
|
FROM public.documents {where_doc}
|
||||||
|
""", params)
|
||||||
|
rows = cur.fetchall()
|
||||||
|
docs = [{"doc_id": r[0], "canonical_title": r[1] or r[0],
|
||||||
|
"collection": r[2] or "", "classification": r[3] or ""} for r in rows]
|
||||||
|
print(f"documents → meili: {len(docs)}")
|
||||||
|
for i in range(0, len(docs), BATCH):
|
||||||
|
push("documents", docs[i:i+BATCH])
|
||||||
|
|
||||||
|
# chunks (only searchable ones — drops scaffolding noise)
|
||||||
|
where_chunk = "WHERE c.is_searchable" + (" AND c.doc_id = %s" if args.doc_id else "")
|
||||||
|
cur.execute(f"""
|
||||||
|
SELECT c.chunk_pk, c.doc_id, c.chunk_id, c.page, c.type,
|
||||||
|
c.content_en, c.content_pt, c.classification, c.ufo_anomaly
|
||||||
|
FROM public.chunks c
|
||||||
|
{where_chunk}
|
||||||
|
""", params)
|
||||||
|
chunks: list[dict] = []
|
||||||
|
total = 0
|
||||||
|
for r in cur:
|
||||||
|
chunks.append({
|
||||||
|
"chunk_pk": r[0],
|
||||||
|
"doc_id": r[1],
|
||||||
|
"chunk_id": r[2],
|
||||||
|
"page": r[3],
|
||||||
|
"type": r[4],
|
||||||
|
"content_en": (r[5] or "")[:2000],
|
||||||
|
"content_pt": (r[6] or "")[:2000],
|
||||||
|
"classification": r[7] or "",
|
||||||
|
"ufo_anomaly": bool(r[8]),
|
||||||
|
"is_searchable": True,
|
||||||
|
})
|
||||||
|
if len(chunks) >= BATCH:
|
||||||
|
push("chunks", chunks)
|
||||||
|
total += len(chunks)
|
||||||
|
chunks = []
|
||||||
|
print(f" pushed {total} chunks...")
|
||||||
|
if chunks:
|
||||||
|
push("chunks", chunks)
|
||||||
|
total += len(chunks)
|
||||||
|
print(f"chunks → meili: {total}")
|
||||||
|
|
||||||
|
print("\n✓ done. Documents enqueued; Meilisearch indexes them asynchronously.")
|
||||||
|
print(f" Verify: curl -H 'Authorization: Bearer ...' {MEILI_URL}/indexes/chunks/stats")
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
sys.exit(main())
|
||||||
|
|
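The chunk loop in `60_meili_index.py` flushes a buffer every `BATCH` rows and then pushes the remainder. The same flush pattern can be sketched as a small generator. This is a hedged illustration only: `batched` is a hypothetical helper, not something the script defines, and no Meilisearch call is made here.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def batched(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield lists of at most `size` items, flushing the final partial batch."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: list[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:  # remainder, like the trailing `if chunks:` push in the script
        yield batch


# Illustrative rows; in the script each dict would be one chunk document.
rows = ({"chunk_pk": i} for i in range(5))
for group in batched(rows, 2):
    print(len(group))
```

Because the input is consumed lazily, only one batch is held in memory at a time, which matches the script's choice to iterate the cursor rather than call `fetchall()` for chunks.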
@ -97,7 +97,7 @@ def call_llm(prompt: str) -> str:
|
||||||
["claude", "-p", "--model", "sonnet", "--output-format", "text",
|
["claude", "-p", "--model", "sonnet", "--output-format", "text",
|
||||||
"--disallowed-tools", DISALLOWED],
|
"--disallowed-tools", DISALLOWED],
|
||||||
input=prompt.encode("utf-8"), stdout=out, stderr=subprocess.PIPE, env=env,
|
input=prompt.encode("utf-8"), stdout=out, stderr=subprocess.PIPE, env=env,
|
||||||
timeout=600,
|
timeout=1200,
|
||||||
)
|
)
|
||||||
if r.returncode != 0:
|
if r.returncode != 0:
|
||||||
sys.exit(f"claude failed rc={r.returncode}: {r.stderr.decode('utf-8','replace')[:500]}")
|
sys.exit(f"claude failed rc={r.returncode}: {r.stderr.decode('utf-8','replace')[:500]}")
|
||||||
|
|
@ -107,6 +107,62 @@ def call_llm(prompt: str) -> str:
|
||||||
except OSError: pass
|
except OSError: pass
|
||||||
|
|
||||||
|
|
||||||
|
# Above this size, the reading version won't fit one Sonnet call (32k-token
|
||||||
|
# output ceiling + timeout), so we segment by page blocks and concatenate.
|
||||||
|
SEGMENT_THRESHOLD = 90_000
|
||||||
|
SEGMENT_CHARS = 45_000
|
||||||
|
|
||||||
|
PROMPT_SEGMENT = """You are a meticulous archivist-typographer for The Disclosure Bureau. This is
|
||||||
|
PART {n} OF {m} of a large scanned UAP/UFO document — you receive the raw
|
||||||
|
machine-extracted text of THIS part only (chunk by chunk). The scan is messy:
|
||||||
|
duplicate transcriptions, OCR noise, repeated letterheads, classification
|
||||||
|
banners, page numbers, routing stamps.
|
||||||
|
|
||||||
|
Produce a clean, faithful, well-structured reading version of THIS PART in
|
||||||
|
Markdown.
|
||||||
|
|
||||||
|
RULES:
|
||||||
|
1. FAITHFUL — never invent. Keep [redacted]/[ilegível] markers.
|
||||||
|
2. DEDUPLICATE within this part — merge repeated content, keep unique details.
|
||||||
|
3. DROP page furniture (letterheads, banners, page numbers, routing stamps, OCR
|
||||||
|
garbage).
|
||||||
|
4. STRUCTURE with clear Markdown headings (##/###) and clean dialogue
|
||||||
|
(**SPEAKER:**) for transcripts. Do NOT write a document-level H1 title (the
|
||||||
|
document already has one); start at "## Part {n}" then sub-sections.
|
||||||
|
5. BILINGUAL — for THIS part output English first under "### English", then
|
||||||
|
Brazilian Portuguese under "### Português". Natural pt-br with correct accents.
|
||||||
|
6. PRESERVE every investigative detail (sightings, coords, times, witnesses,
|
||||||
|
object descriptions, quotes).
|
||||||
|
|
||||||
|
Return ONLY the Markdown for this part (no code fence, no preamble). Start with
|
||||||
|
"## Part {n}".
|
||||||
|
|
||||||
|
DOCUMENT (doc_id: {doc_id}) — PART {n} OF {m}, raw chunks follow:
|
||||||
|
|
||||||
|
{doc_text}
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
|
def segment_text(text: str) -> list[str]:
|
||||||
|
"""Split doc text into blocks at [chunk ...] markers near SEGMENT_CHARS."""
|
||||||
|
import re as _re
|
||||||
|
if len(text) <= SEGMENT_CHARS:
|
||||||
|
return [text]
|
||||||
|
starts = [m.start() for m in _re.finditer(r"^\[chunk c\d+", text, _re.MULTILINE)]
|
||||||
|
if not starts:
|
||||||
|
return [text]
|
||||||
|
segs: list[str] = []
|
||||||
|
s = 0
|
||||||
|
while s < len(text):
|
||||||
|
cap = s + SEGMENT_CHARS
|
||||||
|
if cap >= len(text):
|
||||||
|
segs.append(text[s:]); break
|
||||||
|
cands = [p for p in starts if s < p < cap]
|
||||||
|
e = cands[-1] if cands else cap
|
||||||
|
segs.append(text[s:e]); s = e
|
||||||
|
return segs
|
||||||
|
|
||||||
|
|
||||||
def main() -> int:
|
def main() -> int:
|
||||||
if len(sys.argv) < 2:
|
if len(sys.argv) < 2:
|
||||||
sys.exit("usage: 40_reading_version.py <doc-id>")
|
sys.exit("usage: 40_reading_version.py <doc-id>")
|
||||||
|
|
@ -118,9 +174,21 @@ def main() -> int:
|
||||||
print(f" {len(doc_text)} chars (~{len(doc_text)//4} tokens)")
|
print(f" {len(doc_text)} chars (~{len(doc_text)//4} tokens)")
|
||||||
|
|
||||||
print("[2/3] generating reading version (Sonnet) ...")
|
print("[2/3] generating reading version (Sonnet) ...")
|
||||||
md = call_llm(PROMPT.format(doc_id=doc_id, doc_text=doc_text)).strip()
|
if len(doc_text) > SEGMENT_THRESHOLD:
|
||||||
if md.startswith("```"):
|
segs = segment_text(doc_text)
|
||||||
md = "\n".join(l for l in md.splitlines() if not l.startswith("```")).strip()
|
print(f" large doc → {len(segs)} segments")
|
||||||
|
parts: list[str] = []
|
||||||
|
for i, seg in enumerate(segs, 1):
|
||||||
|
print(f" segment {i}/{len(segs)} ({len(seg)} chars) ...")
|
||||||
|
p = call_llm(PROMPT_SEGMENT.format(n=i, m=len(segs), doc_id=doc_id, doc_text=seg)).strip()
|
||||||
|
if p.startswith("```"):
|
||||||
|
p = "\n".join(l for l in p.splitlines() if not l.startswith("```")).strip()
|
||||||
|
parts.append(p)
|
||||||
|
md = "\n\n---\n\n".join(parts)
|
||||||
|
else:
|
||||||
|
md = call_llm(PROMPT.format(doc_id=doc_id, doc_text=doc_text)).strip()
|
||||||
|
if md.startswith("```"):
|
||||||
|
md = "\n".join(l for l in md.splitlines() if not l.startswith("```")).strip()
|
||||||
|
|
||||||
front = (
|
front = (
|
||||||
f"---\nschema_version: \"0.1.0\"\ntype: reading\ndoc_id: {doc_id}\n"
|
f"---\nschema_version: \"0.1.0\"\ntype: reading\ndoc_id: {doc_id}\n"
|
||||||
|
|
|
||||||
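To see how `segment_text` cuts at `[chunk cNNNN` markers, the sketch below repeats its logic with the segment size passed as a parameter so that a tiny input can exercise it. The parameter `segment_chars` and the sample text are assumptions made for illustration; the script itself uses the module constant `SEGMENT_CHARS` and only segments documents longer than `SEGMENT_THRESHOLD`.

```python
from __future__ import annotations

import re


def segment_text(text: str, segment_chars: int) -> list[str]:
    """Split text into blocks that end at the last chunk marker before each cap."""
    if len(text) <= segment_chars:
        return [text]
    starts = [m.start() for m in re.finditer(r"^\[chunk c\d+", text, re.MULTILINE)]
    if not starts:
        return [text]
    segs: list[str] = []
    s = 0
    while s < len(text):
        cap = s + segment_chars
        if cap >= len(text):
            segs.append(text[s:])
            break
        # Prefer the last marker strictly inside (s, cap); otherwise hard-cut at cap.
        cands = [p for p in starts if s < p < cap]
        e = cands[-1] if cands else cap
        segs.append(text[s:e])
        s = e
    return segs


# Five fake chunks of 35 characters each (175 characters in total).
sample = "".join(f"[chunk c{i:04d}]\n" + "x" * 20 + "\n" for i in range(5))
parts = segment_text(sample, 80)
print([len(p) for p in parts])  # [70, 70, 35]
```

Every segment starts on a chunk boundary and the segments concatenate back to the original text, so no content is lost or duplicated between parts.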
69
scripts/synthesize/run_reading_parallel.sh
Executable file
|
|
@ -0,0 +1,69 @@
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
# Generate the clean LLM reading version for every document, in parallel.
|
||||||
|
#
|
||||||
|
# - One doc per `claude -p` (Sonnet) via 40_reading_version.py
|
||||||
|
# - Skips docs that already have reading.md (idempotent — safe to re-run)
|
||||||
|
# - mkdir-based per-doc lock prevents two workers racing the same doc
|
||||||
|
# - WORKERS parallel workers (default 2)
|
||||||
|
#
|
||||||
|
# Run:
|
||||||
|
# ./run_reading_parallel.sh # all docs, 2 workers
|
||||||
|
# WORKERS=3 ./run_reading_parallel.sh # 3 workers
|
||||||
|
# ./run_reading_parallel.sh DOC1 DOC2 # specific docs only
|
||||||
|
set -uo pipefail
|
||||||
|
|
||||||
|
UFO="/Users/guto/ufo"
|
||||||
|
RAW="$UFO/raw"
|
||||||
|
GEN="$UFO/scripts/synthesize/40_reading_version.py"
|
||||||
|
WORKERS="${WORKERS:-2}"
|
||||||
|
|
||||||
|
if [ "$#" -gt 0 ]; then
|
||||||
|
DOCS=("$@")
|
||||||
|
else
|
||||||
|
DOCS=()
|
||||||
|
for d in "$RAW"/*--subagent; do
|
||||||
|
[ -f "$d/_index.json" ] || continue
|
||||||
|
DOCS+=("$(basename "$d" | sed 's/--subagent$//')")
|
||||||
|
done
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "=== reading-version generator ==="
|
||||||
|
echo " docs queued: ${#DOCS[@]}"
|
||||||
|
echo " workers: $WORKERS"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
process_one() {
|
||||||
|
local doc_id="$1"
|
||||||
|
local sub="$RAW/$doc_id--subagent"
|
||||||
|
local out="$sub/reading.md"
|
||||||
|
local log="$sub/_reading.log"
|
||||||
|
local lock="$sub/.reading.lock"
|
||||||
|
|
||||||
|
if [ -f "$out" ]; then
|
||||||
|
echo "[SKIP] $doc_id (already has reading.md)"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
if ! mkdir "$lock" 2>/dev/null; then
|
||||||
|
echo "[LOCK] $doc_id (another worker)"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
trap "rmdir '$lock' 2>/dev/null || true" EXIT
|
||||||
|
|
||||||
|
local t0=$(date +%s)
|
||||||
|
echo "[BEGIN] $doc_id"
|
||||||
|
if python3 "$GEN" "$doc_id" > "$log" 2>&1; then
|
||||||
|
echo "[OK] $doc_id ($(($(date +%s) - t0))s)"
|
||||||
|
else
|
||||||
|
echo "[FAIL] $doc_id ($(($(date +%s) - t0))s) — see $log"
|
||||||
|
fi
|
||||||
|
rmdir "$lock" 2>/dev/null || true
|
||||||
|
trap - EXIT
|
||||||
|
}
|
||||||
|
export -f process_one
|
||||||
|
export RAW GEN
|
||||||
|
|
||||||
|
printf '%s\n' "${DOCS[@]}" | xargs -P "$WORKERS" -I {} bash -c 'process_one "$@"' _ {}
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
echo "=== Done. reading.md count: ==="
|
||||||
|
ls "$RAW"/*--subagent/reading.md 2>/dev/null | wc -l
|
||||||
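The per-document lock in `run_reading_parallel.sh` relies on `mkdir` being atomic: exactly one worker can create the directory, and the others see it already exists and skip. The same idea is sketched below in Python rather than bash, purely as an illustration. `dir_lock` is a hypothetical helper and is not part of the repository.

```python
from __future__ import annotations

import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def dir_lock(path: str) -> Iterator[bool]:
    """Yield True if this caller created the lock directory, False if it already existed."""
    try:
        os.mkdir(path)  # atomic: fails if another worker got there first
    except FileExistsError:
        yield False  # someone else holds the lock; do not remove it
        return
    try:
        yield True
    finally:
        try:
            os.rmdir(path)  # release, like the script's `rmdir "$lock"`
        except OSError:
            pass


with tempfile.TemporaryDirectory() as tmp:
    lock = os.path.join(tmp, ".reading.lock")
    with dir_lock(lock) as first:
        with dir_lock(lock) as second:
            print(first, second)  # True False
```

As in the shell script, a worker that fails to take the lock simply moves on, and only the holder removes the directory.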
16
web/app/api/admin/throw/route.ts
Normal file
|
|
@ -0,0 +1,16 @@
|
||||||
|
/**
|
||||||
|
* /api/admin/throw: admin-only error injector. Throws on demand so we can
|
||||||
|
* verify Glitchtip is receiving events. Gated by /api/admin/* middleware (404
|
||||||
|
* for non-admins).
|
||||||
|
*
|
||||||
|
* The route lives under /api/admin/* so the W0-F1 middleware gate applies.
|
||||||
|
*/
|
||||||
|
import { withRequest } from "@/lib/logger";
|
||||||
|
|
||||||
|
export const runtime = "nodejs";
|
||||||
|
|
||||||
|
export async function GET(request: Request) {
|
||||||
|
const log = withRequest(request);
|
||||||
|
log.warn({ event: "debug_throw" }, "intentional error for Glitchtip smoke test");
|
||||||
|
throw new Error("debug_throw_smoke_test: glitchtip wiring verified at " + new Date().toISOString());
|
||||||
|
}
|
||||||
95
web/app/api/search/autocomplete/route.ts
Normal file
|
|
@ -0,0 +1,95 @@
|
||||||
|
/**
|
||||||
|
* /api/search/autocomplete — typo-tolerant prefix search via Meilisearch.
|
||||||
|
*
|
||||||
|
* Hits two indexes in parallel and returns a small merged result:
|
||||||
|
* - documents (title-level matches, used to jump to a doc)
|
||||||
|
* - chunks (passage-level matches, used for in-doc navigation)
|
||||||
|
*
|
||||||
|
* Target latency: sub-30ms inside the docker network. Falls back to empty
|
||||||
|
* results if Meilisearch is unreachable so the chat / hybrid_search aren't
|
||||||
|
* blocked. Auth: none — same as /api/search/hybrid; corpus is public.
|
||||||
|
*/
|
||||||
|
import { NextResponse } from "next/server";
|
||||||
|
import { withRequest } from "@/lib/logger";
|
||||||
|
|
||||||
|
export const runtime = "nodejs";
|
||||||
|
export const dynamic = "force-dynamic";
|
||||||
|
|
||||||
|
const MEILI_URL = process.env.MEILISEARCH_URL || "http://meilisearch:7700";
|
||||||
|
const MEILI_KEY = process.env.MEILISEARCH_API_KEY || process.env.MEILI_MASTER_KEY || "";
|
||||||
|
|
||||||
|
interface DocHit {
|
||||||
|
doc_id: string;
|
||||||
|
canonical_title: string;
|
||||||
|
collection?: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
interface ChunkHit {
|
||||||
|
chunk_pk: number;
|
||||||
|
doc_id: string;
|
||||||
|
chunk_id: string;
|
||||||
|
page: number;
|
||||||
|
type: string;
|
||||||
|
content_pt?: string;
|
||||||
|
content_en?: string;
|
||||||
|
ufo_anomaly?: boolean;
|
||||||
|
}
|
||||||
|
|
||||||
|
async function meiliSearch(index: string, q: string, limit: number): Promise<unknown[]> {
|
||||||
|
const r = await fetch(`${MEILI_URL}/indexes/${index}/search`, {
|
||||||
|
method: "POST",
|
||||||
|
headers: {
|
||||||
|
"Authorization": `Bearer ${MEILI_KEY}`,
|
||||||
|
"Content-Type": "application/json",
|
||||||
|
},
|
||||||
|
body: JSON.stringify({ q, limit, attributesToHighlight: ["canonical_title", "content_pt", "content_en"] }),
|
||||||
|
signal: AbortSignal.timeout(2000),
|
||||||
|
});
|
||||||
|
if (!r.ok) throw new Error(`meili ${r.status}`);
|
||||||
|
const data = await r.json();
|
||||||
|
return data.hits ?? [];
|
||||||
|
}
|
||||||
|
|
||||||
|
export async function GET(request: Request) {
|
||||||
|
const log = withRequest(request);
|
||||||
|
const url = new URL(request.url);
|
||||||
|
const q = (url.searchParams.get("q") || "").trim();
|
||||||
|
const limit = Math.min(Math.max(Math.trunc(Number(url.searchParams.get("limit"))) || 8, 1), 20);
|
||||||
|
|
||||||
|
if (q.length < 2) {
|
||||||
|
return NextResponse.json({ q, documents: [], chunks: [] });
|
||||||
|
}
|
||||||
|
if (!MEILI_KEY) {
|
||||||
|
log.warn({ event: "autocomplete_unconfigured" }, "MEILI key not set");
|
||||||
|
return NextResponse.json({ q, documents: [], chunks: [], reason: "meili_not_configured" });
|
||||||
|
}
|
||||||
|
|
||||||
|
const t0 = Date.now();
|
||||||
|
const [docs, chunks] = await Promise.all([
|
||||||
|
meiliSearch("documents", q, Math.min(limit, 5)).catch(() => []),
|
||||||
|
meiliSearch("chunks", q, limit).catch(() => []),
|
||||||
|
]) as [DocHit[], ChunkHit[]];
|
||||||
|
|
||||||
|
const dt = Date.now() - t0;
|
||||||
|
log.info({ event: "autocomplete", q, docs: docs.length, chunks: chunks.length, dt_ms: dt }, "autocomplete done");
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
q,
|
||||||
|
duration_ms: dt,
|
||||||
|
documents: docs.map((d) => ({
|
||||||
|
doc_id: d.doc_id,
|
||||||
|
title: d.canonical_title,
|
||||||
|
collection: d.collection,
|
||||||
|
href: `/d/${d.doc_id}`,
|
||||||
|
})),
|
||||||
|
chunks: chunks.map((c) => ({
|
||||||
|
chunk_id: c.chunk_id,
|
||||||
|
doc_id: c.doc_id,
|
||||||
|
page: c.page,
|
||||||
|
type: c.type,
|
||||||
|
excerpt: (c.content_pt || c.content_en || "").slice(0, 180),
|
||||||
|
ufo_anomaly: !!c.ufo_anomaly,
|
||||||
|
href: `/d/${c.doc_id}/p${String(c.page).padStart(3, "0")}#${c.chunk_id}`,
|
||||||
|
})),
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
@ -18,6 +18,7 @@ import { createClient, isSupabaseConfigured } from "@/lib/supabase/server";
|
||||||
import { readDocument, readPage } from "@/lib/wiki";
|
import { readDocument, readPage } from "@/lib/wiki";
|
||||||
import { streamChat } from "@/lib/chat";
|
import { streamChat } from "@/lib/chat";
|
||||||
import { getLocale } from "@/components/locale-toggle";
|
import { getLocale } from "@/components/locale-toggle";
|
||||||
|
import { withRequest } from "@/lib/logger";
|
||||||
|
|
||||||
async function gatherContext(docId: string | null, pageId: string | null): Promise<string> {
|
async function gatherContext(docId: string | null, pageId: string | null): Promise<string> {
|
||||||
const parts: string[] = [];
|
const parts: string[] = [];
|
||||||
|
|
@ -129,8 +130,9 @@ Quotes verbatim do documento mantêm idioma original (inglês), narração ao re
|
||||||
export async function POST(request: Request, ctx: { params: Promise<{ id: string }> }) {
|
export async function POST(request: Request, ctx: { params: Promise<{ id: string }> }) {
|
||||||
const { id: sessionId } = await ctx.params;
|
const { id: sessionId } = await ctx.params;
|
||||||
const t0 = Date.now();
|
const t0 = Date.now();
|
||||||
|
const baseLog = withRequest(request).child({ session_id: sessionId.slice(0, 8) });
|
||||||
const log = (stage: string, extra: Record<string, unknown> = {}) =>
|
const log = (stage: string, extra: Record<string, unknown> = {}) =>
|
||||||
console.log(`[chat ${sessionId.slice(0, 8)}] ${stage}`, { dt: Date.now() - t0, ...extra });
|
baseLog.info({ stage, dt_ms: Date.now() - t0, ...extra }, stage);
|
||||||
log("POST received");
|
log("POST received");
|
||||||
|
|
||||||
if (!isSupabaseConfigured()) {
|
if (!isSupabaseConfigured()) {
|
||||||
|
|
|
||||||
|
|
@ -13,6 +13,7 @@ import { getLocale } from "@/components/locale-toggle";
|
||||||
import { AuthBar } from "@/components/auth-bar";
|
import { AuthBar } from "@/components/auth-bar";
|
||||||
import { ChatBubble } from "@/components/chat-bubble";
|
import { ChatBubble } from "@/components/chat-bubble";
|
||||||
import { DocReadingView } from "@/components/doc-reading-view";
|
import { DocReadingView } from "@/components/doc-reading-view";
|
||||||
|
import { AnomalyHighlights, type AnomalyFlag } from "@/components/anomaly-highlights";
|
||||||
import { MarkdownBody } from "@/components/markdown-body";
|
import { MarkdownBody } from "@/components/markdown-body";
|
||||||
|
|
||||||
export const dynamic = "force-dynamic";
|
export const dynamic = "force-dynamic";
|
||||||
|
|
@ -70,17 +71,31 @@ export default async function DocPage({
|
||||||
.sort((a, b) => b[1] - a[1])
|
.sort((a, b) => b[1] - a[1])
|
||||||
.slice(0, 6);
|
.slice(0, 6);
|
||||||
|
|
||||||
// Count UFO/cryptid anomalies across chunks
|
// Count UFO/cryptid anomalies across chunks + collect flags for the highlight panel
|
||||||
let ufoCount = 0;
|
|
||||||
let cryptidCount = 0;
|
|
||||||
let imageCount = 0;
|
let imageCount = 0;
|
||||||
for (const [, chunks] of byPage) {
|
const ufoFlags: AnomalyFlag[] = [];
|
||||||
|
const cryptidFlags: AnomalyFlag[] = [];
|
||||||
|
for (const [page, chunks] of byPage) {
|
||||||
for (const c of chunks) {
|
for (const c of chunks) {
|
||||||
if (c.fm.ufo_anomaly_detected) ufoCount++;
|
if (c.fm.ufo_anomaly_detected)
|
||||||
if (c.fm.cryptid_anomaly_detected) cryptidCount++;
|
ufoFlags.push({
|
||||||
|
chunk_id: c.fm.chunk_id,
|
||||||
|
page,
|
||||||
|
type: c.fm.ufo_anomaly_type ?? null,
|
||||||
|
rationale: c.fm.ufo_anomaly_rationale ?? null,
|
||||||
|
});
|
||||||
|
if (c.fm.cryptid_anomaly_detected)
|
||||||
|
cryptidFlags.push({
|
||||||
|
chunk_id: c.fm.chunk_id,
|
||||||
|
page,
|
||||||
|
type: c.fm.cryptid_anomaly_type ?? null,
|
||||||
|
rationale: c.fm.cryptid_anomaly_rationale ?? null,
|
||||||
|
});
|
||||||
if (c.fm.type === "image") imageCount++;
|
if (c.fm.type === "image") imageCount++;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
const ufoCount = ufoFlags.length;
|
||||||
|
const cryptidCount = cryptidFlags.length;
|
||||||
|
|
||||||
const classification = (doc?.fm.highest_classification as string) ?? "—";
|
const classification = (doc?.fm.highest_classification as string) ?? "—";
|
||||||
const collection = (doc?.fm.collection as string) ?? "—";
|
const collection = (doc?.fm.collection as string) ?? "—";
|
||||||
|
|
@ -136,6 +151,8 @@ export default async function DocPage({
|
||||||
)}
|
)}
|
||||||
</header>
|
</header>
|
||||||
|
|
||||||
|
<AnomalyHighlights docId={docId} ufo={ufoFlags} cryptid={cryptidFlags} />
|
||||||
|
|
||||||
<DocReadingView docId={docId} reading={reading} chunksByPage={ordered} />
|
<DocReadingView docId={docId} reading={reading} chunksByPage={ordered} />
|
||||||
|
|
||||||
<ChatBubble context={{ doc_id: docId }} />
|
<ChatBubble context={{ doc_id: docId }} />
|
||||||
|
|
|
||||||
|
|
@ -11,6 +11,7 @@ import { ChatBubble } from "@/components/chat-bubble";
|
||||||
import { AuthBar } from "@/components/auth-bar";
|
import { AuthBar } from "@/components/auth-bar";
|
||||||
import { EntityGraphMini } from "@/components/entity-graph-mini";
|
import { EntityGraphMini } from "@/components/entity-graph-mini";
|
||||||
import { EntityRelations } from "@/components/entity-relations";
|
import { EntityRelations } from "@/components/entity-relations";
|
||||||
|
import { EntityAttributes } from "@/components/entity-attributes";
|
||||||
import {
|
import {
|
||||||
getEntityCore,
|
getEntityCore,
|
||||||
getEntityMentionsByDoc,
|
getEntityMentionsByDoc,
|
||||||
|
|
@ -111,6 +112,21 @@ export default async function EntityPage({
|
||||||
const classColor = CLASS_COLOR[folder as EntityClass];
|
const classColor = CLASS_COLOR[folder as EntityClass];
|
||||||
const classBg = CLASS_BG[folder as EntityClass];
|
const classBg = CLASS_BG[folder as EntityClass];
|
||||||
|
|
||||||
|
// The generated entity bodies hold only "# Title" + empty "## Description"
|
||||||
|
// headings — strip headings and see if any real prose remains.
|
||||||
|
const bodyProse = (wiki?.body ?? "").replace(/^#.*$/gm, "").trim();
|
||||||
|
const hasNarrativeProse = bodyProse.length > 20;
|
||||||
|
// Does the frontmatter carry any displayable description/attribute?
|
||||||
|
const fm = (wiki?.fm ?? {}) as Record<string, unknown>;
|
||||||
|
const arr = (v: unknown) => Array.isArray(v) && v.length > 0;
|
||||||
|
const fmHasContent = Boolean(
|
||||||
|
fm.narrative_summary_pt_br || fm.narrative_summary_en || fm.maneuver_notes ||
|
||||||
|
fm.shape || fm.color || fm.medium || fm.event_class || fm.person_class ||
|
||||||
|
fm.org_class || fm.geo_class || fm.date_start ||
|
||||||
|
arr(fm.countries) || arr(fm.roles) || arr(fm.affiliations) ||
|
||||||
|
arr(fm.primary_location_names) || arr(fm.regions_or_states),
|
||||||
|
);
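Review note: the prose check in this hunk can be read as a small pure function. The sketch below is a minimal restatement under that assumption; the function form of `hasNarrativeProse`, the `minChars` parameter, and both sample bodies are illustrative and not taken from the repository.

```typescript
// Hypothetical helper mirroring the inline check in EntityPage: drop every
// markdown heading line, then see whether any meaningful prose is left.
function hasNarrativeProse(body: string | null | undefined, minChars = 20): boolean {
  // With the "m" flag, ^ and $ match per line, so each "# ..." line is removed whole.
  const prose = (body ?? "").replace(/^#.*$/gm, "").trim();
  return prose.length > minChars;
}

// Generated stub: a title plus an empty "## Description" section.
const stubBody = "# Sample Entity\n\n## Description\n";
// Body that carries real prose under the headings.
const richBody =
  "# Sample Entity\n\n## Description\nA witness described a silent disc hovering over the runway.";

console.log(hasNarrativeProse(stubBody));
console.log(hasNarrativeProse(richBody));
```

One caveat of this approach: any prose line that itself starts with `#` is stripped along with the real headings, so the check can undercount in that edge case.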
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<main className="min-h-screen p-6 md:p-10 max-w-6xl mx-auto">
|
<main className="min-h-screen p-6 md:p-10 max-w-6xl mx-auto">
|
||||||
<div className="flex items-start justify-between gap-4 mb-6">
|
<div className="flex items-start justify-between gap-4 mb-6">
|
||||||
|
|
@ -230,6 +246,9 @@ export default async function EntityPage({
|
||||||
<div className="grid grid-cols-1 lg:grid-cols-[1fr_320px] gap-8">
|
<div className="grid grid-cols-1 lg:grid-cols-[1fr_320px] gap-8">
|
||||||
{/* MAIN — narrative + chunks live */}
|
{/* MAIN — narrative + chunks live */}
|
||||||
<article>
|
<article>
|
||||||
|
{/* Structured description + attributes from frontmatter */}
|
||||||
|
{wiki?.fm && <EntityAttributes fm={wiki.fm as Record<string, unknown>} />}
|
||||||
|
|
||||||
{/* Live chunk previews — most impactful section */}
|
{/* Live chunk previews — most impactful section */}
|
||||||
{sampleChunks.length > 0 && (
|
{sampleChunks.length > 0 && (
|
||||||
<section className="mb-10">
|
<section className="mb-10">
|
||||||
|
|
@ -283,17 +302,18 @@ export default async function EntityPage({
|
||||||
</section>
|
</section>
|
||||||
)}
|
)}
|
||||||
|
|
||||||
{/* Narrative body (Haiku stub OK quando rico) */}
|
{/* Narrative body — only when it carries real prose, not just the
|
||||||
{wiki?.body && wiki.body.trim().length > 30 && (
|
empty "## Description" headings the generator leaves behind. */}
|
||||||
|
{hasNarrativeProse && (
|
||||||
<section className="pt-6 border-t border-[rgba(0,255,156,0.12)]">
|
<section className="pt-6 border-t border-[rgba(0,255,156,0.12)]">
|
||||||
<h2 className="font-mono text-sm text-[#7fdbff] uppercase tracking-widest mb-3 border-l-2 border-[#7fdbff] pl-3">
|
<h2 className="font-mono text-sm text-[#7fdbff] uppercase tracking-widest mb-3 border-l-2 border-[#7fdbff] pl-3">
|
||||||
Narrativa
|
Narrativa
|
||||||
</h2>
|
</h2>
|
||||||
<MarkdownBody>{wiki.body}</MarkdownBody>
|
<MarkdownBody>{wiki!.body}</MarkdownBody>
|
||||||
</section>
|
</section>
|
||||||
)}
|
)}
|
||||||
|
|
||||||
{sampleChunks.length === 0 && (!wiki?.body || wiki.body.trim().length === 0) && (
|
{sampleChunks.length === 0 && !hasNarrativeProse && !fmHasContent && (
|
||||||
<div className="text-[#5a6678] italic text-sm p-6 border border-[rgba(255,165,0,0.30)] bg-[rgba(255,165,0,0.05)] rounded">
|
<div className="text-[#5a6678] italic text-sm p-6 border border-[rgba(255,165,0,0.30)] bg-[rgba(255,165,0,0.05)] rounded">
|
||||||
Entidade ainda sem chunks indexados na DB. Aguarde o indexer terminar.
|
Entidade ainda sem chunks indexados na DB. Aguarde o indexer terminar.
|
||||||
</div>
|
</div>
|
||||||
|
|
|
||||||
135
web/components/anomaly-highlights.tsx
Normal file
|
|
@ -0,0 +1,135 @@
|
||||||
|
/**
 * AnomalyHighlights: prominent UAP / cryptid anomaly panel for the document
 * page. The clean reading version is the default body, but the investigative
 * "destaque" (highlight) of every flagged passage must stay visible regardless
 * of which view (reading or scan) is active. Flags are grouped by anomaly type
 * and each group links to the per-page scan where the anomaly was detected.
 */
import Link from "next/link";

export interface AnomalyFlag {
  chunk_id: string;
  page: number;
  type: string | null;
  rationale: string | null;
}

function clean(v: string | null): string | null {
  const s = typeof v === "string" ? v.trim() : "";
  return s && s.toLowerCase() !== "null" ? s : null;
}

interface Group {
  type: string | null;
  rationale: string | null; // shown only when the group has a single flag
  count: number;
  pages: number[];
}

// Group by anomaly type so the panel stays a scannable "destaque" overview.
// Per-passage rationale is kept only when a type has exactly one flag; the full
// per-chunk rationale remains available in the "trechos · scan original" view.
function groupFlags(flags: AnomalyFlag[]): Group[] {
  const m = new Map<string, Group>();
  for (const f of flags) {
    const type = clean(f.type);
    const rationale = clean(f.rationale);
    const key = type ?? "anomalia";
    const g = m.get(key) ?? { type, rationale, count: 0, pages: [] };
    g.count += 1;
    g.rationale = g.count === 1 ? rationale : null;
    if (!g.pages.includes(f.page)) g.pages.push(f.page);
    m.set(key, g);
  }
  return Array.from(m.values())
    .map((g) => ({ ...g, pages: g.pages.sort((a, b) => a - b) }))
    .sort((a, b) => b.count - a.count || a.pages[0] - b.pages[0]);
}

function pad(p: number): string {
  return String(p).padStart(3, "0");
}
|
||||||
|
|
||||||
|
function PageChips({ docId, pages }: { docId: string; pages: number[] }) {
|
||||||
|
const shown = pages.slice(0, 14);
|
||||||
|
const extra = pages.length - shown.length;
|
||||||
|
return (
|
||||||
|
<span className="inline-flex flex-wrap gap-1 align-middle">
|
||||||
|
{shown.map((p) => (
|
||||||
|
<Link
|
||||||
|
key={p}
|
||||||
|
href={`/d/${docId}/p${pad(p)}`}
|
||||||
|
className="font-mono text-[10px] px-1.5 py-0.5 border border-[rgba(127,219,255,0.30)] text-[#7fdbff] rounded hover:border-[#00ff9c] hover:text-[#00ff9c]"
|
||||||
|
>
|
||||||
|
p{p}
|
||||||
|
</Link>
|
||||||
|
))}
|
||||||
|
{extra > 0 && <span className="font-mono text-[10px] text-[#5a6678]">+{extra}</span>}
|
||||||
|
</span>
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
export function AnomalyHighlights({
|
||||||
|
docId,
|
||||||
|
ufo,
|
||||||
|
cryptid,
|
||||||
|
}: {
|
||||||
|
docId: string;
|
||||||
|
ufo: AnomalyFlag[];
|
||||||
|
cryptid: AnomalyFlag[];
|
||||||
|
}) {
|
||||||
|
if (ufo.length === 0 && cryptid.length === 0) return null;
|
||||||
|
const ufoGroups = groupFlags(ufo);
|
||||||
|
const cryptidGroups = groupFlags(cryptid);
|
||||||
|
|
||||||
|
return (
|
||||||
|
<section className="mb-6 border border-[rgba(0,255,156,0.40)] bg-[rgba(0,255,156,0.05)] rounded p-4">
|
||||||
|
{ufo.length > 0 && (
|
||||||
|
<>
|
||||||
|
<h2 className="font-mono text-sm text-[#00ff9c] mb-3 flex items-center gap-2">
|
||||||
|
🛸 Anomalias UAP destacadas
|
||||||
|
<span className="text-[#5a6678]">
|
||||||
|
({ufo.length} {ufo.length === 1 ? "trecho" : "trechos"} · {ufoGroups.length}{" "}
|
||||||
|
{ufoGroups.length === 1 ? "tipo" : "tipos"})
|
||||||
|
</span>
|
||||||
|
</h2>
|
||||||
|
<ul className="space-y-2.5">
|
||||||
|
{ufoGroups.map((g, i) => (
|
||||||
|
<li key={i} className="text-sm text-[#c8d4e6] leading-relaxed">
|
||||||
|
<span className="font-mono text-[#00ff9c]">🛸 {g.type ?? "anomalia"}</span>
|
||||||
|
{g.count > 1 && (
|
||||||
|
<span className="font-mono text-[10px] text-[#5a6678]"> ×{g.count}</span>
|
||||||
|
)}
|
||||||
|
{g.rationale && <span className="text-[#c8d4e6]"> — {g.rationale}</span>}{" "}
|
||||||
|
<PageChips docId={docId} pages={g.pages} />
|
||||||
|
</li>
|
||||||
|
))}
|
||||||
|
</ul>
|
||||||
|
</>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{cryptid.length > 0 && (
|
||||||
|
<div className={ufo.length > 0 ? "mt-4 pt-4 border-t border-[rgba(155,93,229,0.25)]" : ""}>
|
||||||
|
<h2 className="font-mono text-sm text-[#9b5de5] mb-3 flex items-center gap-2">
|
||||||
|
👁 Anomalias cryptid destacadas
|
||||||
|
<span className="text-[#5a6678]">
|
||||||
|
({cryptid.length} {cryptid.length === 1 ? "trecho" : "trechos"})
|
||||||
|
</span>
|
||||||
|
</h2>
|
||||||
|
<ul className="space-y-2.5">
|
||||||
|
{cryptidGroups.map((g, i) => (
|
||||||
|
<li key={i} className="text-sm text-[#c8d4e6] leading-relaxed">
|
||||||
|
<span className="font-mono text-[#9b5de5]">👁 {g.type ?? "anomalia"}</span>
|
||||||
|
{g.count > 1 && (
|
||||||
|
<span className="font-mono text-[10px] text-[#5a6678]"> ×{g.count}</span>
|
||||||
|
)}
|
||||||
|
{g.rationale && <span className="text-[#c8d4e6]"> — {g.rationale}</span>}{" "}
|
||||||
|
<PageChips docId={docId} pages={g.pages} />
|
||||||
|
</li>
|
||||||
|
))}
|
||||||
|
</ul>
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
</section>
|
||||||
|
);
|
||||||
|
}
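Review note: to make the grouping behaviour in `groupFlags` easier to check, here is a standalone sketch of the same idea run on sample data. It assumes the `clean()` normalisation has already happened; `Flag`, `FlagGroup`, `groupByType`, and the sample flags are hypothetical names and values introduced only for illustration.

```typescript
interface Flag { page: number; type: string | null; rationale: string | null }
interface FlagGroup { type: string | null; rationale: string | null; count: number; pages: number[] }

// Group flags by type; untyped flags share the "anomalia" bucket.
function groupByType(flags: Flag[]): FlagGroup[] {
  const groups = new Map<string, FlagGroup>();
  for (const f of flags) {
    const key = f.type ?? "anomalia";
    const g: FlagGroup = groups.get(key) ?? { type: f.type, rationale: null, count: 0, pages: [] };
    g.count += 1;
    // Keep the rationale only while the group holds a single flag.
    g.rationale = g.count === 1 ? f.rationale : null;
    // Record each page once, however many flags it carries.
    if (g.pages.indexOf(f.page) === -1) g.pages.push(f.page);
    groups.set(key, g);
  }
  // Largest groups first; ties broken by the earliest page.
  return Array.from(groups.values())
    .map((g) => ({ ...g, pages: g.pages.slice().sort((a, b) => a - b) }))
    .sort((a, b) => b.count - a.count || a.pages[0] - b.pages[0]);
}

const sampleFlags: Flag[] = [
  { page: 7, type: "sensor", rationale: "radar and visual disagree" },
  { page: 3, type: "kinematics", rationale: "abrupt course change" },
  { page: 2, type: "sensor", rationale: "track lost mid-sweep" },
  { page: 7, type: "sensor", rationale: "second operator confirms" },
];

const grouped = groupByType(sampleFlags);
console.log(grouped);
```

The three "sensor" flags collapse into one entry with pages 2 and 7 and no rationale, while the single "kinematics" flag keeps its rationale, which is the trade-off the component comment describes.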
|
||||||
164
web/components/entity-attributes.tsx
Normal file
|
|
@ -0,0 +1,164 @@
|
||||||
|
/**
 * EntityAttributes: renders an entity's descriptive content and structured
 * attributes straight from its wiki frontmatter. The generated entity files
 * carry their real content in YAML fields (narrative_summary_*, maneuver_notes,
 * shape, color, roles, countries, …) while the markdown body holds only empty
 * "## Description" headings, so the page must surface the frontmatter.
 */

type FM = Record<string, unknown>;

const ATTR_LABELS: Record<string, string> = {
  event_class: "Tipo de evento",
  date_start: "Início",
  date_end: "Fim",
  date_confidence: "Confiança da data",
  primary_location_names: "Locais",
  primary_location_geo_classes: "Classe do local",
  geo_class: "Classe geográfica",
  countries: "Países",
  regions_or_states: "Regiões / estados",
  org_class: "Tipo de organização",
  person_class: "Tipo de pessoa",
  affiliations: "Afiliações",
  roles: "Funções / papéis",
  shape: "Forma",
  color: "Cor",
  medium: "Meio",
  size_estimate_m: "Tamanho estimado (m)",
  altitude_ft: "Altitude (ft)",
  speed_kts: "Velocidade (kt)",
};

// Order in which attributes are shown (only those present render).
const ATTR_ORDER = [
  "event_class",
  "person_class",
  "org_class",
  "shape",
  "color",
  "medium",
  "size_estimate_m",
  "altitude_ft",
  "speed_kts",
  "date_start",
  "date_end",
  "date_confidence",
  "geo_class",
  "countries",
  "regions_or_states",
  "primary_location_names",
  "primary_location_geo_classes",
  "affiliations",
  "roles",
];

function clean(v: unknown): string | null {
  const s = typeof v === "string" ? v.trim() : "";
  return s && s.toLowerCase() !== "null" ? s : null;
}

// Placeholder values that carry no real attribute information: hidden from the
// ATRIBUTOS grid (but never from the free-text description).
const EMPTY_TOKENS = new Set([
  "null",
  "none",
  "n/a",
  "na",
  "unknown",
  "unidentified",
  "undetermined",
  "unspecified",
  "not specified",
  "not stated",
  "not applicable",
]);

function isEmptyToken(s: string): boolean {
  return EMPTY_TOKENS.has(s.trim().toLowerCase());
}

function fmtValue(v: unknown): string | null {
  if (v == null) return null;
  if (Array.isArray(v)) {
    const items = v
      .map((x) => (typeof x === "string" ? x.trim() : String(x)))
      .filter((x) => x && !x.startsWith("[[") && !isEmptyToken(x));
    return items.length ? items.join(", ") : null;
  }
  if (typeof v === "number") return String(v);
  const s = clean(v);
  return s && !isEmptyToken(s) ? s : null;
}
|
||||||
|
|
||||||
|
export function EntityAttributes({ fm }: { fm: FM }) {
|
||||||
|
const ptText = clean(fm.narrative_summary_pt_br) ?? clean(fm.description_pt_br);
|
||||||
|
const enText = clean(fm.narrative_summary_en) ?? clean(fm.description_en);
|
||||||
|
const notes = clean(fm.maneuver_notes); // source-language only (uap_object)
|
||||||
|
|
||||||
|
const attrs = ATTR_ORDER.map((k) => [k, fmtValue(fm[k])] as const).filter(
|
||||||
|
([, v]) => v !== null,
|
||||||
|
);
|
||||||
|
|
||||||
|
const hasDescription = Boolean(ptText || enText || notes);
|
||||||
|
if (!hasDescription && attrs.length === 0) return null;
|
||||||
|
|
||||||
|
return (
|
||||||
|
<section className="mb-10">
|
||||||
|
{hasDescription && (
|
||||||
|
<>
|
||||||
|
{ptText && (
|
||||||
|
<div className="mb-4">
|
||||||
|
<h2 className="font-mono text-sm text-[#7fdbff] uppercase tracking-widest mb-2 border-l-2 border-[#7fdbff] pl-3">
|
||||||
|
Descrição (PT-BR)
|
||||||
|
</h2>
|
||||||
|
<p className="text-[15px] leading-relaxed text-[#c8d4e6]">{ptText}</p>
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
{enText && (
|
||||||
|
<div className="mb-4">
|
||||||
|
<h2 className="font-mono text-sm text-[#7fdbff] uppercase tracking-widest mb-2 border-l-2 border-[#7fdbff] pl-3">
|
||||||
|
Description (EN)
|
||||||
|
</h2>
|
||||||
|
<p className="text-[15px] leading-relaxed text-[#c8d4e6]">{enText}</p>
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
{notes && !ptText && !enText && (
|
||||||
|
<div className="mb-4">
|
||||||
|
<h2 className="font-mono text-sm text-[#7fdbff] uppercase tracking-widest mb-2 border-l-2 border-[#7fdbff] pl-3">
|
||||||
|
Descrição · Description
|
||||||
|
</h2>
|
||||||
|
<p className="text-[15px] leading-relaxed text-[#c8d4e6]">{notes}</p>
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
{notes && (ptText || enText) && (
|
||||||
|
<div className="mb-4">
|
||||||
|
<h3 className="font-mono text-[11px] text-[#8896aa] uppercase tracking-widest mb-1">
|
||||||
|
Notas de manobra / aparência
|
||||||
|
</h3>
|
||||||
|
<p className="text-sm leading-relaxed text-[#8896aa]">{notes}</p>
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
</>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{attrs.length > 0 && (
|
||||||
|
<div className="mt-2">
|
||||||
|
<h3 className="font-mono text-[11px] text-[#8896aa] uppercase tracking-widest mb-2">
|
||||||
|
Atributos
|
||||||
|
</h3>
|
||||||
|
<dl className="grid grid-cols-1 sm:grid-cols-2 gap-x-6 gap-y-2">
|
||||||
|
{attrs.map(([k, v]) => (
|
||||||
|
<div key={k} className="flex items-baseline gap-2 border-b border-[rgba(127,219,255,0.10)] pb-1.5">
|
||||||
|
<dt className="font-mono text-[11px] text-[#5a6678] uppercase tracking-wide shrink-0 min-w-[42%]">
|
||||||
|
{ATTR_LABELS[k] ?? k}
|
||||||
|
</dt>
|
||||||
|
<dd className="text-sm text-[#c8d4e6]">{v}</dd>
|
||||||
|
</div>
|
||||||
|
))}
|
||||||
|
</dl>
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
</section>
|
||||||
|
);
|
||||||
|
}
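Review note: the placeholder filtering in `fmtValue` decides which rows reach the attributes grid, so a small standalone sketch may help when reviewing it. This assumes a shortened placeholder list; `PLACEHOLDERS`, `formatAttribute`, and the sample frontmatter are hypothetical and exist only for this example.

```typescript
// Shortened stand-in for the component's EMPTY_TOKENS set.
const PLACEHOLDERS = new Set(["null", "none", "n/a", "unknown", "not specified"]);

function isPlaceholder(s: string): boolean {
  return PLACEHOLDERS.has(s.trim().toLowerCase());
}

// Turn one frontmatter value into display text, or null when nothing useful remains.
function formatAttribute(v: unknown): string | null {
  if (v == null) return null;
  if (Array.isArray(v)) {
    const items = v
      .map((x) => (typeof x === "string" ? x.trim() : String(x)))
      // Drop blanks, entries starting with "[[", and placeholder tokens.
      .filter((x) => x !== "" && x.indexOf("[[") !== 0 && !isPlaceholder(x));
    return items.length > 0 ? items.join(", ") : null;
  }
  if (typeof v === "number") return String(v);
  if (typeof v !== "string") return null;
  const s = v.trim();
  return s !== "" && !isPlaceholder(s) ? s : null;
}

// Sample frontmatter (invented values).
const sampleFm: Record<string, unknown> = {
  shape: "disc",
  color: "Unknown",
  countries: ["Brazil", "[[country/us]]", "n/a"],
  altitude_ft: 12000,
  roles: [],
};

const rendered = ["shape", "color", "countries", "altitude_ft", "roles"]
  .map((k) => [k, formatAttribute(sampleFm[k])] as [string, string | null])
  .filter((pair) => pair[1] !== null);

console.log(rendered);
```

Only `shape`, `countries`, and `altitude_ft` survive here: `color` is a placeholder, and `roles` is empty, so neither produces a row.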
|
||||||
137
web/components/search-autocomplete.tsx
Normal file
|
|
@ -0,0 +1,137 @@
|
"use client";

/**
 * SearchAutocomplete: type-as-you-go dropdown on the /search input.
 *
 * Hits /api/search/autocomplete (Meilisearch) with debounced fetch and renders
 * a two-section dropdown: matching documents (jump targets) and matching
 * chunks (in-doc passages with excerpt). Sub-30ms target. Keyboard navigation
 * (Up/Down + Enter, Esc to close) is not handled in this component.
 */
import { useEffect, useRef, useState } from "react";
import Link from "next/link";

interface DocSuggestion {
  doc_id: string;
  title: string;
  collection?: string;
  href: string;
}
interface ChunkSuggestion {
  chunk_id: string;
  doc_id: string;
  page: number;
  type: string;
  excerpt: string;
  ufo_anomaly: boolean;
  href: string;
}

interface ApiResponse {
  q: string;
  duration_ms?: number;
  documents: DocSuggestion[];
  chunks: ChunkSuggestion[];
}

export function SearchAutocomplete({ query, onPick }: { query: string; onPick?: () => void }) {
  const [data, setData] = useState<ApiResponse | null>(null);
  const [loading, setLoading] = useState(false);
  const [open, setOpen] = useState(false);
  const timer = useRef<ReturnType<typeof setTimeout> | null>(null);
  const abort = useRef<AbortController | null>(null);

  useEffect(() => {
    const q = query.trim();
    if (q.length < 2) {
      // Cancel any in-flight request so a late response cannot reopen the dropdown.
      abort.current?.abort();
      setData(null); setOpen(false); return;
    }
    if (timer.current) clearTimeout(timer.current);
    timer.current = setTimeout(async () => {
      abort.current?.abort();
      abort.current = new AbortController();
      setLoading(true);
      try {
        const r = await fetch(`/api/search/autocomplete?q=${encodeURIComponent(q)}`, {
          signal: abort.current.signal,
        });
        if (!r.ok) throw new Error(`HTTP ${r.status}`);
        const j = (await r.json()) as ApiResponse;
        setData(j);
        setOpen(j.documents.length + j.chunks.length > 0);
      } catch (e) {
        if ((e as Error).name === "AbortError") return;
        setData(null); setOpen(false);
      } finally {
        setLoading(false);
      }
    }, 150);
    return () => { if (timer.current) clearTimeout(timer.current); };
  }, [query]);
|
||||||
|
|
||||||
|
if (!open || !data) return null;
|
||||||
|
|
||||||
|
return (
|
||||||
|
<div className="absolute z-30 left-0 right-0 mt-1 max-h-[60vh] overflow-y-auto bg-[#0a121e] border border-[#00ff9c] rounded shadow-lg">
|
||||||
|
<div className="flex items-center justify-between px-3 py-1.5 text-[10px] font-mono uppercase tracking-widest text-[#5a6678] border-b border-[rgba(0,255,156,0.20)]">
|
||||||
|
<span>
|
||||||
|
⚡ autocomplete · {data.documents.length} docs · {data.chunks.length} trechos
|
||||||
|
</span>
|
||||||
|
<span>{loading ? "…" : `${data.duration_ms ?? "?"}ms`}</span>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{data.documents.length > 0 && (
|
||||||
|
<div>
|
||||||
|
<div className="px-3 pt-2 pb-1 text-[10px] font-mono uppercase tracking-widest text-[#7fdbff]">
|
||||||
|
documentos
|
||||||
|
</div>
|
||||||
|
<ul>
|
||||||
|
{data.documents.map((d) => (
|
||||||
|
<li key={d.doc_id}>
|
||||||
|
<Link
|
||||||
|
href={d.href}
|
||||||
|
onClick={onPick}
|
||||||
|
className="block px-3 py-2 hover:bg-[rgba(0,255,156,0.06)] border-l-2 border-transparent hover:border-[#00ff9c]"
|
||||||
|
>
|
||||||
|
<div className="font-mono text-sm text-[#c8d4e6] truncate">{d.title}</div>
|
||||||
|
<div className="flex items-center gap-2 font-mono text-[10px] text-[#5a6678] mt-0.5">
|
||||||
|
<span>{d.doc_id}</span>
|
||||||
|
{d.collection && <span>· {d.collection}</span>}
|
||||||
|
</div>
|
||||||
|
</Link>
|
||||||
|
</li>
|
||||||
|
))}
|
||||||
|
</ul>
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{data.chunks.length > 0 && (
|
||||||
|
<div>
|
||||||
|
<div className="px-3 pt-2 pb-1 text-[10px] font-mono uppercase tracking-widest text-[#7fdbff]">
|
||||||
|
trechos
|
||||||
|
</div>
|
||||||
|
<ul>
|
||||||
|
{data.chunks.map((c) => (
|
||||||
|
<li key={`${c.doc_id}-${c.chunk_id}`}>
|
||||||
|
<Link
|
||||||
|
href={c.href}
|
||||||
|
onClick={onPick}
|
||||||
|
className="block px-3 py-2 hover:bg-[rgba(0,255,156,0.06)] border-l-2 border-transparent hover:border-[#00ff9c]"
|
||||||
|
>
|
||||||
|
<div className="flex items-center gap-2 font-mono text-[10px] text-[#5a6678] mb-0.5">
|
||||||
|
<span className="text-[#00ff9c]">{c.chunk_id}</span>
|
||||||
|
<span>p{c.page}</span>
|
||||||
|
<span>{c.type}</span>
|
||||||
|
{c.ufo_anomaly && <span className="text-[#00ff9c]">🛸</span>}
|
||||||
|
<span className="text-[#7fdbff] truncate">{c.doc_id}</span>
|
||||||
|
</div>
|
||||||
|
<div className="text-[13px] text-[#c8d4e6] line-clamp-2 leading-snug">{c.excerpt}</div>
|
||||||
|
</Link>
|
||||||
|
</li>
|
||||||
|
))}
|
||||||
|
</ul>
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
}
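Review note: the 150 ms debounce in this component is hard to exercise without real timers. One option, sketched below under the assumption that the scheduling function can be injected, is a small debounce helper that takes its scheduler as a parameter. `debounce`, `Schedule`, `timerSchedule`, and `fakeScheduler` are hypothetical names for this example, not part of the component.

```typescript
type Cancel = () => void;
type Schedule = (fn: () => void, ms: number) => Cancel;

// Real scheduler backed by setTimeout / clearTimeout.
const timerSchedule: Schedule = (fn, ms) => {
  const id = setTimeout(fn, ms);
  return () => clearTimeout(id);
};

// Debounce: each new call cancels the pending one, so only the last argument runs.
function debounce<T>(fn: (arg: T) => void, ms: number, schedule: Schedule = timerSchedule) {
  let cancel: Cancel | null = null;
  return (arg: T): void => {
    if (cancel) cancel();
    cancel = schedule(() => {
      cancel = null;
      fn(arg);
    }, ms);
  };
}

// Fake scheduler for a deterministic demo: tasks run only when flush() is called.
function fakeScheduler() {
  let pending: Array<{ fn: () => void; live: boolean }> = [];
  const schedule: Schedule = (fn) => {
    const task = { fn, live: true };
    pending.push(task);
    return () => { task.live = false; };
  };
  const flush = (): void => {
    const due = pending;
    pending = [];
    due.forEach((t) => { if (t.live) t.fn(); });
  };
  return { schedule, flush };
}

const fake = fakeScheduler();
const firedQueries: string[] = [];
const debouncedSearch = debounce((q: string) => { firedQueries.push(q); }, 150, fake.schedule);

// Three keystrokes in quick succession, then the timer window elapses.
debouncedSearch("r");
debouncedSearch("ro");
debouncedSearch("ros");
fake.flush();

console.log(firedQueries);
```

Only the last keystroke reaches the callback, which is the property the component relies on before it issues the fetch.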
|
||||||
|
|
@ -9,6 +9,7 @@ import Image from "next/image";
|
||||||
import Link from "next/link";
|
import Link from "next/link";
|
||||||
import { useEffect, useState } from "react";
|
import { useEffect, useState } from "react";
|
||||||
import { useRouter, useSearchParams } from "next/navigation";
|
import { useRouter, useSearchParams } from "next/navigation";
|
||||||
|
import { SearchAutocomplete } from "./search-autocomplete";
|
||||||
|
|
||||||
interface Hit {
|
interface Hit {
|
||||||
chunk_id: string;
|
chunk_id: string;
|
||||||
|
|
@ -94,7 +95,7 @@ export function SearchPanel({
|
||||||
onSubmit={submit}
|
onSubmit={submit}
|
||||||
className="space-y-3 mb-8 p-4 border border-[rgba(0,255,156,0.15)] bg-[#0a121e] rounded"
|
className="space-y-3 mb-8 p-4 border border-[rgba(0,255,156,0.15)] bg-[#0a121e] rounded"
|
||||||
>
|
>
|
||||||
<div>
|
<div className="relative">
|
||||||
<label className="font-mono text-[10px] uppercase tracking-widest text-[#5a6678] block mb-1">
|
<label className="font-mono text-[10px] uppercase tracking-widest text-[#5a6678] block mb-1">
|
||||||
query
|
query
|
||||||
</label>
|
</label>
|
||||||
|
|
@ -105,6 +106,7 @@ export function SearchPanel({
|
||||||
className="w-full bg-transparent border border-[rgba(0,255,156,0.20)] focus:border-[#00ff9c] rounded px-3 py-2 font-mono text-sm text-[#c8d4e6] outline-none"
|
className="w-full bg-transparent border border-[rgba(0,255,156,0.20)] focus:border-[#00ff9c] rounded px-3 py-2 font-mono text-sm text-[#c8d4e6] outline-none"
|
||||||
autoFocus
|
autoFocus
|
||||||
/>
|
/>
|
||||||
|
<SearchAutocomplete query={q} onPick={() => setQ("")} />
|
||||||
</div>
|
</div>
|
||||||
<div className="flex flex-wrap items-end gap-3">
|
<div className="flex flex-wrap items-end gap-3">
|
||||||
<div>
|
<div>
|
||||||
|
|
|
||||||
33
web/instrumentation.ts
Normal file
|
|
@ -0,0 +1,33 @@
|
||||||
|
/**
 * Next.js instrumentation hook: loads Sentry (Glitchtip) init on server/edge.
 *
 * https://nextjs.org/docs/app/building-your-application/optimizing/instrumentation
 */
export async function register() {
  if (process.env.NEXT_RUNTIME === "nodejs") {
    await import("./sentry.server.config");
  }
  if (process.env.NEXT_RUNTIME === "edge") {
    // Edge runtime gets a slimmer init via the same DSN; the SDK auto-detects.
    await import("./sentry.server.config");
  }
}

// Forward unhandled errors from Server Components / Route Handlers to Sentry.
// Runs on the Node.js runtime only. The parameter types are derived from
// captureRequestError so they track any signature change in @sentry/nextjs;
// observability code must not block real errors.
export const onRequestError = async (
  err: unknown,
  request: Parameters<typeof import("@sentry/nextjs").captureRequestError>[1],
  context: Parameters<typeof import("@sentry/nextjs").captureRequestError>[2],
) => {
  if (process.env.NEXT_RUNTIME !== "nodejs") return;
  try {
    const { captureRequestError } = await import("@sentry/nextjs");
    await captureRequestError(err, request, context);
  } catch {
    /* never let observability swallow the original error */
  }
};
|
||||||
|
|
@ -12,7 +12,11 @@ import { spawn } from "node:child_process";
|
||||||
import type { ChatProvider, ChatRequest, ChatResponse } from "./types";
|
import type { ChatProvider, ChatRequest, ChatResponse } from "./types";
|
||||||
|
|
||||||
const MODEL = process.env.CLAUDE_CODE_MODEL || "haiku";
|
const MODEL = process.env.CLAUDE_CODE_MODEL || "haiku";
|
||||||
const TIMEOUT_MS = 90_000;
|
// W1-TD#30: subprocess timeout is now configurable. Default 90s matches the
|
||||||
|
// previous hard-coded value. Lower it (e.g. 60s) when the provider should bail
|
||||||
|
// out of slow generations sooner; raise it (e.g. 180s) when running heavier
|
||||||
|
// models like opus on long contexts.
|
||||||
|
const TIMEOUT_MS = Number(process.env.CLAUDE_CODE_TIMEOUT_MS || 90_000);
|
||||||
|
|
||||||
function buildPrompt(req: ChatRequest): string {
|
function buildPrompt(req: ChatRequest): string {
|
||||||
// Single-shot prompt: collapse history into a structured transcript.
|
// Single-shot prompt: collapse history into a structured transcript.
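Review note: the hunk above reads the timeout with `Number(process.env.CLAUDE_CODE_TIMEOUT_MS || 90_000)`, which evaluates to `NaN` when the variable is set to a non-numeric string. A minimal sketch of a stricter parse follows; `parseTimeoutMs` is a hypothetical helper, and the raw value is passed in as a parameter so the example does not depend on the process environment.

```typescript
// Parse a millisecond timeout from an environment-style string.
// Falls back when the value is missing, blank, non-numeric, or not positive.
function parseTimeoutMs(raw: string | undefined, fallbackMs: number): number {
  if (raw === undefined || raw.trim() === "") return fallbackMs;
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? Math.floor(n) : fallbackMs;
}

const DEFAULT_TIMEOUT_MS = 90000;

console.log(parseTimeoutMs(undefined, DEFAULT_TIMEOUT_MS)); // variable unset
console.log(parseTimeoutMs("60000", DEFAULT_TIMEOUT_MS));   // valid override
console.log(parseTimeoutMs("abc", DEFAULT_TIMEOUT_MS));     // garbage input
console.log(parseTimeoutMs("-5", DEFAULT_TIMEOUT_MS));      // negative input
```

Falling back silently is one design choice; logging a warning on an invalid value would be a reasonable alternative if misconfiguration should be visible.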
|
||||||
|
|
|
||||||
|
|
@ -23,6 +23,105 @@ const PRIMARY = process.env.OPENROUTER_MODEL || "deepseek/deepseek-v4-flash:free
|
||||||
const FALLBACK = process.env.OPENROUTER_FALLBACK_MODEL || "nvidia/nemotron-3-super-120b-a12b:free";
|
const FALLBACK = process.env.OPENROUTER_FALLBACK_MODEL || "nvidia/nemotron-3-super-120b-a12b:free";
|
||||||
const ENDPOINT = "https://openrouter.ai/api/v1/chat/completions";
|
const ENDPOINT = "https://openrouter.ai/api/v1/chat/completions";
|
||||||
|
|
||||||
|
// W1-TD#23: retry + circuit breaker for OpenRouter free-tier flakiness.
// Transient errors (the statuses in RETRYABLE_STATUSES, plus network errors)
// are retried up to RETRY_MAX times with exponential backoff. Once PRIMARY
// accumulates CB_THRESHOLD failures within CB_WINDOW_MS, an in-memory circuit
// breaker promotes FALLBACK as the active model for CB_COOLDOWN_MS, protecting
// the chat from a single bad model.
const RETRY_MAX = Number(process.env.OPENROUTER_RETRY_MAX || 2);
const RETRY_BASE_MS = Number(process.env.OPENROUTER_RETRY_BASE_MS || 400);
const CB_WINDOW_MS = Number(process.env.OPENROUTER_CB_WINDOW_MS || 60_000);
const CB_THRESHOLD = Number(process.env.OPENROUTER_CB_THRESHOLD || 3);
const CB_COOLDOWN_MS = Number(process.env.OPENROUTER_CB_COOLDOWN_MS || 120_000);

const RETRYABLE_STATUSES = new Set([408, 425, 429, 500, 502, 503, 504]);

interface ModelBreaker { failures: number[]; openedAt: number | null }
const breakers = new Map<string, ModelBreaker>();

function breakerFor(model: string): ModelBreaker {
  let b = breakers.get(model);
  if (!b) { b = { failures: [], openedAt: null }; breakers.set(model, b); }
  return b;
}

function isCircuitOpen(model: string): boolean {
  const b = breakerFor(model);
  if (!b.openedAt) return false;
  if (Date.now() - b.openedAt > CB_COOLDOWN_MS) {
    // Half-open: clear and let the next call probe the upstream.
    b.openedAt = null; b.failures = [];
    return false;
  }
  return true;
}

function recordFailure(model: string): void {
  const b = breakerFor(model);
  const now = Date.now();
  b.failures = b.failures.filter((t) => now - t < CB_WINDOW_MS);
  b.failures.push(now);
  if (b.failures.length >= CB_THRESHOLD) b.openedAt = now;
}

function recordSuccess(model: string): void {
  const b = breakerFor(model);
  b.failures = []; b.openedAt = null;
}

/** Pick the active model honoring an open circuit on PRIMARY. */
function pickModel(preferred: string): string {
  if (preferred === PRIMARY && isCircuitOpen(PRIMARY)) return FALLBACK;
  return preferred;
}
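Review note: the breaker above reads `Date.now()` directly, which makes the window and cooldown hard to verify. The sketch below shows the same sliding-window idea with an injectable clock, assuming a single breaker per model; `createBreaker`, the `Breaker` interface, and the fake clock are hypothetical and only illustrate the behaviour.

```typescript
interface Breaker {
  isOpen(): boolean;
  recordFailure(): void;
  recordSuccess(): void;
}

// Opens after `threshold` failures inside `windowMs`; closes again once
// `cooldownMs` has passed since it opened. `now` is injectable for testing.
function createBreaker(
  threshold: number,
  windowMs: number,
  cooldownMs: number,
  now: () => number = Date.now,
): Breaker {
  let failures: number[] = [];
  let openedAt: number | null = null;
  return {
    isOpen() {
      if (openedAt === null) return false;
      if (now() - openedAt > cooldownMs) {
        // Cooldown elapsed: reset so the next call can probe the upstream.
        openedAt = null;
        failures = [];
        return false;
      }
      return true;
    },
    recordFailure() {
      const t = now();
      // Forget failures that fell out of the sliding window.
      failures = failures.filter((f) => t - f < windowMs);
      failures.push(t);
      if (failures.length >= threshold) openedAt = t;
    },
    recordSuccess() {
      failures = [];
      openedAt = null;
    },
  };
}

// Deterministic demo with a fake clock (milliseconds).
let clock = 0;
const breaker = createBreaker(3, 60000, 120000, () => clock);

breaker.recordFailure();
clock = 10000;
breaker.recordFailure();
const openAfterTwo = breaker.isOpen();

clock = 20000;
breaker.recordFailure(); // third failure inside the 60 s window
const openAfterThree = breaker.isOpen();

clock = 140001; // just past the 120 s cooldown
const openAfterCooldown = breaker.isOpen();

console.log(openAfterTwo, openAfterThree, openAfterCooldown);
```

In a caller, `breaker.isOpen()` would play the role that `isCircuitOpen(PRIMARY)` plays in `pickModel`.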
|
||||||
|
|
||||||
|
/** Fetch wrapper with retry + breaker accounting. */
async function fetchOpenRouter(
  body: Record<string, unknown>,
  preferredModel: string,
): Promise<{ res: Response; model: string }> {
  type TaggedError = Error & { isRateLimit?: boolean; noRetry?: boolean };
  const model = pickModel(preferredModel);
  body.model = model;

  let lastErr: unknown;
  for (let attempt = 0; attempt <= RETRY_MAX; attempt++) {
    try {
      const res = await fetch(ENDPOINT, {
        method: "POST",
        headers: headers(),
        body: JSON.stringify(body),
      });
      if (res.ok) {
        recordSuccess(model);
        return { res, model };
      }
      if (!RETRYABLE_STATUSES.has(res.status)) {
        // Non-retryable status: record one failure and fail fast. The noRetry
        // tag makes the catch below rethrow instead of retrying.
        const txt = await res.text();
        const err = new Error(`openrouter HTTP ${res.status}: ${txt.slice(0, 300)}`) as TaggedError;
        err.noRetry = true;
        if (res.status === 402) err.isRateLimit = true;
        recordFailure(model);
        throw err;
      }
      // Retryable: keep the error (429 stays tagged as a rate limit for callers).
      const retryErr = new Error(
        `openrouter HTTP ${res.status} (attempt ${attempt + 1}/${RETRY_MAX + 1})`,
      ) as TaggedError;
      if (res.status === 429) retryErr.isRateLimit = true;
      lastErr = retryErr;
      if (attempt >= RETRY_MAX) break;
      // Wait with exponential backoff, honoring Retry-After if present.
      const ra = Number(res.headers.get("retry-after"));
      const waitMs = Number.isFinite(ra) && ra > 0
        ? ra * 1000
        : RETRY_BASE_MS * Math.pow(2, attempt);
      await new Promise((r) => setTimeout(r, waitMs));
    } catch (e) {
      if ((e as TaggedError).noRetry) throw e;
      // Network/abort: also retry up to RETRY_MAX.
      lastErr = e;
      if (attempt >= RETRY_MAX) break;
      await new Promise((r) => setTimeout(r, RETRY_BASE_MS * Math.pow(2, attempt)));
    }
  }
  recordFailure(model);
  throw lastErr instanceof Error ? lastErr : new Error(String(lastErr));
}
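Review note: the wait computation inside `fetchOpenRouter` can be isolated as a pure function, which makes the backoff schedule easy to inspect. This is a minimal sketch; `retryDelayMs` is a hypothetical name, and like the hunk it only understands the delay-seconds form of `Retry-After` (the header may also carry an HTTP date, which falls through to the exponential branch here).

```typescript
// Delay before the next attempt: Retry-After (in seconds) wins when it parses
// to a positive number; otherwise use exponential backoff from baseMs.
function retryDelayMs(attempt: number, baseMs: number, retryAfter: string | null): number {
  const seconds = Number(retryAfter);
  if (retryAfter !== null && Number.isFinite(seconds) && seconds > 0) {
    return seconds * 1000;
  }
  return baseMs * Math.pow(2, attempt);
}

// Backoff schedule for three attempts with the 400 ms default base.
const schedule = [0, 1, 2].map((attempt) => retryDelayMs(attempt, 400, null));
console.log(schedule);

// A numeric Retry-After overrides the computed backoff.
console.log(retryDelayMs(0, 400, "5"));

// An HTTP-date value is not parsed here, so backoff applies.
console.log(retryDelayMs(1, 400, "Wed, 21 Oct 2015 07:28:00 GMT"));
```

Neither this sketch nor the hunk caps the delay, so a large `Retry-After` value would hold the request for that long; whether to clamp it is a separate design decision.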
|
||||||
|
|
||||||
type OAMsg =
|
type OAMsg =
|
||||||
| { role: "system" | "user"; content: string }
|
| { role: "system" | "user"; content: string }
|
||||||
| { role: "assistant"; content?: string | null; tool_calls?: OAToolCall[] }
|
| { role: "assistant"; content?: string | null; tool_calls?: OAToolCall[] }
|
||||||
|
|
@ -74,35 +173,26 @@ export interface SendOnceReq {
|
||||||
}
|
}
|
||||||
|
|
||||||
/** Non-streaming single shot — used by claude-code fallback path and tests. */
|
/** Non-streaming single shot — used by claude-code fallback path and tests. */
|
||||||
export async function sendOnce(req: SendOnceReq, model = PRIMARY): Promise<{
|
export async function sendOnce(req: SendOnceReq, preferredModel = PRIMARY): Promise<{
|
||||||
content: string;
|
content: string;
|
||||||
model: string;
|
model: string;
|
||||||
tokensIn?: number;
|
tokensIn?: number;
|
||||||
tokensOut?: number;
|
tokensOut?: number;
|
||||||
}> {
|
}> {
|
||||||
const body = {
|
const body: Record<string, unknown> = {
|
||||||
model,
|
|
||||||
messages: [
|
messages: [
|
||||||
{ role: "system", content: req.system },
|
{ role: "system", content: req.system },
|
||||||
...req.messages.slice(-20),
|
...req.messages.slice(-20),
|
||||||
],
|
],
|
||||||
max_tokens: req.maxTokens ?? 1024,
|
max_tokens: req.maxTokens ?? 1024,
|
||||||
};
|
};
|
||||||
const res = await fetch(ENDPOINT, {
|
const { res, model } = await fetchOpenRouter(body, preferredModel);
|
||||||
method: "POST",
|
|
||||||
headers: headers(),
|
|
||||||
body: JSON.stringify(body),
|
|
||||||
});
|
|
||||||
if (!res.ok) {
|
|
||||||
const txt = await res.text();
|
|
||||||
const err = new Error(`openrouter HTTP ${res.status}: ${txt.slice(0, 300)}`);
|
|
||||||
if (res.status === 429 || res.status === 402) {
|
|
||||||
(err as Error & { isRateLimit?: boolean }).isRateLimit = true;
|
|
||||||
}
|
|
||||||
throw err;
|
|
||||||
}
|
|
||||||
const data = await res.json();
|
const data = await res.json();
|
||||||
if (data.error) throw new Error(`openrouter error: ${data.error.message}`);
|
if (data.error) {
|
||||||
|
recordFailure(model);
|
||||||
|
throw new Error(`openrouter error: ${data.error.message}`);
|
||||||
|
}
|
||||||
|
recordSuccess(model);
|
||||||
return {
|
return {
|
||||||
content: data.choices?.[0]?.message?.content ?? "",
|
content: data.choices?.[0]?.message?.content ?? "",
|
||||||
model: data.model ?? model,
|
model: data.model ?? model,
|
||||||
|
|
@ -336,12 +426,11 @@ export async function streamWithTools(
|
||||||
|
|
||||||
async function openrouterStreamCall(
|
async function openrouterStreamCall(
|
||||||
messages: OAMsg[],
|
messages: OAMsg[],
|
||||||
model: string,
|
preferredModel: string,
|
||||||
opts: { withTools?: boolean } = {},
|
opts: { withTools?: boolean } = {},
|
||||||
): Promise<Response> {
|
): Promise<Response> {
|
||||||
const withTools = opts.withTools !== false;
|
const withTools = opts.withTools !== false;
|
||||||
const body: Record<string, unknown> = {
|
const body: Record<string, unknown> = {
|
||||||
model,
|
|
||||||
messages,
|
messages,
|
||||||
stream: true,
|
stream: true,
|
||||||
max_tokens: 1024,
|
max_tokens: 1024,
|
||||||
|
|
@ -350,19 +439,7 @@ async function openrouterStreamCall(
|
||||||
body.tools = TOOL_DEFINITIONS;
|
body.tools = TOOL_DEFINITIONS;
|
||||||
body.tool_choice = "auto";
|
body.tool_choice = "auto";
|
||||||
}
|
}
|
||||||
const res = await fetch(ENDPOINT, {
|
const { res } = await fetchOpenRouter(body, preferredModel);
|
||||||
method: "POST",
|
|
||||||
headers: headers(),
|
|
||||||
body: JSON.stringify(body),
|
|
||||||
});
|
|
||||||
if (!res.ok) {
|
|
||||||
const txt = await res.text();
|
|
||||||
const err = new Error(`openrouter HTTP ${res.status}: ${txt.slice(0, 300)}`);
|
|
||||||
if (res.status === 429 || res.status === 402) {
|
|
||||||
(err as Error & { isRateLimit?: boolean }).isRateLimit = true;
|
|
||||||
}
|
|
||||||
throw err;
|
|
||||||
}
|
|
||||||
return res;
|
return res;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
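The wait computed in the retryable branch of `fetchOpenRouter` above can be isolated as a pure function, which makes the precedence between `Retry-After` and exponential backoff easy to check. This is a minimal sketch, not the commit's code: the helper name `retryDelayMs` and the 500 ms default base are assumptions (the diff does not show the value of `RETRY_BASE_MS`).

```typescript
// Hypothetical helper mirroring the wait logic in the diff.
// `baseMs` defaults to 500 purely for illustration.
function retryDelayMs(
  retryAfter: string | null,
  attempt: number,
  baseMs = 500,
): number {
  // Number(null) is 0 and Number("not a number") is NaN, so both a missing
  // header and a non-numeric one fall through to exponential backoff.
  const seconds = Number(retryAfter);
  if (Number.isFinite(seconds) && seconds > 0) {
    return seconds * 1000; // the server-provided delay wins
  }
  return baseMs * Math.pow(2, attempt); // baseMs, 2*baseMs, 4*baseMs, ...
}

// Usage: attempt counts from 0, as in the diff's retry loop.
console.log(retryDelayMs("2", 0)); // header present: 2000 ms
console.log(retryDelayMs(null, 3)); // no header: 500 * 2^3 = 4000 ms
```

`Retry-After` may also carry an HTTP-date instead of a number of seconds. With this logic such a value parses to `NaN` and the delay falls back to the exponential schedule rather than honoring the date.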
||||||
77
web/lib/logger.ts
Normal file
|
|
@ -0,0 +1,77 @@
|
||||||
|
/**
|
||||||
|
* Structured logger — pino with JSON output in production, pretty in dev.
|
||||||
|
*
|
||||||
|
* Use as:
|
||||||
|
* import { log, withRequest } from "@/lib/logger";
|
||||||
|
* log.info({ doc_id, page }, "rendering page");
|
||||||
|
* log.error({ err }, "embed-service down");
|
||||||
|
*
|
||||||
|
* For request-scoped logging:
|
||||||
|
* const reqLog = withRequest(request);
|
||||||
|
* reqLog.info({ duration_ms: dt }, "hybrid_search done");
|
||||||
|
*
|
||||||
|
* Edge runtime falls back to a console adapter (pino requires node).
|
||||||
|
*/
|
||||||
|
import pino from "pino";
|
||||||
|
|
||||||
|
// Edge runtime doesn't support pino's worker thread; detect and fall back.
|
||||||
|
const isEdge = typeof process === "undefined" || process.env.NEXT_RUNTIME === "edge";
|
||||||
|
|
||||||
|
function build(): pino.Logger {
|
||||||
|
if (isEdge) {
|
||||||
|
// Minimal adapter so middleware can call log.* without crashing.
|
||||||
|
const noop = () => undefined;
|
||||||
|
return {
|
||||||
|
info: (o: unknown, m?: string) => console.log(JSON.stringify({ level: "info", msg: m, ...(typeof o === "object" ? o : { v: o }) })),
|
||||||
|
warn: (o: unknown, m?: string) => console.warn(JSON.stringify({ level: "warn", msg: m, ...(typeof o === "object" ? o : { v: o }) })),
|
||||||
|
error: (o: unknown, m?: string) => console.error(JSON.stringify({ level: "error", msg: m, ...(typeof o === "object" ? o : { v: o }) })),
|
||||||
|
debug: noop,
|
||||||
|
trace: noop,
|
||||||
|
fatal: (o: unknown, m?: string) => console.error(JSON.stringify({ level: "fatal", msg: m, ...(typeof o === "object" ? o : { v: o }) })),
|
||||||
|
child: () => build(),
|
||||||
|
} as unknown as pino.Logger;
|
||||||
|
}
|
||||||
|
return pino({
|
||||||
|
level: process.env.LOG_LEVEL || "info",
|
||||||
|
base: {
|
||||||
|
app: "disclosure-web",
|
||||||
|
env: process.env.NODE_ENV || "development",
|
||||||
|
},
|
||||||
|
timestamp: pino.stdTimeFunctions.isoTime,
|
||||||
|
// Production: NDJSON (one JSON per line). Dev: pretty-printed.
|
||||||
|
transport: process.env.NODE_ENV === "production" ? undefined : {
|
||||||
|
target: "pino-pretty",
|
||||||
|
options: { colorize: true, translateTime: "SYS:HH:MM:ss.l" },
|
||||||
|
},
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
export const log: pino.Logger = build();
|
||||||
|
|
||||||
|
/** Create a child logger bound to a request's correlation id. */
|
||||||
|
export function withRequest(req: Request | { headers: Headers }): pino.Logger {
|
||||||
|
const id = req.headers.get("x-correlation-id") ||
|
||||||
|
req.headers.get("x-request-id") ||
|
||||||
|
cryptoRandomId();
|
||||||
|
return log.child({ correlation_id: id });
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Get-or-mint a correlation id for a request. */
|
||||||
|
export function correlationId(req: Request | { headers: Headers }): string {
|
||||||
|
return req.headers.get("x-correlation-id") ||
|
||||||
|
req.headers.get("x-request-id") ||
|
||||||
|
cryptoRandomId();
|
||||||
|
}
|
||||||
|
|
||||||
|
function cryptoRandomId(): string {
|
||||||
|
// 16 hex chars — short enough for logs, enough entropy for non-security uses.
|
||||||
|
// Both edge runtime and Node 19+ expose globalThis.crypto; older Node falls
|
||||||
|
// back to Math.random (acceptable: this is correlation, not security).
|
||||||
|
const g = globalThis as { crypto?: { getRandomValues?: (a: Uint8Array) => void } };
|
||||||
|
if (g.crypto?.getRandomValues) {
|
||||||
|
const buf = new Uint8Array(8);
|
||||||
|
g.crypto.getRandomValues(buf);
|
||||||
|
return Array.from(buf, (b) => b.toString(16).padStart(2, "0")).join("");
|
||||||
|
}
|
||||||
|
return Math.random().toString(36).slice(2, 18);
|
||||||
|
}
|
||||||
|
|
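In the edge adapter above, `child` returns a fresh adapter, so fields passed to `log.child({ correlation_id })` would not appear on lines logged from the edge runtime. One possible way to keep them is to thread the bindings through the adapter. This is a sketch under assumptions, not the commit's code: `makeConsoleLogger`, `MiniLogger`, and the injectable `sink` parameter are hypothetical names introduced for illustration.

```typescript
type Fields = Record<string, unknown>;

// Hypothetical minimal logger surface; a subset of what pino exposes.
interface MiniLogger {
  info(o: unknown, m?: string): void;
  warn(o: unknown, m?: string): void;
  error(o: unknown, m?: string): void;
  child(extra: Fields): MiniLogger;
}

// Console-backed logger that merges child bindings into every line.
// `sink` is injectable so the output can be captured in tests.
function makeConsoleLogger(
  bindings: Fields = {},
  sink: (line: string) => void = (line) => console.log(line),
): MiniLogger {
  const emit = (level: string) => (o: unknown, m?: string): void => {
    // Non-object payloads are wrapped under `v`, as in the diff's adapter.
    const fields: Fields =
      typeof o === "object" && o !== null ? (o as Fields) : { v: o };
    sink(JSON.stringify({ level, msg: m, ...bindings, ...fields }));
  };
  return {
    info: emit("info"),
    warn: emit("warn"),
    error: emit("error"),
    // Child loggers inherit the parent's bindings and add their own.
    child: (extra) => makeConsoleLogger({ ...bindings, ...extra }, sink),
  };
}

// Usage: the bound correlation_id appears on every line from the child.
const reqLog = makeConsoleLogger().child({ correlation_id: "abc123" });
reqLog.info({ path: "/api/search" }, "hit");
```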
@ -6,12 +6,17 @@
|
||||||
*/
|
*/
|
||||||
import { NextResponse, type NextRequest } from "next/server";
|
import { NextResponse, type NextRequest } from "next/server";
|
||||||
import { createServerClient, type CookieOptions } from "@supabase/ssr";
|
import { createServerClient, type CookieOptions } from "@supabase/ssr";
|
||||||
|
import { log, correlationId } from "@/lib/logger";
|
||||||
|
|
||||||
export async function middleware(request: NextRequest) {
|
export async function middleware(request: NextRequest) {
|
||||||
|
const t0 = Date.now();
|
||||||
const url = process.env.NEXT_PUBLIC_SUPABASE_URL;
|
const url = process.env.NEXT_PUBLIC_SUPABASE_URL;
|
||||||
const key = process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY;
|
const key = process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY;
|
||||||
|
const reqId = correlationId(request);
|
||||||
|
|
||||||
let response = NextResponse.next({ request });
|
let response = NextResponse.next({ request });
|
||||||
|
// Stamp every response so downstream handlers and the client see the same id.
|
||||||
|
response.headers.set("x-correlation-id", reqId);
|
||||||
|
|
||||||
if (!url || !key) {
|
if (!url || !key) {
|
||||||
// Supabase not configured — skip auth refresh entirely
|
// Supabase not configured — skip auth refresh entirely
|
||||||
|
|
@ -34,10 +39,11 @@ export async function middleware(request: NextRequest) {
|
||||||
// Trigger refresh (silently if token still valid)
|
// Trigger refresh (silently if token still valid)
|
||||||
const { data: { user } } = await supabase.auth.getUser();
|
const { data: { user } } = await supabase.auth.getUser();
|
||||||
|
|
||||||
// Gate /admin/* by role. Non-admin (including anonymous) gets the public
|
// Gate /admin/* AND /api/admin/* by role. Non-admin (including anonymous)
|
||||||
// 404, not a redirect — we don't want to leak the existence of the route.
|
// gets a public 404, not a redirect — we don't want to leak the existence
|
||||||
|
// of the route. (Audit W0-F1, closed 2026-05-23.)
|
||||||
const pathname = request.nextUrl.pathname;
|
const pathname = request.nextUrl.pathname;
|
||||||
if (pathname.startsWith("/admin")) {
|
if (pathname.startsWith("/admin") || pathname.startsWith("/api/admin")) {
|
||||||
if (!user) {
|
if (!user) {
|
||||||
return new NextResponse("Not Found", { status: 404 });
|
return new NextResponse("Not Found", { status: 404 });
|
||||||
}
|
}
|
||||||
|
|
@ -51,6 +57,22 @@ export async function middleware(request: NextRequest) {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Log API requests with correlation id + timing. Skip noisy paths (assets,
|
||||||
|
// crops) and prefer one structured line per request so Glitchtip / log
|
||||||
|
// aggregators can correlate.
|
||||||
|
if (pathname.startsWith("/api/") && !pathname.startsWith("/api/static") && !pathname.startsWith("/api/crop")) {
|
||||||
|
log.info(
|
||||||
|
{
|
||||||
|
event: "http_request",
|
||||||
|
method: request.method,
|
||||||
|
path: pathname,
|
||||||
|
correlation_id: reqId,
|
||||||
|
duration_ms: Date.now() - t0,
|
||||||
|
},
|
||||||
|
`${request.method} ${pathname}`,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
return response;
|
return response;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
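The middleware above stamps `x-correlation-id` on the response. Route handlers read the id from the incoming request headers (see `withRequest` in `web/lib/logger.ts`), so for a handler to see the same id it would also need to be present on the forwarded request. A minimal sketch of that step using only the standard `Headers` API; the helper name `forwardCorrelationId` and the injected `mint` callback are assumptions, not part of the commit.

```typescript
// Hypothetical helper: get-or-mint the id and return a copy of the request
// headers that carries it, leaving the original Headers object untouched.
function forwardCorrelationId(
  incoming: Headers,
  mint: () => string,
): { id: string; forwarded: Headers } {
  const id =
    incoming.get("x-correlation-id") ||
    incoming.get("x-request-id") ||
    mint();
  const forwarded = new Headers(incoming); // copy, do not mutate the input
  forwarded.set("x-correlation-id", id);
  return { id, forwarded };
}

// Usage: an upstream x-request-id is reused as the correlation id.
const { id, forwarded } = forwardCorrelationId(
  new Headers({ "x-request-id": "req-1" }),
  () => "minted-id",
);
console.log(id, forwarded.get("x-correlation-id"));
```

In Next.js middleware the forwarded headers would typically be passed as `NextResponse.next({ request: { headers: forwarded } })`, with the same id also set on the response. Whether the project's handlers rely on this is not shown in the diff.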
||||||
2363
web/package-lock.json
generated
File diff suppressed because it is too large
|
|
@ -15,6 +15,7 @@
|
||||||
"@radix-ui/react-tooltip": "^1.1.0",
|
"@radix-ui/react-tooltip": "^1.1.0",
|
||||||
"@react-sigma/core": "^5.0.0",
|
"@react-sigma/core": "^5.0.0",
|
||||||
"@react-sigma/layout-forceatlas2": "^5.0.0",
|
"@react-sigma/layout-forceatlas2": "^5.0.0",
|
||||||
|
"@sentry/nextjs": "^10.53.1",
|
||||||
"@supabase/ssr": "^0.10.3",
|
"@supabase/ssr": "^0.10.3",
|
||||||
"@supabase/supabase-js": "^2.105.4",
|
"@supabase/supabase-js": "^2.105.4",
|
||||||
"framer-motion": "^11.11.0",
|
"framer-motion": "^11.11.0",
|
||||||
|
|
@ -24,6 +25,7 @@
|
||||||
"lucide-react": "^0.460.0",
|
"lucide-react": "^0.460.0",
|
||||||
"next": "^15.1.0",
|
"next": "^15.1.0",
|
||||||
"pg": "^8.13.1",
|
"pg": "^8.13.1",
|
||||||
|
"pino": "^10.3.1",
|
||||||
"react": "^19.0.0",
|
"react": "^19.0.0",
|
||||||
"react-dom": "^19.0.0",
|
"react-dom": "^19.0.0",
|
||||||
"react-force-graph-2d": "^1.27.0",
|
"react-force-graph-2d": "^1.27.0",
|
||||||
|
|
|
||||||
17
web/sentry.client.config.ts
Normal file
|
|
@ -0,0 +1,17 @@
|
||||||
|
/**
|
||||||
|
* Sentry (Glitchtip-compatible) client-side init. Loaded by Next.js
|
||||||
|
* automatically when @sentry/nextjs is installed.
|
||||||
|
*/
|
||||||
|
import * as Sentry from "@sentry/nextjs";
|
||||||
|
|
||||||
|
const dsn = process.env.NEXT_PUBLIC_SENTRY_DSN;
|
||||||
|
if (dsn) {
|
||||||
|
Sentry.init({
|
||||||
|
dsn,
|
||||||
|
environment: process.env.NODE_ENV || "development",
|
||||||
|
tracesSampleRate: 0,
|
||||||
|
sendDefaultPii: false,
|
||||||
|
// Capture unhandled promise rejections + JS errors. Glitchtip community
|
||||||
|
// ignores everything below `error` severity by default.
|
||||||
|
});
|
||||||
|
}
|
||||||
21
web/sentry.server.config.ts
Normal file
|
|
@ -0,0 +1,21 @@
|
||||||
|
/**
|
||||||
|
* Sentry (Glitchtip-compatible) server-side init.
|
||||||
|
*
|
||||||
|
* DSN must point to Glitchtip — we never send to sentry.io. See
|
||||||
|
* SENTRY_DSN / NEXT_PUBLIC_SENTRY_DSN in docker-compose.yml. If unset, the SDK
|
||||||
|
* is loaded but no events ship — safe for local dev.
|
||||||
|
*/
|
||||||
|
import * as Sentry from "@sentry/nextjs";
|
||||||
|
|
||||||
|
const dsn = process.env.SENTRY_DSN || process.env.NEXT_PUBLIC_SENTRY_DSN;
|
||||||
|
if (dsn) {
|
||||||
|
Sentry.init({
|
||||||
|
dsn,
|
||||||
|
environment: process.env.NODE_ENV || "development",
|
||||||
|
release: process.env.SENTRY_RELEASE,
|
||||||
|
tracesSampleRate: 0, // Glitchtip community doesn't support performance traces
|
||||||
|
sendDefaultPii: false,
|
||||||
|
// Make sure events land on Glitchtip's tunnel-friendly DSN host, not
|
||||||
|
// sentry.io. The SDK already infers from DSN; this is just defensive.
|
||||||
|
});
|
||||||
|
}
|
||||||