disclosure-bureau/CHANGELOG.md
Luiz Gustavo 189a771cbe
W3.1-W3.4: Investigation Bureau foundation — migrations, runtime, Locard
Migrations:
- 0004_investigation_bureau.sql: 7 new tables (investigation_jobs + evidence,
  hypotheses, contradictions, witnesses, gaps, residual_uncertainties), id
  sequences, pg_notify trigger on investigation_jobs, RLS read-only public,
  investigator role with least-privilege grants (no service_role).
- 0005_investigator_write_policies.sql: fixup adding RLS INSERT/UPDATE
  policies bound to investigator + service_role + postgres (RLS with only a
  SELECT policy was silently blocking the worker's claim UPDATE).

investigator-runtime/ (new Bun + TS container):
- src/main.ts: LISTEN/NOTIFY poller, claim-with-SKIP-LOCKED, drain pool,
  healthcheck file, graceful SIGTERM shutdown.
- src/orchestrator.ts: chief-detective dispatch (evidence_chain → Locard).
  Marks job failed when all per-item outputs error; surfaces first errors.
- src/lib/{env,pg,audit,ids,claude}.ts: typed config (gate #8), pool +
  dedicated LISTEN client, NDJSON audit, sequence allocator (E-NNNN etc),
  claude -p subprocess with quota detection (api_error_status=429).
- src/tools/write_evidence.ts: schema-validate (grade A/B/C custody steps),
  resolve chunk_pk via FK, verify verbatim_excerpt actually appears in
  chunk content, INSERT + render case/evidence/E-NNNN.md + audit.
- src/detectives/locard.ts: load chunk → call Claude with locard.md system
  prompt → parse strict JSON → call writeEvidence locally.
- Dockerfile installs `claude` CLI (OAuth) at build time.

Compose:
- new `investigator` service builds from investigator-runtime/, connects
  with low-privilege role, mounts case/ RW and wiki/+raw/ RO, 512m mem cap.

Web:
- /api/admin/investigate/test (POST+GET) gated by middleware (W0-F1).
  POST creates a job, GET polls status. For W3.6 it becomes the chat tool.

End-to-end smoke: INSERT job → pg_notify → claim → Locard dispatch →
claude subprocess invoked. Auth works (CLI v2.1.150). Currently quota
exhausted (weekly limit · resets 3pm UTC) — pipeline catches the typed
isQuota error, marks job failed with surfaced reason. Architecture proven;
quota reset enables real evidence creation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:49:33 -03:00


# Changelog · Disclosure Bureau
All notable changes to this project go here. Newest on top.
## [Unreleased]
### W3.1–W3.4 — Investigation Bureau foundation
*2026-05-23 · systems-atelier engagement trace `794f00ba` · core of the brief*
The "8 detectives" branding becomes a real engine. This wave delivers the
database schema, the agentic runtime container, the first gated writer, and
the first detective end-to-end. Subsequent waves W3.5–W3.10 add the remaining
detectives, the chat tool, and the frontend.
- **Migration `0004_investigation_bureau.sql`** — 7 new tables with RLS:
`investigation_jobs` (queue + audit), `evidence`, `hypotheses`,
`contradictions`, `witnesses`, `gaps`, `residual_uncertainties`. ID
sequences `evidence_id_seq` etc. for human-readable IDs (E-NNNN /
H-NNNN / R-NNNN / W-NNNN / G-NNNN / RU-NNNN). `pg_notify` trigger on
`investigation_jobs` fires on every INSERT so workers wake up immediately.
- **`investigator` role** carved out of the existing Postgres with
least-privilege grants: SELECT on the read corpus
(`chunks/entities/entity_mentions/relations/documents`), INSERT/UPDATE on
the 7 new tables and their sequences, **no service_role**, **no
auth.users / profiles / messages**. Per gate #1 of the security audit.
- **Migration `0005_investigator_write_policies.sql`** — fix-up: RLS
with only a SELECT policy silently blocked the worker's `UPDATE …
RETURNING` claim query. New INSERT/UPDATE policies on all 7 tables
bound to the `investigator` role (plus service_role + postgres).
- **`investigator-runtime/`** new Bun + TypeScript container:
`src/main.ts` (LISTEN poller + claim-skip-locked + healthcheck file),
`src/orchestrator.ts` (chief-detective dispatch), `src/lib/{env,pg,
audit,ids,claude}.ts`, `src/detectives/locard.ts`, and
`src/tools/write_evidence.ts`. Dockerfile built on `oven/bun:1.1-slim`
with `claude` CLI installed for OAuth subprocess calls. The main loop
touches `/tmp/healthy` on every iteration; Docker marks the container
unhealthy if the file goes stale.
- **Locard detective** (the simplest of the 8): given a chunk, asks Claude
Sonnet 4.6 to extract a verbatim quote + chain of custody. The model
emits a strict JSON object; the runtime owns the writer (gate #2 of
security audit). System prompt at `investigator-runtime/prompts/locard.md`.
- **`write_evidence` tool** — schema-validated INSERT into `public.evidence`
+ render `case/evidence/E-NNNN.md`. Rejects evidence whose
`verbatim_excerpt` isn't found inside the source chunk's content
(Sonnet must not paraphrase). Rejects below-grade rows (A ≥ 3 custody
steps, B ≥ 2, C ≥ 1). FK to `public.chunks` so the row can never reference
a phantom chunk.
- **`/api/admin/investigate/test`** admin endpoint — POST creates a job,
GET polls. Gated by middleware (`/api/admin/* → 404` for non-admins,
per W0-F1). Designed for the chat-based `request_investigation` tool
coming in W3.6.
- **End-to-end smoke test on prod**:
  1. INSERT a job (`evidence_chain`, doc `dow-uap-d017-…-sandia`,
     chunks `[c0030]`).
  2. `pg_notify investigation_jobs` fires.
  3. Worker LISTEN receives the notification.
  4. `claimNextJob` UPDATE-claims the row (worker_id stamped).
  5. Locard is dispatched.
  6. `claude -p` subprocess invoked (auth + model lookup successful, version
     2.1.150).
  7. **Currently** the Claude OAuth Max 20x weekly quota is exhausted
     (`api_error_status: 429`, `"You've hit your weekly limit · resets
     3pm (UTC)"`). The orchestrator catches the typed `isQuota` error;
     the job is now marked `failed` (not `complete`) with the surfaced
     reason in `error`. **The plumbing works end-to-end**: once the
     quota resets, replaying the same job is expected to succeed.
- **Architecture conforms to the 8 security gates** (`ADR-002` + section 9
of `agentic-layer-spec.md`): no service_role in the worker; schema
validation before INSERT; `created_by` stamped on every row;
`BUDGET_CAP_USD_PER_JOB` enforced per call; allowlist tools
(only `write_evidence` for Locard so far, no `WebSearch`); audit
trail at `case/audit.jsonl`. Gates #6–#8 to land alongside W3.5+.
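
The two rejection rules of the `write_evidence` tool described above (verbatim excerpt must appear in the chunk; minimum custody steps per grade) can be sketched as a pure check. This is a minimal sketch, not the code in `src/tools/write_evidence.ts`: the `EvidenceDraft` shape, the `validateEvidence` name, and the exact-substring comparison without whitespace normalization are assumptions made for illustration.

```typescript
// Hypothetical shapes: the real schema in write_evidence.ts is not shown in this changelog.
type Grade = "A" | "B" | "C";

interface EvidenceDraft {
  grade: Grade;
  verbatimExcerpt: string;
  custodySteps: string[];
}

type Verdict = { ok: true } | { ok: false; reason: string };

// Minimum custody steps per grade, as stated in the changelog: A >= 3, B >= 2, C >= 1.
const MIN_CUSTODY_STEPS: Record<Grade, number> = { A: 3, B: 2, C: 1 };

function validateEvidence(draft: EvidenceDraft, chunkContent: string): Verdict {
  if (draft.verbatimExcerpt.length === 0) {
    return { ok: false, reason: "empty verbatim_excerpt" };
  }
  // Assumption: exact substring match, so any paraphrase is rejected.
  if (chunkContent.indexOf(draft.verbatimExcerpt) === -1) {
    return { ok: false, reason: "verbatim_excerpt not found in chunk content" };
  }
  const required = MIN_CUSTODY_STEPS[draft.grade];
  if (draft.custodySteps.length < required) {
    return {
      ok: false,
      reason: `grade ${draft.grade} needs at least ${required} custody steps`,
    };
  }
  return { ok: true };
}
```

Running a check of this kind before the INSERT keeps the model's JSON output untrusted until the runtime has compared it against the stored chunk, which is the point of having the runtime own the writer.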
#### Verified live (2026-05-23T22:48Z):
- `\dt public.{investigation_jobs,evidence,hypotheses,…}` → all 7 tables exist.
- `psql -U investigator -c 'SELECT COUNT(*) FROM public.chunks'` → 28 559
(read works with low-privilege role).
- `docker ps disclosure-investigator` → `Up (healthy)`.
- Audit log shows `runtime_starting → listening → job_claimed →
detective_dispatched → job_failed_all_items (quota)` chain.
- Job state transitions correctly persisted in `public.investigation_jobs`.
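
The `job_failed_all_items` transition in the audit chain above follows from the orchestrator rule that a job fails only when every per-item output errors. A minimal sketch of that rule, assuming hypothetical `ItemResult` and `JobOutcome` shapes, a hypothetical cap of three surfaced errors, and that a job with no items counts as complete (none of these are confirmed by the source):

```typescript
// Hypothetical shapes for illustration; not the types used in src/orchestrator.ts.
interface ItemResult {
  chunkId: string;
  error: string | null; // null means the detective produced a usable output
}

interface JobOutcome {
  status: "complete" | "failed";
  error: string | null;
}

function summarizeJob(results: ItemResult[], maxSurfaced: number = 3): JobOutcome {
  const errors: string[] = [];
  for (const r of results) {
    if (r.error !== null) errors.push(`${r.chunkId}: ${r.error}`);
  }
  // Fail only when every item failed; partial success still completes the job.
  const allFailed = results.length > 0 && errors.length === results.length;
  if (!allFailed) return { status: "complete", error: null };
  // Surface only the first few errors so the `error` column stays readable.
  return { status: "failed", error: errors.slice(0, maxSurfaced).join("; ") };
}
```

With a single-chunk job such as the smoke test, one quota error is enough to make every item fail, which is why the job ends in `failed` rather than `complete`.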
#### W3.5+ pending (next session):
- Detective `holmes` + `write_hypothesis` tool (hypothesis tournament).
- Detective `dupin` + `write_contradiction` tool + daily cron.
- Detectives `tetlock`, `schneier`, `taleb`, `poirot`, `case-writer`.
- Chat tool `request_investigation` + status bar + `/jobs/[id]` page.
- Frontend tab "Investigation" + `/h/[hypothesisId]` page.
- Golden hypothesis set (W3.10 quality gate).
### W2 — UX latency + retrieval eval + vision tool
*2026-05-23 · systems-atelier engagement trace `794f00ba`*
- **TD#8 · Reranker opt-in** (`hybrid.ts`). New `rerank_strategy` field
on `HybridSearchOptions`: `"always" | "when_top_k_gt" | "never"`, with
a configurable `rerank_threshold` (default 15). Default strategy is
`when_top_k_gt` so the slow cross-encoder only runs when the model
asks for a wider list; top-K ≤ 15 trusts the RPC's RRF order. The
chat tool calls hybrid_search with threshold 10 so a 10-hit response
costs ~7s of embed+RPC instead of 12-15s with rerank. `/api/search/hybrid`
exposes the strategy via `?rerank=always|never|when_top_k_gt` plus
`?rerank_threshold=N`. Back-compat `?rerank=0` still means "never".
- **O11 · `analyze_image_region` chat tool** (`vision.ts`, `tools.ts`).
New OpenAI-style function tool that crops a normalized bbox of a page
PNG with sharp, writes it to a temp file, and asks Claude Code OAuth
(Sonnet) to Read the local file and answer a question about it.
Schema: `{doc_id, page, bbox{x,y,w,h}, question, context?}`. Emits a
`crop_image` artifact for the UI alongside the textual answer. Cost
budget: ~$0.005–0.02 per call, paid against the user's Max 20x
quota. Timeout configurable via `VISION_TIMEOUT_MS` (default 120s).
- **TD#12 · `react-force-graph-2d` removed**. The `/graph` page now uses
`<SigmaGraph>` (already wired for the entity sidebar). One graph
library is enough. `web/components/force-graph-canvas.tsx` deleted;
`npm uninstall` removed 37 transitive deps.
- **TD#27 · Context truncation per type configurable**
(`messages/route.ts`). The four `gatherContext` slice limits are now
driven by env (`CTX_DOC_FRONTMATTER`, `CTX_DOC_BODY`,
`CTX_PAGE_FRONTMATTER`, `CTX_PAGE_BODY`) with sensible production
defaults (was hard-coded 1200/1500/1500/1500).
- **TD#22 · Golden RAG eval** (`tests/rag/`). New harness:
`golden.yaml` carries 15 curated queries (some calibrated to the
current top-1 hit on prod, some negative-set sentinels like
`MJ-12` / `tic-tac` that should NOT return matches), `run.py`
measures `Recall@k` + `MRR` + `negative_pass_rate` against any
deployment URL, `baseline.json` is the gate threshold, `last_run.json`
is the working report. Default behaviour: fail the run when Recall@5
drops > 0.05 from baseline. CI workflow runs against
`https://disclosure.top` on every push.
  - First baseline (rerank=never): **Recall@5 = 0.2083, MRR = 0.25,
    Negative pass = 1.0**. The golden set still needs curation; the
    baseline is intentionally conservative for now so that drift
    detection is meaningful.
- **ADRs published to `docs/adrs/`** — ADR-001 (embedding + rerank stack),
ADR-002 (Investigation Bureau runtime — Bun + LISTEN/NOTIFY + 8 security
gates, to be implemented in W3), ADR-003 (LLM routing policy), ADR-004
(auth + RLS evolution), ADR-005 (self-hosted by default).
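
The reranker opt-in from TD#8 above reduces to a small decision function. The sketch below uses the documented option names (`rerank_strategy`, `rerank_threshold`) and defaults; the function names `shouldRerank` and `parseRerankParam`, and the choice to fall back to the default strategy for unknown query values, are assumptions rather than the code in `hybrid.ts`.

```typescript
type RerankStrategy = "always" | "when_top_k_gt" | "never";

interface RerankOptions {
  rerank_strategy?: RerankStrategy;
  rerank_threshold?: number;
}

// Decide whether the cross-encoder should run for a request asking for topK hits.
function shouldRerank(topK: number, opts: RerankOptions = {}): boolean {
  const strategy = opts.rerank_strategy ?? "when_top_k_gt"; // documented default
  const threshold = opts.rerank_threshold ?? 15; // documented default
  if (strategy === "always") return true;
  if (strategy === "never") return false;
  // Strictly greater: topK equal to the threshold keeps the RPC's RRF order.
  return topK > threshold;
}

// Map the ?rerank= query value to a strategy, keeping the ?rerank=0 back-compat alias.
function parseRerankParam(raw: string | null): RerankStrategy {
  if (raw === "0" || raw === "never") return "never";
  if (raw === "always") return "always";
  return "when_top_k_gt"; // assumption: absent or unknown values use the default
}
```

Under these rules the chat tool's call with threshold 10 and 10 hits skips the reranker, while a 20-hit request with default options triggers it.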
#### Verified on `disclosure.top` (2026-05-23T21:55Z):
- `/api/search/hybrid?q=Roswell&top_k=5` → HTTP 200 in 6.7s (embed-only,
rerank skipped per default strategy)
- `/api/search/hybrid?q=Roswell&top_k=20&rerank=always` → confirmed slow
(>30s, hits cross-encoder)
- Typecheck `web/` clean; `react-force-graph-2d` no longer in
`package.json`
- `tests/rag/run.py` against prod → 15 queries answered, baseline written
- 5 ADRs committed under `docs/adrs/`
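
For readers unfamiliar with the metrics behind the golden RAG eval, the sketch below shows one common way to compute `Recall@k` and `MRR` and to apply the "fail when Recall@5 drops > 0.05 from baseline" gate. It is not a port of `tests/rag/run.py`: the `GoldenCase` shape, the per-query averaging, and all function names are assumptions, and the harness may define recall differently.

```typescript
// Hypothetical shape: ids the query should retrieve, and the ranked ids a deployment returned.
interface GoldenCase {
  expected: string[];
  ranked: string[];
}

// Fraction of expected ids that appear among the first k results.
function recallAtK(c: GoldenCase, k: number): number {
  if (c.expected.length === 0) return 0;
  const top = c.ranked.slice(0, k);
  const found = c.expected.filter((id) => top.indexOf(id) !== -1).length;
  return found / c.expected.length;
}

// 1 / rank of the first relevant hit, or 0 when nothing relevant is returned.
function reciprocalRank(c: GoldenCase): number {
  const idx = c.ranked.findIndex((id) => c.expected.indexOf(id) !== -1);
  return idx === -1 ? 0 : 1 / (idx + 1);
}

function mean(xs: number[]): number {
  if (xs.length === 0) return 0;
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Gate: pass unless the metric dropped by more than maxDrop relative to the baseline.
function passesGate(current: number, baseline: number, maxDrop: number = 0.05): boolean {
  return baseline - current <= maxDrop;
}
```

Averaging `recallAtK` and `reciprocalRank` over the positive queries gives the two headline numbers; improvements over the baseline always pass the gate because only drops are penalized.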
### W1.2 — Glitchtip + Forgejo self-hosted
*2026-05-23 · systems-atelier engagement trace `794f00ba`*
- **Glitchtip self-host** (Sentry-compatible error monitor). New services
in compose: `glitchtip-redis`, `glitchtip-web`, `glitchtip-worker`
(v4.2, uWSGI on 8080). Database `glitchtip` carved out of
`disclosure-db` as a separate role/DB. Bootstrap done via Django
`manage.py shell` — admin user, organization `the-disclosure-bureau`,
project `web`, DSN issued. SDK wired: `@sentry/nextjs` + `instrumentation.ts`
+ `sentry.{client,server}.config.ts`. `/api/admin/throw` smoke endpoint
is admin-gated. Live at `https://glitchtip.disclosure.top` (TLS issued
by Let's Encrypt via Traefik). Synthetic event verified — POST
`/api/1/store/` → 200 + event id.
- **Forgejo self-host + Actions CI**. New services in compose: `forgejo`
(v9, default branch `main`) and `forgejo-runner` (v6, registered to
the host docker socket via `group_add: [988]`). Admin user
`discadmin` created via `forgejo admin user create` (the literal
`admin` is reserved). Runner bootstrap on first start: registers if
`.runner` absent, then `forgejo-runner daemon`. Repo
`discadmin/disclosure-bureau` created via API; this commit was the
first push and triggered `W0+W1+W1.2: …` workflow at task 1.
- **`.forgejo/workflows/ci.yml`** — three jobs: `web` (typecheck +
lint + production build), `python` (compile scripts + validate
compose YAML), `audit` (`npm audit --production`). Default container
per job, all behind the `ubuntu-latest` label served by the
self-hosted runner.
#### Verified on the stack (2026-05-23T21:19Z):
- `glitchtip.disclosure.top` → HTTP 200, real Let's Encrypt cert,
Glitchtip CSP headers present.
- POST `/api/1/store/` → 200, event_id `cb17d723…` returned.
- `forgejo.disclosure.top` → HTTP 200, Forgejo welcome page.
- Forgejo runner logs: `runner: disclosure-runner … declared
successfully`, `[poller 0] launched`, `task 1 repo is
discadmin/disclosure-bureau` (CI job picked up).
- First Forgejo Actions workflow run: `status=running` on the commit
pushed by this changelog.
### W1 — Observability + resilience + Meili autocomplete
*2026-05-23 · systems-atelier engagement trace `794f00ba`*
- **Studio container fixed (carry-over from W0)** — root cause was Next.js
standalone binding to the container hostname only. The docker healthcheck
(`fetch 127.0.0.1:3000/api/profile`) looped on `ECONNREFUSED`, the service
never went healthy, and Traefik returned 404 because the upstream wasn't
responding. Fix: `HOSTNAME: 0.0.0.0` in the studio env. Studio now
`healthy`, basic auth from W0-F3 enforces correctly (no-auth → 401,
valid creds → 307), and Let's Encrypt issued a real cert for
`studio.disclosure.top` once the route started responding.
- **TD#10 · PG pool max** — `PG_POOL_MAX=20` (was hard-coded 5) configurable
via .env; default raised for prod. Files: `docker-compose.yml`, `.env`.
- **W1-F8 · `CLAUDE_CODE_OAUTH_TOKEN` gated** — only injected into the `web`
service when explicitly set in `CLAUDE_CODE_OAUTH_TOKEN_FOR_WEB`. Default
empty since `CHAT_PROVIDER=openrouter` does not need it. Reduces blast
radius if web container is compromised. Files: `docker-compose.yml`, `.env`.
- **TD#30 · Subprocess timeout configurable** — `CLAUDE_CODE_TIMEOUT_MS`
env now controls the `claude -p` subprocess timeout (default 90s,
matches prior hard-coded value). Files: `web/lib/chat/claude-code.ts`.
- **TD#23 · OpenRouter retry + circuit breaker** — `fetchOpenRouter()`
wraps every call with: retry up to `OPENROUTER_RETRY_MAX` (default 2)
on 408 / 425 / 429 / 500 / 502 / 503 / 504 and network errors, with
exponential backoff and `Retry-After` honored; in-memory circuit
breaker trips when `PRIMARY` fails `CB_THRESHOLD` times (default 3)
within `CB_WINDOW_MS` (60s), promoting `FALLBACK` for `CB_COOLDOWN_MS`
(2 min). Both `sendOnce` and `openrouterStreamCall` go through it.
Files: `web/lib/chat/openrouter.ts`.
- **TD#6 · Structured logging with pino** — `web/lib/logger.ts` provides
a JSON logger (NDJSON in prod, pretty in dev) plus `withRequest()`
helper for correlation-id-bound child loggers. Edge runtime falls back
to a console adapter. Middleware now mints a `correlation_id` for
every request, stamps the response header (`x-correlation-id`), and
emits one structured `http_request` line per `/api/*` call with
method, path, status, and duration. `messages/route.ts` switched to
the new logger. Files: `web/lib/logger.ts`, `web/middleware.ts`,
`web/app/api/sessions/[id]/messages/route.ts`, `web/package.json`.
- **Meilisearch indexer + `/api/search/autocomplete` + UI** — the previously
idle Meili instance now backs typo-tolerant prefix search. Indexer
script `scripts/maintain/60_meili_index.py` ingests documents
(canonical_title + collection) and is-searchable chunks (content_pt +
content_en + meta). The new `/api/search/autocomplete?q=...` route
hits both indexes in parallel with a 2s abort and returns a merged
payload. `SearchAutocomplete` React component drops a debounced
dropdown under the `/search` input. Median latency in production:
**58ms**. Files: `scripts/maintain/60_meili_index.py`,
`web/app/api/search/autocomplete/route.ts`,
`web/components/search-autocomplete.tsx`,
`web/components/search-panel.tsx`.
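
The retry and circuit-breaker behaviour described for TD#23 above can be sketched with an injected clock so the logic stays testable. This is a minimal illustration, not the code in `web/lib/chat/openrouter.ts`: the class and function names, the 500 ms backoff base, the handling of `Retry-After` as seconds only (the header may also carry an HTTP date), and the reset of the failure window when the breaker trips are all assumptions.

```typescript
// Statuses the changelog lists as retryable.
const RETRYABLE_STATUSES: number[] = [408, 425, 429, 500, 502, 503, 504];

function isRetryable(status: number): boolean {
  return RETRYABLE_STATUSES.indexOf(status) !== -1;
}

// Retry-After (in seconds) wins when present; otherwise exponential backoff.
// The 500 ms base is an assumption, not a value from the source.
function backoffMs(attempt: number, retryAfterSeconds: number | null, baseMs: number = 500): number {
  if (retryAfterSeconds !== null && retryAfterSeconds >= 0) return retryAfterSeconds * 1000;
  return baseMs * Math.pow(2, attempt);
}

class CircuitBreaker {
  private failures: number[] = [];
  private openUntil: number = 0;
  private readonly threshold: number;
  private readonly windowMs: number;
  private readonly cooldownMs: number;

  // Defaults mirror the documented CB_THRESHOLD, CB_WINDOW_MS and CB_COOLDOWN_MS values.
  constructor(threshold: number = 3, windowMs: number = 60000, cooldownMs: number = 120000) {
    this.threshold = threshold;
    this.windowMs = windowMs;
    this.cooldownMs = cooldownMs;
  }

  // Record a PRIMARY failure at time `now` (ms); trip when the window holds enough of them.
  recordPrimaryFailure(now: number): void {
    this.failures = this.failures.filter((t) => now - t < this.windowMs);
    this.failures.push(now);
    if (this.failures.length >= this.threshold) {
      this.openUntil = now + this.cooldownMs;
      this.failures = [];
    }
  }

  // While the breaker is open, traffic is promoted to FALLBACK.
  route(now: number): "PRIMARY" | "FALLBACK" {
    return now < this.openUntil ? "FALLBACK" : "PRIMARY";
  }
}
```

Because the breaker described in the changelog is in-memory, state of this kind would be per process: each web instance trips and recovers independently.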
#### Verified on `disclosure.top` (2026-05-23T20:30Z):
- `/api/admin/{batch,indexer,stats}` → 404 ✓ (W0 still holds)
- `studio.disclosure.top` no-auth → 401 · `admin:<DASHBOARD_PASSWORD>` → 307 ✓
- Let's Encrypt cert issued for `studio.disclosure.top`
- Autocomplete `q=Roswell` → 8 chunks in 8ms; `q=Sandia` → 1 doc + 8 chunks
in 8ms; `q=1947` → 5 docs + 8 chunks in 6ms ✓
- `x-correlation-id` header present on `/api/search/hybrid` response
(e.g. `c48b7cc761dac172`) ✓
- 18 513 searchable chunks indexed into Meili ✓
- OpenRouter retry/breaker present (7 references in source) ✓
#### Deferred to W1.2 / W2 (need user-in-loop steps):
- **Glitchtip self-host** — needs DNS for `glitchtip.disclosure.top`,
initial signup-as-superuser, project DSN copied to .env. Logger and
middleware are already feeding the data; SDK wiring is one PR.
- **Forgejo Actions self-host CI** — Forgejo server + runner bootstrap,
initial admin account, repo migration / mirror. Recommend a separate
session because of the depth of setup.
### W0 — Hardening (security + reproducibility)
*2026-05-23 · systems-atelier engagement trace `794f00ba-7cb6-4b90-a48e-23ebd02d1f44`*
- **F1 · Auth gate on `/api/admin/*`** — middleware now matches `/api/admin`
too; non-admin (including anonymous) gets HTTP 404. Verified: `curl`
on `/api/admin/{batch,indexer,stats}` returns 404 publicly. Files:
`web/middleware.ts`.
- **F2 · Imgproxy filesystem root tightened** — `IMGPROXY_LOCAL_FILESYSTEM_ROOT`
moved from `/` (entire VPS root) to `/var/lib/storage` (Storage backend
mount only). Reduces blast radius of any future imgproxy CVE. Files:
`infra/disclosure-stack/docker-compose.yml`.
- **F3 · Studio basic auth label** — replaced the dead-end
`basicauth.usersfile=/dev/null` with a real bcrypt-hashed credential
(`DASHBOARD_USERNAME` / `DASHBOARD_PASSWORD` from `.env`) and wired the
middleware into the router via `disclosure-studio.middlewares=
disclosure-studio-auth@docker`. *Caveat:* the Studio container itself
has a pre-existing instability (restarts in a Next.js loop, status
`unhealthy`) so the front-end currently returns 404 from Traefik. When
Studio is stabilized (queued for W1), the basic auth will kick in. Files:
`infra/disclosure-stack/docker-compose.yml`.
- **F4 · RLS on `public.relations`** — `ENABLE ROW LEVEL SECURITY` + public
`SELECT` policy + `GRANT SELECT TO anon, authenticated`. Aligns with
every other public table. Files: `infra/supabase/migrations/0003_w0_hardening.sql`.
- **TD#2 · `is_searchable` folded into canonical migrations** — the column,
reclassification rules, partial index, and the updated `hybrid_search_chunks`
RPC (BM25 + dense, both filtered by `is_searchable`) are now in migration
`0003_w0_hardening.sql`. A clean bootstrap on a fresh VPS produces a
searchable database without any `scripts/maintain/47-48` post-hoc patches.
Files: `infra/supabase/migrations/0003_w0_hardening.sql`.
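
The F1 gate above answers non-admins with 404 rather than 401 or 403, presumably so the admin routes are not advertised to outsiders. A minimal sketch of that decision as a pure function, assuming hypothetical names (`isAdminPath`, `adminGateStatus`) and a prefix match on the path; the real logic lives in `web/middleware.ts` and is not shown in this changelog:

```typescript
// True for /api/admin itself and anything below it, but not for look-alike prefixes.
function isAdminPath(pathname: string): boolean {
  return pathname === "/api/admin" || pathname.indexOf("/api/admin/") === 0;
}

// Returns the status to short-circuit with, or null to let the request continue.
function adminGateStatus(pathname: string, isAdmin: boolean): number | null {
  if (!isAdminPath(pathname)) return null;
  // Anonymous and non-admin callers get the same 404.
  return isAdmin ? null : 404;
}
```

Matching on the trailing slash keeps an unrelated route such as `/api/administrator` (a made-up example) outside the gate.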
#### Verified on `disclosure.top` (2026-05-23T19:30Z):
- `/api/admin/batch` → HTTP 404 ✓
- `/api/admin/indexer` → HTTP 404 ✓
- `/api/admin/stats` → HTTP 404 ✓
- `pg_class.relrowsecurity` = `t` for chunks, documents, entities,
entity_mentions, **relations**
- `is_searchable` distribution: 18 513 searchable / 10 046 not-searchable
(35% of corpus deduplicated from results) ✓
- `/api/search/hybrid?q=Roswell` → HTTP 200, 10 hits, first `c0527`
- Studio: Traefik labels in place; container itself unhealthy (separate
issue, deferred to W1) ⚠
#### Notes for clean-install reproducibility:
- `0003_w0_hardening.sql` MUST be applied as `supabase_admin`, not
`postgres`, because public.chunks / .entities / .relations are owned by
`supabase_admin`. The migration file documents this in its header.