Commit graph

6 commits

Author SHA1 Message Date
Luiz Gustavo
d4a2e4f51e W3.10: clickable detective tiles + quick-launch form + doc bureau panel
Some checks failed
CI / Web — typecheck + lint + build (push) Failing after 37s
CI / Scripts — Python smoke (push) Failing after 5s
CI / Web — npm audit (push) Failing after 40s
CI / Retrieval — golden set (Recall@5 + MRR) (push) Failing after 4s
Builds on top of W3.9 to turn the homepage Bureau from a read-only
dashboard into a working command center.

UI improvements (web/components/bureau-snapshot.tsx):
  - Detective tiles are now <Link>s — each navigates to its primary
    artefact section in /bureau (Holmes→#hypotheses, Locard→#evidence,
    Dupin→#contradictions, Schneier→#hypotheses, Poirot→#witnesses,
    Taleb→#outliers, Tetlock→#hypotheses, Case-Writer→#reports). Hover
    bg matches the detective's tone color.
  - <QuickLaunch /> form inserted right under the tiles.
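
The tile-to-section mapping above can be sketched as a small lookup. This is only an illustration: the names `TILE_ANCHORS` and `tileHref` are hypothetical and not taken from bureau-snapshot.tsx; the detective names and anchors come from the list above.

```typescript
// Hypothetical lookup table; detective names and anchors are from the commit message.
const TILE_ANCHORS = new Map<string, string>([
  ["Holmes", "hypotheses"],
  ["Locard", "evidence"],
  ["Dupin", "contradictions"],
  ["Schneier", "hypotheses"],
  ["Poirot", "witnesses"],
  ["Taleb", "outliers"],
  ["Tetlock", "hypotheses"],
  ["Case-Writer", "reports"],
]);

// Builds the href a tile would link to; unknown names fall back to the plain /bureau page.
function tileHref(detective: string): string {
  const anchor = TILE_ANCHORS.get(detective);
  return anchor === undefined ? "/bureau" : `/bureau#${anchor}`;
}
```

A map is used instead of a plain object so that a lookup such as `tileHref("toString")` cannot accidentally hit an inherited property.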

New <QuickLaunch /> client component:
  - Detective dropdown (7 active kinds; evidence_chain is not yet exposed
    here since it needs a doc_id, which is better picked from the doc page).
  - Single input swaps placeholder + aria-label by kind: question for
    Holmes, topic for Dupin/Taleb/Case-Writer, hypothesis_id for
    Schneier/Tetlock, person_id for Poirot.
  - Submits to POST /api/bureau/launch and redirects to /jobs/[id]
    via the Next.js router.
  - Loading state ("queueing…") + error display inline.
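
A minimal sketch of the per-detective input switch described above. The field names are those listed in the commit message; the helper names `inputFieldFor` and `launchBody`, and the shape of the request body, are assumptions made for illustration and may not match the real component or route.

```typescript
type InputField = "question" | "topic" | "hypothesis_id" | "person_id";

// Which single input each of the 7 exposed detectives asks for (from the commit message).
const FIELD_BY_DETECTIVE = new Map<string, InputField>([
  ["Holmes", "question"],
  ["Dupin", "topic"],
  ["Taleb", "topic"],
  ["Case-Writer", "topic"],
  ["Schneier", "hypothesis_id"],
  ["Tetlock", "hypothesis_id"],
  ["Poirot", "person_id"],
]);

// Returns the field to prompt for, or null when the detective is not exposed in the form.
function inputFieldFor(detective: string): InputField | null {
  return FIELD_BY_DETECTIVE.get(detective) ?? null;
}

// Hypothetical body builder: rejects empty input and unknown detectives.
function launchBody(detective: string, value: string): Record<string, string> | null {
  const field = inputFieldFor(detective);
  const trimmed = value.trim();
  if (field === null || trimmed === "") return null;
  return { detective, [field]: trimmed };
}
```

The same lookup could drive both the placeholder and the aria-label, so the two never drift apart.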

POST /api/bureau/launch (web/app/api/bureau/launch/route.ts):
  - Same 8-kind validator as the chat tool's request_investigation.
  - Auth required when Supabase is configured (triggered_by = user:email).
  - Returns { job_id, kind, detective, status_url, eta_seconds }.

DocBureauPanel on /d/[docId] (web/components/doc-bureau-panel.tsx):
  - Server component inserted between the doc header and
    AnomalyHighlights.
  - Surfaces every bureau artefact that touches the doc:
    · Evidence whose source_page_id starts with docId/p
    · Hypotheses citing any of those evidence_ids
    · Contradictions whose chunks[] has any item with this doc_id
    · Gaps/outliers with scope.doc_id == docId
    · Case reports whose markdown body references docId (filesystem scan)
  - Empty state shows "Investigation Bureau — untouched" with a CTA
    linking back to the homepage to launch the first investigation.
  - When non-empty, header counts total artefacts + links to /bureau
    for the full view.
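
The matching rules above can be sketched as plain filters over in-memory lists. This is an illustration under assumptions: the interfaces, `artefactsForDoc`, and the `evidence_id` / `hypothesis_id` fields are hypothetical, while `source_page_id`, `evidence_ids`, `chunks[]`, `doc_id`, and `scope.doc_id` are the names used in the commit message. The filesystem scan for case reports is left out.

```typescript
interface Evidence { evidence_id: string; source_page_id: string }
interface Hypothesis { hypothesis_id: string; evidence_ids: string[] }
interface Contradiction { chunks: { doc_id: string }[] }
interface ScopedFinding { scope: { doc_id?: string } } // gaps and outliers

interface BureauData {
  evidence: Evidence[];
  hypotheses: Hypothesis[];
  contradictions: Contradiction[];
  scoped: ScopedFinding[];
}

// Selects every artefact that touches docId and counts the total for the panel header.
function artefactsForDoc(docId: string, all: BureauData): BureauData & { total: number } {
  // The trailing "/p" keeps "doc-a" from matching pages of "doc-ab".
  const evidence = all.evidence.filter((e) => e.source_page_id.startsWith(`${docId}/p`));
  const evidenceIds = new Set(evidence.map((e) => e.evidence_id));
  const hypotheses = all.hypotheses.filter((h) => h.evidence_ids.some((id) => evidenceIds.has(id)));
  const contradictions = all.contradictions.filter((c) => c.chunks.some((ch) => ch.doc_id === docId));
  const scoped = all.scoped.filter((s) => s.scope.doc_id === docId);
  const total = evidence.length + hypotheses.length + contradictions.length + scoped.length;
  return { evidence, hypotheses, contradictions, scoped, total };
}
```

A `total` of 0 would correspond to the "untouched" empty state.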

Metadata (web/app/layout.tsx):
  - description rewritten from "Investigative wiki of the US Department
    of War UAP/UFO archive (war.gov/ufo)" to one that names the bureau
    + the 8 detectives. Affects SERP previews + social-card defaults.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 23:33:00 -03:00
Luiz Gustavo
55cac8a395 W0+W1+W1.2: security hardening, observability, autocomplete, glitchtip, forgejo CI
Some checks failed
CI / Web — typecheck + lint + build (push) Failing after 1m30s
CI / Scripts — Python smoke (push) Failing after 32s
CI / Web — npm audit (push) Failing after 37s
W0 — security hardening (5 fixes verified live on disclosure.top)
- middleware: gate /api/admin/* same as /admin/* (F1)
- imgproxy: tighten LOCAL_FILESYSTEM_ROOT from / to /var/lib/storage (F2)
- studio: real basic-auth label (bcrypt hash, middleware reference) (F3)
- relations: ENABLE ROW LEVEL SECURITY + public SELECT policy (F4)
- migration 0003: fold is_searchable + hybrid_search update into canonical (TD#2)

W1 — observability + resilience + autocomplete
- studio: HOSTNAME=0.0.0.0 so Next.js binds on all interfaces (including loopback) for healthcheck
- compose: PG_POOL_MAX=20, CLAUDE_CODE_OAUTH_TOKEN gated by separate env
- claude-code.ts: subprocess timeout configurable (CLAUDE_CODE_TIMEOUT_MS)
- openrouter.ts: retry with exponential backoff + Retry-After + in-memory
  circuit breaker (promotes FALLBACK after CB_THRESHOLD failures)
- lib/logger.ts: pino logger (NDJSON prod / pretty dev) + withRequest helper
- middleware: mints correlation_id, stamps x-correlation-id response header,
  emits structured http_request log per /api/* call
- messages/route.ts: switch to structured logger
- 60_meili_index.py: push documents + chunks into Meilisearch
- /api/search/autocomplete: parallel meili search (docs + chunks), 5-8ms p50
- search-autocomplete.tsx: debounced dropdown wired into search-panel

W1.2 — Glitchtip + Forgejo self-hosted
- compose: glitchtip-redis + glitchtip-web + glitchtip-worker (v4.2)
- compose: forgejo + forgejo-runner (server v9, runner v6) with group_add=988
- @sentry/nextjs SDK wired (instrumentation.ts + sentry.{client,server}.config.ts)
- /api/admin/throw smoke endpoint (gated by W0-F1 middleware)
- Synthetic event ingestion verified at glitchtip.disclosure.top
- forgejo.disclosure.top up, repo discadmin/disclosure-bureau created,
  runner registered (labels: ubuntu-latest, docker)
- .forgejo/workflows/ci.yml: typecheck + lint + build + npm audit + python
  syntax + compose validation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:18:42 -03:00
Luiz Gustavo
e75ca5eda2 add clean LLM reading version of documents (the core goal)
Scanned docs are messy — duplicate transcriptions (typed + handwritten),
two classification variants of the same narrative, OCR noise, repeated
banners. The doc page showed raw chunks, so everything appeared twice.

40_reading_version.py generates ONE clean, deduplicated, well-structured
bilingual Markdown reading version per doc (Sonnet): merges duplicate versions
without losing unique lines, drops page furniture, formats transcripts as
dialogue. Faithful — invents nothing; redactions kept as markers.

/d/[docId] now defaults to a "📖 leitura" (reading) tab rendering this clean version,
with "🔍 trechos · scan original" (excerpts · original scan) preserving the faithful
per-chunk + per-page scan view. reading.md lives in raw/<doc>--subagent/ alongside the chunks.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 17:23:36 -03:00
Luiz Gustavo
fe19bb9c57 add page↔document navigation + DB repopulation tooling
Doc page (/d/[docId]/[page]) gains prev/next navigation bars (top + bottom):
within a doc it steps page-by-page; at the first/last page it jumps to the
previous/next document. Replaces the disabled-at-boundary links.
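
The boundary behaviour described above can be sketched as a pure function. This is an illustration only: `DocInfo`, `PageRef`, and `neighbours` are hypothetical names, pages are assumed to be 1-indexed, and the jump to the previous document is assumed to land on its last page (the commit message does not say which page).

```typescript
interface DocInfo { docId: string; pageCount: number }
interface PageRef { docId: string; page: number }

// Computes prev/next targets for a page, crossing document boundaries at the edges.
function neighbours(docs: DocInfo[], current: PageRef): { prev: PageRef | null; next: PageRef | null } {
  const i = docs.findIndex((d) => d.docId === current.docId);
  if (i === -1) return { prev: null, next: null };
  const doc = docs[i];

  let prev: PageRef | null = null;
  if (current.page > 1) {
    prev = { docId: doc.docId, page: current.page - 1 };
  } else if (i > 0) {
    // Assumption: land on the last page of the previous document.
    prev = { docId: docs[i - 1].docId, page: docs[i - 1].pageCount };
  }

  let next: PageRef | null = null;
  if (current.page < doc.pageCount) {
    next = { docId: doc.docId, page: current.page + 1 };
  } else if (i < docs.length - 1) {
    next = { docId: docs[i + 1].docId, page: 1 };
  }

  return { prev, next };
}
```

Only the very first page of the first document and the very last page of the last document end up without a link in one direction.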

Indexer tooling for the VPS repopulation:
- 30-index-chunks-to-db.py: add --no-embed (fast BM25-only index; vectors
  backfilled separately) so the app is usable within minutes rather than
  after hours of CPU embedding.
- 57_load_relations_from_json.py: load typed relations into public.relations
  from reextract structured fields (deterministic ids, no fuzzy guessing).
- 58_backfill_embeddings.py: async pass to fill chunks.embedding (NULL rows)
  via the embed-service.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 14:28:14 -03:00
Luiz Gustavo
a7e9dce6d2 rebuild entity layer from Sonnet-vision reextract pipeline
Add reextract pipeline (scripts/reextract/) that rebuilds doc-level entity
JSON from Sonnet-vision chunks via Opus, replacing the noisy per-page
extraction. Add synthesize scripts to regenerate wiki/entities from the 116
_reextract.json files (30), aggregate missing page.md from chunks (31), and reprocess
805 pages the doc-rebuilder agent dropped on context overflow (32). Add
maintain scripts 43-56 for chunk-page sync, dedup, generic-entity marking, and
typed relation extraction.

Web: wire relations API + entity-relations component; entity/timeline/doc
pages consume the rebuilt layer.

Note: raw/, processing/, wiki/ remain gitignored (bulk data managed
separately); the 116 reextract JSONs and 7,798 rebuilt entity files live on
disk only. The 27 curated anchor events under wiki/entities/events/ are
preserved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 12:20:24 -03:00
guto
19d0678e55 baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack
2026-05-17 22:44:36 -03:00