Commit graph

3 commits

Author SHA1 Message Date
Luiz Gustavo
fe19bb9c57 add page↔document navigation + DB repopulation tooling
Doc page (/d/[docId]/[page]) gains prev/next navigation bars (top + bottom):
within a doc it steps page-by-page; at the first/last page it jumps to the
previous/next document. Replaces the disabled-at-boundary links.

Indexer tooling for the VPS repopulation:
- 30-index-chunks-to-db.py: add --no-embed (fast BM25-only index; vectors
  backfilled separately) so the app is usable in minutes, not hours of CPU
  embedding.
- 57_load_relations_from_json.py: load typed relations into public.relations
  from reextract structured fields (deterministic ids, no fuzzy guessing).
- 58_backfill_embeddings.py: async pass to fill chunks.embedding (NULL rows)
  via the embed-service.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 14:28:14 -03:00
Luiz Gustavo
a7e9dce6d2 rebuild entity layer from Sonnet-vision reextract pipeline
Add reextract pipeline (scripts/reextract/) that rebuilds doc-level entity
JSON from Sonnet-vision chunks via Opus, replacing the noisy per-page
extraction. Add synthesize scripts to regenerate wiki/entities from the 116
_reextract.json (30), aggregate missing page.md from chunks (31), and reprocess
805 pages the doc-rebuilder agent dropped on context overflow (32). Add
maintain scripts 43-56 for chunk-page sync, dedup, generic-entity marking, and
typed relation extraction.

Web: wire relations API + entity-relations component; entity/timeline/doc
pages consume the rebuilt layer.

Note: raw/, processing/, wiki/ remain gitignored (bulk data managed
separately); the 116 reextract JSONs and 7,798 rebuilt entity files live on
disk only. The 27 curated anchor events under wiki/entities/events/ are
preserved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 12:20:24 -03:00
guto
19d0678e55 baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00