Add reextract pipeline (scripts/reextract/) that rebuilds doc-level entity JSON from Sonnet-vision chunks via Opus, replacing the noisy per-page extraction.

Add synthesize scripts to regenerate wiki/entities from the 116 _reextract.json (30), aggregate missing page.md from chunks (31), and reprocess 805 pages the doc-rebuilder agent dropped on context overflow (32).

Add maintain scripts 43-56 for chunk-page sync, dedup, generic-entity marking, and typed relation extraction.

Web: wire relations API + entity-relations component; entity/timeline/doc pages consume the rebuilt layer.

Note: raw/, processing/, wiki/ remain gitignored (bulk data managed separately); the 116 reextract JSONs and 7,798 rebuilt entity files live on disk only. The 27 curated anchor events under wiki/entities/events/ are preserved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
10 lines
396 B
PL/PgSQL
-- 53_add_is_generic_to_db.sql
-- Add public.entities.is_generic BOOLEAN. Populated by 54_sync_is_generic.py
-- which reads each YAML's is_generic and writes it to the DB.

BEGIN;
ALTER TABLE public.entities
  ADD COLUMN IF NOT EXISTS is_generic BOOLEAN NOT NULL DEFAULT FALSE;
CREATE INDEX IF NOT EXISTS entities_is_generic_idx
  ON public.entities (is_generic) WHERE is_generic = TRUE;
COMMIT;
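
The header comment says the new column is populated by 54_sync_is_generic.py, which is not shown here. As a rough illustration of that sync step only, the sketch below writes per-entity flags into an `entities` table. It is not the actual script: it uses an in-memory SQLite database as a stand-in for PostgreSQL, it assumes a hypothetical `id` key column, and it assumes the YAML files have already been parsed into a plain `{entity_id: is_generic}` dict. The helper names `make_demo_db` and `sync_is_generic` are invented for this example.

```python
import sqlite3


def make_demo_db(ids):
    """Build an in-memory stand-in for public.entities (SQLite, not PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: only the id key (assumed) and the new flag column.
    conn.execute(
        "CREATE TABLE entities ("
        "id TEXT PRIMARY KEY, "
        "is_generic BOOLEAN NOT NULL DEFAULT 0)"
    )
    # Partial index covering only the rows flagged as generic,
    # mirroring the shape of the index created by the migration.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS entities_is_generic_idx "
        "ON entities (is_generic) WHERE is_generic = 1"
    )
    conn.executemany("INSERT INTO entities (id) VALUES (?)", [(i,) for i in ids])
    conn.commit()
    return conn


def sync_is_generic(conn, flags):
    """Write each entity's flag to the table and return the number of rows updated.

    flags maps an entity id to the is_generic value read from its YAML
    (the YAML parsing itself is assumed to have happened elsewhere).
    Ids that do not exist in the table are skipped silently.
    """
    params = [(int(bool(value)), entity_id) for entity_id, value in flags.items()]
    cur = conn.executemany(
        "UPDATE entities SET is_generic = ? WHERE id = ?", params
    )
    conn.commit()
    # For executemany, sqlite3 sums the modified rows across all statements.
    return cur.rowcount
```

Because the column is `NOT NULL DEFAULT FALSE`, entities with no flag in their YAML can simply be left out of the mapping; they keep the default and stay outside the partial index.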