disclosure-bureau/scripts/maintain
Luiz Gustavo a7e9dce6d2 rebuild entity layer from Sonnet-vision reextract pipeline
Add reextract pipeline (scripts/reextract/) that rebuilds doc-level entity
JSON from Sonnet-vision chunks via Opus, replacing the noisy per-page
extraction. Add synthesize scripts that regenerate wiki/entities from the 116
_reextract.json files (30), aggregate missing page.md files from chunks (31),
and reprocess the 805 pages the doc-rebuilder agent dropped on context
overflow (32). Add
maintain scripts 43-56 for chunk-page sync, dedup, generic-entity marking, and
typed relation extraction.

Web: wire relations API + entity-relations component; entity/timeline/doc
pages consume the rebuilt layer.

Note: raw/, processing/, wiki/ remain gitignored (bulk data managed
separately); the 116 reextract JSONs and 7,798 rebuilt entity files live on
disk only. The 27 curated anchor events under wiki/entities/events/ are
preserved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 12:20:24 -03:00
41_strip_stubs.py phase-0: kill stubs, ship 20 curated anchor events, configure SMTP 2026-05-18 00:44:17 -03:00
42_sync_entity_stats.py rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
43_fix_chunk_page_from_source_png.py rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
44_sync_chunk_page_to_db.py rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
45_resync_index_json.py rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
46_text_backfill_mentions.py rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
47_mark_unsearchable_chunks.sql rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
48_hybrid_search_filter_unsearchable.sql rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
49_dedup_aggressive.py rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
50_dedup_fuzzy_trigram.py rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
51_remap_entity_mentions.py rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
52_mark_generic_entities.py rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
53_add_is_generic_to_db.sql rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
54_sync_is_generic.py rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
55_relations_schema.sql rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
56_extract_relations.py rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
run_full_dedup_pipeline.sh rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
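
The name 50_dedup_fuzzy_trigram.py suggests entity names are compared by character-trigram overlap. The script itself is not shown here, so the following is only a minimal sketch of that general technique, not the repository's implementation. The function names, the padding scheme, and the 0.6 threshold are all assumptions chosen for illustration.

```python
# Hypothetical sketch of trigram-based fuzzy matching; not taken from the repo.

def trigrams(name: str) -> set[str]:
    # Lowercase, collapse runs of whitespace, then pad so that the start and
    # end of the name contribute their own trigrams.
    padded = "  " + " ".join(name.lower().split()) + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    # Jaccard similarity: shared trigrams divided by all distinct trigrams.
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def find_duplicates(names: list[str], threshold: float = 0.6) -> list[tuple[str, str]]:
    # Compare every pair once (quadratic; fine for a sketch, slow at scale).
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(a, b) >= threshold:
                pairs.append((a, b))
    return pairs
```

With these assumptions, `find_duplicates(["Acme Corp", "acme  corp", "Zeta"])` pairs the first two names, since they normalize to the same string. A real pipeline over thousands of entity files would likely need an index or blocking step rather than the all-pairs loop used here.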