disclosure-bureau/infra/supabase/migrations/0008_entity_summaries.sql

W5.3 (Phase 3A): entity summaries, sub-pages get magazine-grade prose

Today /sightings, /witnesses, /objects, /locations and /operations show a
name + mention count and nothing else. After this each row carries a
60-100 word bilingual narrative summary written from the chunks where the
entity actually appears.

Migration 0008 (apply as supabase_admin):
  public.entities
    +summary_en            TEXT
    +summary_pt_br         TEXT
    +summary_generated_at  TIMESTAMPTZ
    +summary_model         TEXT
    +summary_status        TEXT CHECK ('pending'|'ai_generated'|'curated'|'refused')
  + index on summary_status
  + GRANT UPDATE (summary_*) ON entities TO investigator
  + new policy entities_investigator_update_summary
    (RLS UPDATE for investigator role)

Enrichment script (investigator-runtime/scripts/enrich_entity_summaries.ts):
  - Per-class config (chunk_k, min_mentions, max_per_class)
  - Path A: entity_mentions JOIN chunks (high-precision linker)
  - Path B (fallback): hybridSearch on canonical_name + aliases when
    entity_mentions returns zero. This is what unlocked Kenneth Arnold and
    similar entities: their wiki YAML has high total_mentions counted from
    frontmatter mentioned_in[], but the entity_mentions extractor was
    silent because the matches came from the wiki text, not the OCR chunks.
  - Sonnet 4.6 via OAuth Max, ~$0.04 per entity, ~$10 for the full
    260-entity bulk run.
  - INSUFFICIENT skip when chunks can't sustain a 60-word summary; refused
    entries get summary_status='refused' so they're not retried.

UI uplift:
  - lib/retrieval/entity-pages.ts: getEntityCore now prefers the DB summary
    (ai_generated or curated) over wiki YAML narrative.
  - components/entity-list-page.tsx:
    * SELECT now pulls summary_en, summary_pt_br, summary_status
    * Sorted with summary-enriched rows first (so the magazine grid lands
      on quality content immediately)
    * MagazineGrid: 4-line summary preview replaces aliases line
    * CompactGrid: enriched rows render as full editorial cards, bare rows
      fall back to a compact table below

Smoke results:
  - Kenneth Arnold sighting: "On June 24, 1947, pilot Kenneth Arnold
    reported sighting unidentified objects over the Pacific Northwest, and
    the account spread worldwide. It set off a run of similar reports:
    County Commissioner Crankes saw comparable objects after Arnold's
    account reached the press, and United Airlines pilot Emil H. Smith
    spotted flying discs on July 4 during a routine flight out of Boise,
    Idaho..."
  - Roswell Incident: includes Colonel Corso's 1997 book + the 1995 GAO
    finding that radio messages from Oct 46–Feb 47 were destroyed +
    Senator Strom Thurmond's foreword. Real magazine-grade content.

Background bulk run kicked off across all 5 classes (event, uap_object,
person, location, organization), populating live as the homepage rebuilds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 18:37:01 +00:00
-- 0008_entity_summaries.sql — bilingual prose summary per entity.
--
-- The /sightings, /witnesses, /objects, /locations, /operations pages
-- need real prose to feel like a magazine. Today they show just a name +
-- mention count. After this migration, each entity carries an ~80-word
-- bilingual narrative summary written from the chunks where it appears.
--
-- The narrator (case-writer voice, house style) writes one summary per
-- entity. Generation is offline (scripts/maintain/61_enrich_entity_summaries.ts)
-- and idempotent — re-running the script skips rows already enriched.
--
-- Apply as supabase_admin (entities table owner).
BEGIN;

ALTER TABLE public.entities
  ADD COLUMN IF NOT EXISTS summary_en           TEXT,
  ADD COLUMN IF NOT EXISTS summary_pt_br        TEXT,
  ADD COLUMN IF NOT EXISTS summary_generated_at TIMESTAMPTZ,
  ADD COLUMN IF NOT EXISTS summary_model        TEXT,
  ADD COLUMN IF NOT EXISTS summary_status       TEXT
    CHECK (summary_status IN ('pending', 'ai_generated', 'curated', 'refused'));

CREATE INDEX IF NOT EXISTS entities_summary_status_idx
  ON public.entities (summary_status) WHERE summary_status IS NOT NULL;

COMMIT;
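
The commit message also lists a column-level `GRANT UPDATE (summary_*)` and an RLS policy named `entities_investigator_update_summary`, neither of which appears in the statements shown above. The block below is a minimal sketch of what those two statements could look like in PostgreSQL, not the project's actual SQL. The role name `investigator` and the policy name come from the commit message. The `USING (true)` / `WITH CHECK (true)` predicates are assumptions, as is the premise that row-level security is already enabled on `public.entities`.

```sql
-- Hypothetical sketch: privileges described in the commit message.
-- Assumes role "investigator" exists and RLS is already enabled on the table.
BEGIN;

-- Column-level grant: the role may update only the summary_* columns.
GRANT UPDATE (
  summary_en,
  summary_pt_br,
  summary_generated_at,
  summary_model,
  summary_status
) ON public.entities TO investigator;

-- CREATE POLICY has no IF NOT EXISTS form, so drop first to stay re-runnable.
DROP POLICY IF EXISTS entities_investigator_update_summary ON public.entities;

CREATE POLICY entities_investigator_update_summary
  ON public.entities
  FOR UPDATE
  TO investigator
  USING (true)        -- assumption: any row may be targeted
  WITH CHECK (true);  -- assumption: no extra row-level restriction on new values

COMMIT;
```

With a column-level grant, the column list limits what the role can write, while the policy only decides which rows are eligible. A stricter `USING` predicate (for example, one that excludes rows whose `summary_status` is `'curated'`) would be one way to keep hand-edited summaries from being overwritten, if that is a requirement.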