Today /sightings, /witnesses, /objects, /locations and /operations show
a name + mention count and nothing else. After this change, each row
carries a 60-100 word bilingual (EN / PT-BR) narrative summary written
from the chunks where the entity actually appears.

Migration 0008 (apply as supabase_admin):
  public.entities +summary_en            TEXT
                  +summary_pt_br         TEXT
                  +summary_generated_at  TIMESTAMPTZ
                  +summary_model         TEXT
                  +summary_status        TEXT
                    CHECK ('pending'|'ai_generated'|'curated'|'refused')
  + index on summary_status
  + GRANT UPDATE (summary_*) ON entities TO investigator
  + new policy entities_investigator_update_summary (RLS UPDATE for
    investigator role)
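
For readers working on the TypeScript side, the new columns could be mirrored roughly as below. This is a sketch, not code from the repository: the status values come from the CHECK constraint, while `SUMMARY_STATUSES`, `EntitySummaryColumns` and `isSummaryStatus` are hypothetical names introduced for illustration.

```typescript
// Hypothetical TypeScript mirror of the columns added by migration 0008.
// Only the column names and status values come from the migration.
const SUMMARY_STATUSES = ["pending", "ai_generated", "curated", "refused"] as const;
type SummaryStatus = (typeof SUMMARY_STATUSES)[number];

interface EntitySummaryColumns {
  summary_en: string | null;
  summary_pt_br: string | null;
  summary_generated_at: string | null; // TIMESTAMPTZ, assumed to arrive as an ISO 8601 string
  summary_model: string | null;
  summary_status: SummaryStatus | null; // the columns have no default, so untouched rows are NULL
}

// Runtime guard for values read back from the database.
function isSummaryStatus(value: unknown): value is SummaryStatus {
  return typeof value === "string" && SUMMARY_STATUSES.some((s) => s === value);
}
```

Keeping the list of statuses in one constant means the type and the runtime guard cannot drift apart, although both still have to be kept in sync with the CHECK constraint by hand.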

Enrichment script (investigator-runtime/scripts/enrich_entity_summaries.ts):
  - Per-class config (chunk_k, min_mentions, max_per_class)
  - Path A: entity_mentions JOIN chunks (high-precision linker)
  - Path B (fallback): hybridSearch on canonical_name + aliases when
    entity_mentions returns zero rows. This is what unlocked Kenneth
    Arnold and similar entities: their wiki YAML has high total_mentions
    counted from frontmatter mentioned_in[], but the entity_mentions
    extractor was silent because the matches came from the wiki text,
    not the OCR chunks.
  - Sonnet 4.6 via OAuth Max, ~$0.04 per entity, ~$10 for the full
    260-entity bulk run.
  - INSUFFICIENT skip when chunks can't sustain a 60-word summary;
    refused entries get summary_status='refused' so they're not retried.
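
The chunk-selection flow described above (Path A first, Path B only when Path A is empty, then a refusal when the material is too thin) can be sketched as follows. Every identifier here is hypothetical apart from the config keys and the idea of `hybridSearch`; the data sources are injected so the sketch does not assume anything about the real DB layer. The commit does not say whether INSUFFICIENT is decided before the model call or comes from the model's reply, so the word-count pre-check and its threshold are assumptions.

```typescript
type EntityClass = "event" | "uap_object" | "person" | "location" | "organization";

interface ClassConfig {
  chunk_k: number;       // max chunks handed to the model
  min_mentions: number;  // entities below this mention count are skipped
  max_per_class: number; // per-run cap, assumed to be applied by the calling loop
}

interface Entity {
  id: string;
  entity_class: EntityClass;
  canonical_name: string;
  aliases: string[];
  total_mentions: number;
}

interface Chunk { id: string; text: string; }

// Injected data sources (hypothetical signatures).
interface ChunkSources {
  fromEntityMentions(entityId: string, k: number): Promise<Chunk[]>; // Path A
  hybridSearch(query: string, k: number): Promise<Chunk[]>;          // Path B
}

type Selection =
  | { kind: "skipped" }
  | { kind: "refused" } // caller would set summary_status = 'refused'
  | { kind: "ready"; path: "A" | "B"; chunks: Chunk[] };

// Assumed threshold: minimum source words needed to sustain a 60-word summary.
const MIN_SOURCE_WORDS = 120;

function countWords(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

async function selectChunks(
  entity: Entity,
  cfg: ClassConfig,
  src: ChunkSources,
): Promise<Selection> {
  if (entity.total_mentions < cfg.min_mentions) return { kind: "skipped" };

  // Path A: chunks linked through entity_mentions.
  let path: "A" | "B" = "A";
  let chunks = await src.fromEntityMentions(entity.id, cfg.chunk_k);

  // Path B: only when the linker found nothing.
  if (chunks.length === 0) {
    path = "B";
    const query = [entity.canonical_name].concat(entity.aliases).join(" ");
    chunks = await src.hybridSearch(query, cfg.chunk_k);
  }

  const words = chunks.reduce((n, c) => n + countWords(c.text), 0);
  if (words < MIN_SOURCE_WORDS) return { kind: "refused" };
  return { kind: "ready", path, chunks };
}
```

Returning a tagged result rather than throwing keeps the three outcomes (skip, refuse, proceed) explicit, so the caller can persist `refused` and avoid retrying the same entity on the next run.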

UI uplift:
  - lib/retrieval/entity-pages.ts: getEntityCore now prefers the DB
    summary (ai_generated or curated) over the wiki YAML narrative.
  - components/entity-list-page.tsx:
    * SELECT now pulls summary_en, summary_pt_br, summary_status
    * Rows are sorted with summary-enriched rows first (so the magazine
      grid lands on quality content immediately)
    * MagazineGrid: 4-line summary preview replaces the aliases line
    * CompactGrid: enriched rows render as full editorial cards,
      bare rows fall back to a compact table below
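
A minimal sketch of the two UI rules above: enriched rows sort first, and the DB summary wins over the wiki narrative. The row shape and the function names (`isEnriched`, `sortEnrichedFirst`, `pickNarrative`) are hypothetical, and the tie-break on mention count and the fallback from PT-BR to EN are assumptions; only the `summary_*` column names and the "ai_generated or curated" rule come from this commit.

```typescript
type SummaryStatus = "pending" | "ai_generated" | "curated" | "refused";

interface EntityRow {
  canonical_name: string;
  total_mentions: number;
  summary_en: string | null;
  summary_pt_br: string | null;
  summary_status: SummaryStatus | null;
}

// A row counts as enriched when its status is usable and the text is non-empty.
function isEnriched(row: EntityRow): boolean {
  const usable = row.summary_status === "ai_generated" || row.summary_status === "curated";
  return usable && row.summary_en !== null && row.summary_en.trim().length > 0;
}

// Enriched rows first; inside each group, higher mention counts first.
// Returns a new array and leaves the input untouched.
function sortEnrichedFirst(rows: EntityRow[]): EntityRow[] {
  return rows.slice().sort((a, b) => {
    const byEnriched = Number(isEnriched(b)) - Number(isEnriched(a));
    return byEnriched !== 0 ? byEnriched : b.total_mentions - a.total_mentions;
  });
}

// Prefer the DB summary; fall back to the wiki YAML narrative otherwise.
function pickNarrative(
  row: EntityRow,
  wikiNarrative: string | null,
  locale: "en" | "pt-br",
): string | null {
  if (isEnriched(row)) {
    const localized = locale === "pt-br" ? row.summary_pt_br : row.summary_en;
    return localized !== null ? localized : row.summary_en;
  }
  return wikiNarrative;
}
```

Sharing one `isEnriched` predicate between the sort and the narrative picker keeps the list page and the detail page in agreement about which rows count as enriched.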

Smoke results:
  - Kenneth Arnold sighting: "On June 24, 1947, pilot Kenneth Arnold
    reported sighting unidentified objects over the Pacific Northwest,
    and the account spread worldwide. It set off a run of similar
    reports: County Commissioner Crankes saw comparable objects after
    Arnold's account reached the press, and United Airlines pilot
    Emil H. Smith spotted flying discs on July 4 during a routine
    flight out of Boise, Idaho..."
  - Roswell Incident: includes Colonel Corso's 1997 book + the 1995
    GAO finding that radio messages from Oct 46–Feb 47 were destroyed
    + Senator Strom Thurmond's foreword. Real magazine-grade content.

Background bulk run kicked off across all 5 classes (event,
uap_object, person, location, organization), populating live as
the homepage rebuilds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
27 lines
1.1 KiB
PL/PgSQL
-- 0008_entity_summaries.sql: bilingual prose summary per entity.
--
-- The /sightings, /witnesses, /objects, /locations, /operations pages
-- need real prose to feel like a magazine. Today they show just a name +
-- mention count. After this migration, each entity carries an ~80-word
-- bilingual narrative summary written from the chunks where it appears.
--
-- The narrator (case-writer voice, house style) writes one summary per
-- entity. Generation is offline (scripts/maintain/61_enrich_entity_summaries.ts)
-- and idempotent: re-running the script skips rows already enriched.
--
-- Apply as supabase_admin (entities table owner).

BEGIN;

ALTER TABLE public.entities
  ADD COLUMN IF NOT EXISTS summary_en           TEXT,
  ADD COLUMN IF NOT EXISTS summary_pt_br        TEXT,
  ADD COLUMN IF NOT EXISTS summary_generated_at TIMESTAMPTZ,
  ADD COLUMN IF NOT EXISTS summary_model        TEXT,
  ADD COLUMN IF NOT EXISTS summary_status       TEXT
    CHECK (summary_status IN ('pending', 'ai_generated', 'curated', 'refused'));

CREATE INDEX IF NOT EXISTS entities_summary_status_idx
  ON public.entities (summary_status) WHERE summary_status IS NOT NULL;

COMMIT;