Today /sightings, /witnesses, /objects, /locations and /operations show
a name + mention count and nothing else. After this change each row carries
a 60-100-word bilingual (English / pt-BR) narrative summary written from the
chunks where the entity actually appears.
Migration 0008 (apply as supabase_admin):
  public.entities
    + summary_en            TEXT
    + summary_pt_br         TEXT
    + summary_generated_at  TIMESTAMPTZ
    + summary_model         TEXT
    + summary_status        TEXT
        CHECK ('pending'|'ai_generated'|'curated'|'refused')
  + index on summary_status
  + GRANT UPDATE (summary_*) ON entities TO investigator
  + new policy entities_investigator_update_summary (RLS UPDATE for
    investigator role)
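On the TypeScript side the new columns can be mirrored as a type. This is a minimal sketch, assuming every summary_* column is nullable and that TIMESTAMPTZ arrives as an ISO-8601 string over the JSON API; EntitySummaryColumns, isSummaryStatus, generatedSummaryPatch and refusedSummaryPatch are hypothetical names, not code from this commit. Both patch helpers write only summary_* columns, which is all the column-level GRANT lets the investigator role touch.

```typescript
// Allowed values of the summary_status CHECK constraint, mirrored as a union type.
const SUMMARY_STATUSES = ["pending", "ai_generated", "curated", "refused"] as const;
type SummaryStatus = (typeof SUMMARY_STATUSES)[number];

// Assumed row shape for the five summary_* columns (all nullable until enrichment runs).
// TIMESTAMPTZ is carried as an ISO-8601 string, as it would arrive over a JSON API.
interface EntitySummaryColumns {
  summary_en: string | null;
  summary_pt_br: string | null;
  summary_generated_at: string | null;
  summary_model: string | null;
  summary_status: SummaryStatus | null;
}

// Runtime guard so a value read from the DB can be narrowed to SummaryStatus.
function isSummaryStatus(value: unknown): value is SummaryStatus {
  return (
    typeof value === "string" &&
    (SUMMARY_STATUSES as readonly string[]).includes(value)
  );
}

// Hypothetical helper: UPDATE payload for a successful generation.
function generatedSummaryPatch(
  en: string,
  ptBr: string,
  model: string,
  now: Date = new Date(),
): EntitySummaryColumns {
  return {
    summary_en: en,
    summary_pt_br: ptBr,
    summary_generated_at: now.toISOString(),
    summary_model: model,
    summary_status: "ai_generated",
  };
}

// Hypothetical helper: mark an entity as refused so later runs skip it.
// Recording the attempt time and model here is an assumption.
function refusedSummaryPatch(model: string, now: Date = new Date()): EntitySummaryColumns {
  return {
    summary_en: null,
    summary_pt_br: null,
    summary_generated_at: now.toISOString(),
    summary_model: model,
    summary_status: "refused",
  };
}
```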
Enrichment script (investigator-runtime/scripts/enrich_entity_summaries.ts):
- Per-class config (chunk_k, min_mentions, max_per_class)
- Path A: entity_mentions JOIN chunks (high-precision linker)
- Path B (fallback): hybridSearch on canonical_name + aliases when
entity_mentions returns zero rows. This is what unlocked Kenneth Arnold
and similar entities: their wiki YAML reports a high total_mentions
(counted from the frontmatter mentioned_in[] list), but the
entity_mentions extractor found nothing because those matches came from
the wiki text, not from the OCR chunks.
- Sonnet 4.6 via OAuth Max, ~$0.04 per entity, ~$10 for the full
260-entity bulk run.
- INSUFFICIENT skip: entities whose chunks can't sustain a 60-word
summary get summary_status='refused', so they aren't retried.
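The retrieval fallback and the refusal rule can be sketched as pure functions. This is an illustration under stated assumptions, not the script's code: the Path A and Path B database calls are replaced by an array and a lazy callback, the eligibility filter (only null or pending rows, most-mentioned first) is a guess at how min_mentions and max_per_class combine, and the word-count guard in classifyOutput is an extra safety net beyond the INSUFFICIENT sentinel. All function names are hypothetical.

```typescript
// Per-class knobs named in the commit; the real values live in the script.
interface ClassConfig {
  chunk_k: number;
  min_mentions: number;
  max_per_class: number;
}

interface Chunk {
  id: string;
  text: string;
}

// Path A (entity_mentions JOIN chunks) wins whenever it returns rows;
// Path B (hybrid search) runs lazily, only when Path A is empty.
function selectChunks(
  pathA: Chunk[],
  runPathB: () => Chunk[],
  cfg: Pick<ClassConfig, "chunk_k">,
): { path: "A" | "B"; chunks: Chunk[] } {
  if (pathA.length > 0) {
    return { path: "A", chunks: pathA.slice(0, cfg.chunk_k) };
  }
  return { path: "B", chunks: runPathB().slice(0, cfg.chunk_k) };
}

// Hypothetical: one search string from canonical_name plus aliases,
// dropping blanks and case-insensitive duplicates.
function buildFallbackQuery(canonicalName: string, aliases: string[]): string {
  const seen = new Set<string>();
  const terms: string[] = [];
  for (const raw of [canonicalName, ...aliases]) {
    const term = raw.trim();
    const key = term.toLowerCase();
    if (term && !seen.has(key)) {
      seen.add(key);
      terms.push(term);
    }
  }
  return terms.join(" ");
}

interface Candidate {
  id: string;
  mentions: number;
  summary_status: string | null;
}

// Assumed eligibility rule: only never-processed or pending rows, at least
// min_mentions, most-mentioned first, capped at max_per_class.
function eligible(candidates: Candidate[], cfg: ClassConfig): Candidate[] {
  return candidates
    .filter((c) => c.summary_status === null || c.summary_status === "pending")
    .filter((c) => c.mentions >= cfg.min_mentions)
    .sort((a, b) => b.mentions - a.mentions)
    .slice(0, cfg.max_per_class);
}

type SummaryOutcome =
  | { status: "ai_generated"; text: string }
  | { status: "refused"; text: null };

// Sentinel the prompt is assumed to ask for when the chunks are too thin.
const INSUFFICIENT = "INSUFFICIENT";

function classifyOutput(raw: string, minWords = 60): SummaryOutcome {
  const text = raw.trim();
  // Extra guard (an assumption): anything under minWords is refused too.
  const words = text.split(/\s+/).filter(Boolean).length;
  if (text === INSUFFICIENT || words < minWords) {
    return { status: "refused", text: null };
  }
  return { status: "ai_generated", text };
}
```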
UI uplift:
- lib/retrieval/entity-pages.ts: getEntityCore now prefers the DB
summary (ai_generated or curated) over wiki YAML narrative.
- components/entity-list-page.tsx:
  * SELECT now pulls summary_en, summary_pt_br, summary_status
  * Rows with a summary sort first, so the magazine grid opens on
    quality content immediately
  * MagazineGrid: a 4-line summary preview replaces the aliases line
  * CompactGrid: enriched rows render as full editorial cards; bare rows
    fall back to a compact table below
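The read-side rules above (DB summary beats the wiki narrative, enriched rows first, CompactGrid split) fit in a few pure helpers. A sketch only: mention_count and wiki_narrative are hypothetical field names, the per-language pick and the mention-count tiebreak are assumptions, and none of this is the actual getEntityCore or entity-list-page.tsx code.

```typescript
type Lang = "en" | "pt-BR";
type SummaryStatus = "pending" | "ai_generated" | "curated" | "refused";

// Assumed subset of the columns the list page SELECTs; mention_count and
// wiki_narrative are hypothetical names for the count and the YAML fallback.
interface EntityRow {
  canonical_name: string;
  mention_count: number;
  summary_en: string | null;
  summary_pt_br: string | null;
  summary_status: SummaryStatus | null;
  wiki_narrative?: string | null;
}

// Only these statuses are shown to readers; pending and refused rows are not.
const PUBLISHABLE: ReadonlySet<SummaryStatus> = new Set<SummaryStatus>(["ai_generated", "curated"]);

// DB summary for the requested language, or null when nothing is publishable.
function dbSummary(row: EntityRow, lang: Lang): string | null {
  if (row.summary_status === null || !PUBLISHABLE.has(row.summary_status)) return null;
  const text = lang === "pt-BR" ? row.summary_pt_br : row.summary_en;
  return text && text.trim() ? text : null;
}

// getEntityCore-style preference: DB summary first, wiki YAML narrative second.
function preferredSummary(row: EntityRow, lang: Lang): string | null {
  return dbSummary(row, lang) ?? row.wiki_narrative ?? null;
}

// Enriched rows first; within each group, higher mention counts first (assumed tiebreak).
function sortEnrichedFirst(rows: EntityRow[], lang: Lang): EntityRow[] {
  return [...rows].sort((a, b) => {
    const ea = dbSummary(a, lang) ? 1 : 0;
    const eb = dbSummary(b, lang) ? 1 : 0;
    return eb - ea || b.mention_count - a.mention_count;
  });
}

// CompactGrid split: enriched rows become editorial cards, bare rows go to the table.
function splitForCompactGrid(
  rows: EntityRow[],
  lang: Lang,
): { cards: EntityRow[]; table: EntityRow[] } {
  const sorted = sortEnrichedFirst(rows, lang);
  return {
    cards: sorted.filter((r) => dbSummary(r, lang) !== null),
    table: sorted.filter((r) => dbSummary(r, lang) === null),
  };
}
```

Array.prototype.sort is stable (ES2019+), so rows with equal keys keep the order the query returned them in.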
Smoke results:
- Kenneth Arnold sighting: "On June 24, 1947, pilot Kenneth Arnold
reported sighting unidentified objects over the Pacific Northwest,
and the account spread worldwide. It set off a run of similar
reports: County Commissioner Crankes saw comparable objects after
Arnold's account reached the press, and United Airlines pilot
Emil H. Smith spotted flying discs on July 4 during a routine
flight out of Boise, Idaho..."
- Roswell Incident: includes Colonel Corso's 1997 book, Senator Strom
Thurmond's foreword, and the 1995 GAO finding that radio messages from
October 1946 to February 1947 were destroyed. Real magazine-grade content.
Background bulk run kicked off across all 5 classes (event,
uap_object, person, location, organization); summaries populate live
as the homepage rebuilds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The user was explicit: "1 billion UFO enthusiasts around the world". The
site is for the UFO-curious public, not for skeptics. The 8-detective
scaffolding becomes invisible plumbing; the reader sees stories about what
was observed.
Reader-facing changes:
New homepage (web/app/page.tsx)
- SiteHeader: magazine-style top nav (no detective tiles)
- HeroBanner: full-bleed editorial opener with declassified-page art
background, display-serif headline, live stats row (122 docs,
2047 events, 1861 witnesses, 867 craft catalogued)
- FeaturedCase: cover-story treatment of the most recent case_report,
uses a real document page as hero image, links to /c/[slug]
- PortalGrid: 6 thematic doorways into the archive — Sightings,
Witnesses, Craft, Hot spots, Programs, Documents — each tile shows
a real entity count and short editorial blurb
- GreatestHits: top 9 most-cited events from the corpus
(Kenneth Arnold 1947, Mantell 1948, …) as a magazine grid
- Doc list kept but reframed as "the primary record"
New sub-pages (5 entity lists, plus /documents)
- /sightings → events (2047), magazine grid
- /witnesses → people (1861), compact table
- /objects → uap_objects (867), magazine grid
- /locations → locations (1757), compact table
- /operations → organizations (1596), compact table
- /documents → full doc list with thumbnails (mirrors homepage section
for direct deep-link)
The five entity lists share the <EntityListPage> shell, with per-page i18n
and a JSON-LD ItemList
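A CollectionPage wrapping an ItemList is plain schema.org JSON-LD, so the shell can build it from the rows it already has. Hypothetical sketch: collectionPageJsonLd, toJsonLdScript and the ListedEntity shape are illustrative names, and each row is assumed to carry a site-relative href.

```typescript
// Hypothetical input: the list page already knows each row's display name and href.
interface ListedEntity {
  name: string;
  href: string; // site-relative path to the entity's detail page
}

// schema.org CollectionPage whose mainEntity is an ItemList of the rows on the page.
function collectionPageJsonLd(
  siteUrl: string,
  pagePath: string,
  title: string,
  items: ListedEntity[],
) {
  const base = siteUrl.replace(/\/+$/, "");
  return {
    "@context": "https://schema.org",
    "@type": "CollectionPage",
    name: title,
    url: `${base}${pagePath}`,
    mainEntity: {
      "@type": "ItemList",
      numberOfItems: items.length,
      itemListElement: items.map((item, index) => ({
        "@type": "ListItem",
        position: index + 1, // schema.org list positions are 1-based
        name: item.name,
        url: `${base}${item.href}`,
      })),
    },
  };
}

// Serialise for a <script type="application/ld+json"> tag; escaping "<" keeps a
// stray "</script>" inside an entity name from closing the tag early.
function toJsonLdScript(data: unknown): string {
  return JSON.stringify(data).replace(/</g, "\\u003c");
}
```

In a Next.js page the resulting string would typically go into a `<script type="application/ld+json">` element via dangerouslySetInnerHTML; the "<" escape is what makes that safe for corpus-derived names.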
Detective surfacing stripped from the UI
- /jobs/[id]: "Sherlock Holmes / Dr. Watson" → "Investigation in progress"
- chat-bubble: detective-named card → neutral "Investigação em andamento"
("Investigation in progress")
- quick-launch: 7-kind detective dropdown → single "investigar um caso"
("investigate a case") input (kind=case_report hardcoded)
- /bureau: rewritten as the case-file library (no artefact dumps)
Typography + design
- Fraunces variable serif loaded for display headings
(`.font-display` class)
- Gold-amber accent (#e0c080) unified as the brand colour
- Asymmetric magazine grids (1+2+3 column, generous whitespace)
- Hover micro-interactions (image scale on featured case, translateX
on portal arrows)
SEO + GEO
- layout.tsx metadataBase + title.template + per-route Metadata exports
- Organization JSON-LD on root layout
- WebSite + SearchAction JSON-LD on homepage
- CollectionPage + ItemList JSON-LD on every entity list page
- openGraph + twitter cards, pt-BR primary + en-US alternate
- ai:purpose meta tag for Generative Engine Optimization: declares the
site as a citation-linked primary-source archive
- robots: index + follow with large image preview
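The homepage WebSite + SearchAction object and the root-layout Organization object follow the same JSON-LD pattern. A sketch assuming a search route that takes a q parameter; SiteInfo, searchPath and logoPath are hypothetical, since the commit names neither the domain nor a search URL. The {search_term_string} placeholder in urlTemplate must match the name declared in query-input.

```typescript
// Hypothetical site constants; the real domain, name and search route are not in the commit.
interface SiteInfo {
  url: string;
  name: string;
  searchPath: string; // assumed route that accepts ?q=
  logoPath?: string;
}

function websiteJsonLd(site: SiteInfo) {
  const base = site.url.replace(/\/+$/, "");
  return {
    "@context": "https://schema.org",
    "@type": "WebSite",
    name: site.name,
    url: `${base}/`,
    inLanguage: ["pt-BR", "en-US"], // pt-BR primary, en-US alternate
    potentialAction: {
      "@type": "SearchAction",
      // {search_term_string} is the placeholder that query-input refers to.
      target: {
        "@type": "EntryPoint",
        urlTemplate: `${base}${site.searchPath}?q={search_term_string}`,
      },
      "query-input": "required name=search_term_string",
    },
  };
}

function organizationJsonLd(site: SiteInfo) {
  const base = site.url.replace(/\/+$/, "");
  return {
    "@context": "https://schema.org",
    "@type": "Organization",
    name: site.name,
    url: `${base}/`,
    // Only emit logo when one is configured.
    ...(site.logoPath ? { logo: `${base}${site.logoPath}` } : {}),
  };
}
```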
The detectives themselves remain active in the backend (runtime, DB, audit
log), but the reader never sees "Holmes / Sun-Tzu / Watson" in the UI. The
next phase will reorient case-writer to write in a single best-selling-author
voice that synthesises all the internal sources.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>