Commit graph

2 commits

Author SHA1 Message Date
Luiz Gustavo
f2b7b116ce W5.3 (Phase 3A): entity summaries — sub-pages get magazine-grade prose
Some checks failed
CI / Web — typecheck + lint + build (push) Failing after 45s
CI / Scripts — Python smoke (push) Failing after 4s
CI / Web — npm audit (push) Failing after 41s
CI / Retrieval — golden set (Recall@5 + MRR) (push) Failing after 3s
Today /sightings, /witnesses, /objects, /locations and /operations show
only a name and a mention count. After this commit, each row carries a
60-100 word bilingual narrative summary written from the chunks where
the entity actually appears.

Migration 0008 (apply as supabase_admin):
  public.entities  +summary_en TEXT
                   +summary_pt_br TEXT
                   +summary_generated_at TIMESTAMPTZ
                   +summary_model TEXT
                   +summary_status TEXT
                     CHECK ('pending'|'ai_generated'|'curated'|'refused')
  + index on summary_status
  + GRANT UPDATE (summary_*) ON entities TO investigator
  + new policy entities_investigator_update_summary (RLS UPDATE for
    investigator role)
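
A minimal sketch of how the new columns could be mirrored on the TypeScript
side. The names `SummaryStatus`, `EntitySummaryColumns` and `isSummaryStatus`
are hypothetical, and nullability is an assumption (the migration notes above
do not say whether the columns are NOT NULL):

```typescript
// Hypothetical mirror of the columns added by migration 0008.
type SummaryStatus = "pending" | "ai_generated" | "curated" | "refused";

const SUMMARY_STATUSES: readonly string[] = [
  "pending",
  "ai_generated",
  "curated",
  "refused",
];

// Assumption: every new column is nullable until the enrichment run fills it.
interface EntitySummaryColumns {
  summary_en: string | null;
  summary_pt_br: string | null;
  summary_generated_at: string | null; // TIMESTAMPTZ, serialised as ISO 8601
  summary_model: string | null;
  summary_status: SummaryStatus | null;
}

// Runtime guard that accepts only the values allowed by the CHECK constraint.
function isSummaryStatus(value: unknown): value is SummaryStatus {
  return typeof value === "string" && SUMMARY_STATUSES.indexOf(value) !== -1;
}
```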

Enrichment script (investigator-runtime/scripts/enrich_entity_summaries.ts):
  - Per-class config (chunk_k, min_mentions, max_per_class)
  - Path A: entity_mentions JOIN chunks (high-precision linker)
  - Path B (fallback): hybridSearch on canonical_name + aliases when
    entity_mentions returns zero. This is what unlocked Kenneth Arnold
    and similar entities — their wiki YAML has high total_mentions
    counted from frontmatter mentioned_in[], but the entity_mentions
    extractor was silent because the matches came from the wiki text,
    not the OCR chunks.
  - Sonnet 4.6 via OAuth Max, ~$0.04 per entity, ~$10 for the full
    260-entity bulk run.
  - INSUFFICIENT skip when chunks can't sustain a 60-word summary —
    refused entries get summary_status='refused' so they're not retried.
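
The control flow described above, sketched with synchronous stand-ins so the
branching is easy to follow. This is not the script itself: `chooseChunks`,
`buildFallbackQuery` and `toOutcome` are hypothetical names, the real DB,
search and model calls are presumably async, and the exact form of the
INSUFFICIENT sentinel is an assumption:

```typescript
interface Chunk {
  id: string;
  text: string;
}

type Path = "A" | "B";

// Path A wins whenever entity_mentions produced chunks; the fallback search
// (Path B) is only invoked when Path A returned nothing.
function chooseChunks(
  mentionChunks: Chunk[],
  fallbackSearch: () => Chunk[],
): { path: Path; chunks: Chunk[] } {
  if (mentionChunks.length > 0) {
    return { path: "A", chunks: mentionChunks };
  }
  return { path: "B", chunks: fallbackSearch() };
}

// Assumption: the Path B query is the canonical name followed by the aliases.
function buildFallbackQuery(canonicalName: string, aliases: string[]): string {
  return [canonicalName, ...aliases].join(" ");
}

type Outcome =
  | { status: "ai_generated"; summary: string }
  | { status: "refused" };

// Assumption: the model answers with the literal string "INSUFFICIENT" when
// the chunks cannot sustain a summary; that maps to summary_status='refused'.
function toOutcome(modelOutput: string): Outcome {
  const text = modelOutput.trim();
  if (text === "" || text === "INSUFFICIENT") {
    return { status: "refused" };
  }
  return { status: "ai_generated", summary: text };
}
```

Passing the fallback as a function keeps the search lazy, so entities that
already have linked mentions never pay for a hybrid search.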

UI uplift:
  - lib/retrieval/entity-pages.ts: getEntityCore now prefers the DB
    summary (ai_generated or curated) over wiki YAML narrative.
  - components/entity-list-page.tsx:
    * SELECT now pulls summary_en, summary_pt_br, summary_status
    * Sorted with summary-enriched rows first (so the magazine grid
      lands on quality content immediately)
    * MagazineGrid: 4-line summary preview replaces aliases line
    * CompactGrid: enriched rows render as full editorial cards,
      bare rows fall back to a compact table below
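
A rough sketch of the two UI rules above: prefer the DB summary over the wiki
narrative, and sort enriched rows first. The helper names and the `EntityRow`
shape are hypothetical, and the mention-count tiebreak is an assumption rather
than something this commit states:

```typescript
interface EntityRow {
  canonical_name: string;
  total_mentions: number;
  summary_en: string | null;
  summary_pt_br: string | null;
  summary_status: "pending" | "ai_generated" | "curated" | "refused" | null;
}

// A row counts as enriched when its status is ai_generated or curated and
// at least one language actually has text.
function isEnriched(row: EntityRow): boolean {
  const usable =
    row.summary_status === "ai_generated" || row.summary_status === "curated";
  return usable && Boolean(row.summary_en || row.summary_pt_br);
}

// Prefer the DB summary for the requested locale; otherwise fall back to the
// wiki YAML narrative (which may itself be missing).
function pickNarrative(
  row: EntityRow,
  wikiNarrative: string | null,
  locale: "en" | "pt-BR",
): string | null {
  if (isEnriched(row)) {
    const summary = locale === "pt-BR" ? row.summary_pt_br : row.summary_en;
    if (summary) return summary;
  }
  return wikiNarrative;
}

// Enriched rows first; within each group, higher mention counts first
// (the tiebreak is an assumption). Returns a new array.
function sortEnrichedFirst(rows: EntityRow[]): EntityRow[] {
  return [...rows].sort((a, b) => {
    const byEnriched = Number(isEnriched(b)) - Number(isEnriched(a));
    return byEnriched !== 0 ? byEnriched : b.total_mentions - a.total_mentions;
  });
}
```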

Smoke results:
  - Kenneth Arnold sighting: "On June 24, 1947, pilot Kenneth Arnold
    reported sighting unidentified objects over the Pacific Northwest,
    and the account spread worldwide. It set off a run of similar
    reports: County Commissioner Crankes saw comparable objects after
    Arnold's account reached the press, and United Airlines pilot
    Emil H. Smith spotted flying discs on July 4 during a routine
    flight out of Boise, Idaho..."
  - Roswell Incident: includes Colonel Corso's 1997 book + the 1995
    GAO finding that radio messages from Oct 46–Feb 47 were destroyed
    + Senator Strom Thurmond's foreword. Real magazine-grade content.

Background bulk run kicked off across all 5 classes (event,
uap_object, person, location, organization) — populating live as
the homepage rebuilds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:37:01 -03:00
Luiz Gustavo
ab4fe2a334 W5.1: enthusiast pivot — strip detective surfacing, magazine homepage
Some checks failed
CI / Retrieval — golden set (Recall@5 + MRR) (push) Waiting to run
CI / Web — typecheck + lint + build (push) Failing after 39s
CI / Scripts — Python smoke (push) Failing after 4s
CI / Web — npm audit (push) Has been cancelled
Explicit user direction: "1 billion enthusiasts across the UFO world". The
site is for the UFO-curious public, not for skeptics. The 8-detective
scaffolding becomes invisible plumbing; the reader sees stories about what
was observed.

Reader-facing changes:

  New homepage (web/app/page.tsx)
    - SiteHeader: magazine-style top nav (no detective tiles)
    - HeroBanner: full-bleed editorial opener with declassified-page art
      background, display-serif headline, live stats row (122 docs,
      2047 events, 1861 witnesses, 867 craft catalogued)
    - FeaturedCase: cover-story treatment of the most recent case_report,
      uses a real document page as hero image, links to /c/[slug]
    - PortalGrid: 6 thematic doorways into the archive — Sightings,
      Witnesses, Craft, Hot spots, Programs, Documents — each tile shows
      a real entity count and short editorial blurb
    - GreatestHits: top 9 most-cited events from the corpus
      (Kenneth Arnold 1947, Mantell 1948, …) as a magazine grid
    - Doc list kept but reframed as "the primary record"

  New sub-pages (5 entity lists + /documents)
    - /sightings → events (2047), magazine grid
    - /witnesses → people (1861), compact table
    - /objects   → uap_objects (867), magazine grid
    - /locations → locations (1757), compact table
    - /operations → organizations (1596), compact table
    - /documents → full doc list with thumbnails (mirrors homepage section
      for direct deep-link)
    All share <EntityListPage> shell with per-page i18n + JSON-LD ItemList
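
    One way the per-page ItemList JSON-LD could be assembled, as a sketch.
    `buildItemListJsonLd` and `ListedEntity` are hypothetical names, and
    wrapping the list in a CollectionPage via `mainEntity` is an assumption
    about the shape, not a copy of the component:

```typescript
interface ListedEntity {
  name: string;
  url: string;
}

// Builds a schema.org CollectionPage whose main entity is an ItemList with
// one ListItem per entity, numbered from 1 in display order.
function buildItemListJsonLd(
  pageName: string,
  pageUrl: string,
  entities: ListedEntity[],
) {
  return {
    "@context": "https://schema.org",
    "@type": "CollectionPage",
    name: pageName,
    url: pageUrl,
    mainEntity: {
      "@type": "ItemList",
      numberOfItems: entities.length,
      itemListElement: entities.map((entity, index) => ({
        "@type": "ListItem",
        position: index + 1,
        name: entity.name,
        url: entity.url,
      })),
    },
  };
}

// Example with placeholder URLs; the result would be serialised with
// JSON.stringify into a <script type="application/ld+json"> tag.
const sightingsJsonLd = buildItemListJsonLd(
  "Sightings",
  "https://example.org/sightings",
  [{ name: "Kenneth Arnold 1947", url: "https://example.org/sightings/1" }],
);
```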

  Stripped detective surfacing
    - /jobs/[id]: "Sherlock Holmes / Dr. Watson" → "Investigation in progress"
    - chat-bubble: detective-named card → neutral "Investigação em andamento"
      ("Investigation in progress")
    - quick-launch: 7-kind detective dropdown → single "investigar um caso"
      ("investigate a case") input (kind=case_report hardcoded)
    - /bureau: rewritten as the case-file library (no artefact dumps)

Typography + design
  - Fraunces variable serif loaded for display headings
    (`.font-display` class)
  - Gold-amber accent (#e0c080) unified as the brand colour
  - Asymmetric magazine grids (1+2+3 column, generous whitespace)
  - Hover micro-interactions (image scale on featured case, translateX
    on portal arrows)

SEO + GEO
  - layout.tsx metadataBase + title.template + per-route Metadata exports
  - Organization JSON-LD on root layout
  - WebSite + SearchAction JSON-LD on homepage
  - CollectionPage + ItemList JSON-LD on every entity list page
  - openGraph + twitter cards, pt-BR primary + en-US alternate
  - ai:purpose meta tag for Generative Engine Optimization — declares
    the site as a citation-linked primary-source archive
  - robots: index + follow with large image preview
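
A sketch of what the root metadata object might look like. In a Next.js app
this would normally be exported from layout.tsx as `metadata` and typed with
`Metadata` from "next"; here it is a plain object so the snippet stands alone.
The origin, site name and ai:purpose wording are placeholders, not the values
used in this commit:

```typescript
const siteMetadata = {
  // Placeholder origin; relative OG/Twitter image URLs resolve against it.
  metadataBase: new URL("https://example.org"),
  // "%s" is replaced by each route's own title.
  title: { default: "Archive", template: "%s | Archive" },
  // pt-BR primary, en-US alternate.
  openGraph: {
    type: "website",
    locale: "pt_BR",
    alternateLocale: ["en_US"],
  },
  twitter: { card: "summary_large_image" },
  // index + follow, allowing large image previews in search results.
  robots: { index: true, follow: true, "max-image-preview": "large" },
  // Custom meta tag; the content string here is illustrative only.
  other: { "ai:purpose": "citation-linked primary-source archive" },
};

// How the title template combines with a per-route title.
function renderTitle(routeTitle: string): string {
  return siteMetadata.title.template.replace("%s", routeTitle);
}
```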

The detectives themselves remain alive in the backend (runtime, DB, audit
log), but the reader never sees "Holmes / Sun-Tzu / Watson" in the UI. The
next phase will reorient case-writer to write as a single best-seller voice
synthesising all the internal sources.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 14:09:46 -03:00