disclosure-bureau/web/lib
Luiz Gustavo f2b7b116ce
Some checks failed
CI / Web — typecheck + lint + build (push) Failing after 45s
CI / Scripts — Python smoke (push) Failing after 4s
CI / Web — npm audit (push) Failing after 41s
CI / Retrieval — golden set (Recall@5 + MRR) (push) Failing after 3s
W5.3 (Phase 3A): entity summaries — sub-pages get magazine-grade prose
Today /sightings, /witnesses, /objects, /locations and /operations show
only a name and a mention count. After this change, each row carries a
60-100 word bilingual (EN + PT-BR) narrative summary written from the
chunks where the entity actually appears.
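A minimal TypeScript sketch of a length check against that 60-100 word target. countWords and withinTargetLength are hypothetical helpers, not functions from the repo, and splitting on whitespace is only a rough word count.

```typescript
// Hypothetical helpers: the 60-100 word target comes from this commit,
// but these names and this implementation are illustrative only.
const MIN_WORDS = 60;
const MAX_WORDS = 100;

/** Rough word count: number of whitespace-separated tokens. */
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

/** True when a summary falls inside the 60-100 word target (inclusive). */
function withinTargetLength(text: string): boolean {
  const n = countWords(text);
  return n >= MIN_WORDS && n <= MAX_WORDS;
}
```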

Migration 0008 (apply as supabase_admin):
  public.entities  +summary_en TEXT
                   +summary_pt_br TEXT
                   +summary_generated_at TIMESTAMPTZ
                   +summary_model TEXT
                   +summary_status TEXT
                     CHECK ('pending'|'ai_generated'|'curated'|'refused')
  + index on summary_status
  + GRANT UPDATE (summary_*) ON entities TO investigator
  + new policy entities_investigator_update_summary (RLS UPDATE for
    investigator role)
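A sketch of how the new columns might be typed on the TypeScript side, with a runtime guard that mirrors the CHECK constraint. The column names come from migration 0008; SummaryStatus, EntitySummaryColumns and isSummaryStatus are hypothetical names. The columns are typed as nullable because the migration, as listed, does not state NOT NULL, and TIMESTAMPTZ values are assumed to arrive as ISO strings through the Supabase client.

```typescript
// Allowed values, copied from the CHECK constraint in migration 0008.
const SUMMARY_STATUSES = ["pending", "ai_generated", "curated", "refused"] as const;

type SummaryStatus = (typeof SUMMARY_STATUSES)[number];

// Hypothetical row shape for the summary_* columns on public.entities.
interface EntitySummaryColumns {
  summary_en: string | null;
  summary_pt_br: string | null;
  summary_generated_at: string | null; // TIMESTAMPTZ, assumed ISO string
  summary_model: string | null;
  summary_status: SummaryStatus | null;
}

/** Runtime check equivalent to the CHECK constraint on summary_status. */
function isSummaryStatus(value: unknown): value is SummaryStatus {
  return (
    typeof value === "string" &&
    (SUMMARY_STATUSES as readonly string[]).includes(value)
  );
}
```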

Enrichment script (investigator-runtime/scripts/enrich_entity_summaries.ts):
  - Per-class config (chunk_k, min_mentions, max_per_class)
  - Path A: entity_mentions JOIN chunks (high-precision linker)
  - Path B (fallback): hybridSearch on canonical_name + aliases when
    entity_mentions returns zero. This is what unlocked Kenneth Arnold
    and similar entities: their wiki YAML reports a high total_mentions
    (counted from frontmatter mentioned_in[]), but the entity_mentions
    extractor found nothing because those matches came from the wiki
    text, not the OCR chunks.
  - Sonnet 4.6 via OAuth Max, ~$0.04 per entity, ~$10 for the full
    260-entity bulk run.
  - INSUFFICIENT skip when the chunks can't sustain a 60-word summary;
    refused entries get summary_status='refused' so they aren't retried.
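A minimal sketch of the per-class selection, the Path A / Path B fallback and the refusal mapping described above. It is not the code in enrich_entity_summaries.ts: the entity_mentions query and hybridSearch are injected as hypothetical function types so the control flow runs without a database. Several details are assumptions: Path B joins canonical_name and aliases into a single query, candidates are ranked by total_mentions, only null/pending entities are picked, and the model is assumed to signal a skip with the literal string INSUFFICIENT.

```typescript
interface Chunk {
  id: string;
  text: string;
}

// Per-class knobs named in the commit; their exact semantics are assumed.
interface ClassConfig {
  chunk_k: number; // max chunks fed to the model per entity
  min_mentions: number; // skip entities mentioned less often than this
  max_per_class: number; // cap on entities enriched per class per run
}

interface Entity {
  id: string;
  canonical_name: string;
  aliases: string[];
  total_mentions: number;
  summary_status: string | null;
}

// Hypothetical stand-ins for the real retrieval calls.
type MentionLookup = (entityId: string, k: number) => Promise<Chunk[]>;
type HybridSearch = (query: string, k: number) => Promise<Chunk[]>;

/** Pick unprocessed, sufficiently mentioned entities, most-mentioned first. */
function pickCandidates(entities: Entity[], cfg: ClassConfig): Entity[] {
  return entities
    .filter((e) => e.summary_status === null || e.summary_status === "pending")
    .filter((e) => e.total_mentions >= cfg.min_mentions)
    .sort((a, b) => b.total_mentions - a.total_mentions)
    .slice(0, cfg.max_per_class);
}

/** Path A first; fall back to Path B only when Path A returns nothing. */
async function selectChunks(
  entity: Entity,
  cfg: ClassConfig,
  byMentions: MentionLookup,
  hybridSearch: HybridSearch,
): Promise<{ path: "A" | "B"; chunks: Chunk[] }> {
  // Path A: high-precision linker (entity_mentions JOIN chunks).
  const linked = await byMentions(entity.id, cfg.chunk_k);
  if (linked.length > 0) return { path: "A", chunks: linked };

  // Path B: hybrid search on canonical_name + aliases (single query assumed).
  const query = [entity.canonical_name, ...entity.aliases].join(" ");
  return { path: "B", chunks: await hybridSearch(query, cfg.chunk_k) };
}

/** Map the model's reply to the summary_status value to store. */
function statusFor(modelOutput: string): "ai_generated" | "refused" {
  return modelOutput.trim() === "INSUFFICIENT" ? "refused" : "ai_generated";
}
```

Storing 'refused' rather than leaving the row pending is what lets a later bulk run skip it, which is why pickCandidates in this sketch only looks at null/pending rows.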

UI uplift:
  - lib/retrieval/entity-pages.ts: getEntityCore now prefers the DB
    summary (ai_generated or curated) over the wiki YAML narrative.
  - components/entity-list-page.tsx:
    * SELECT now pulls summary_en, summary_pt_br, summary_status
    * Sorted with summary-enriched rows first (so the magazine grid
      lands on quality content immediately)
    * MagazineGrid: 4-line summary preview replaces aliases line
    * CompactGrid: enriched rows render as full editorial cards;
      bare rows fall back to a compact table below
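The two UI rules above (prefer an ai_generated or curated DB summary over the wiki narrative, and list enriched rows first) can be sketched as below. The summary_* field names come from the migration; EntityRow, wikiNarrative, pickSummary and sortEnrichedFirst are hypothetical, and ordering by total_mentions inside each group is an assumption about the existing sort.

```typescript
type Locale = "en" | "pt-br";

// Hypothetical row shape: DB columns plus the wiki YAML narrative, if any.
interface EntityRow {
  canonical_name: string;
  total_mentions: number;
  summary_en: string | null;
  summary_pt_br: string | null;
  summary_status: string | null;
  wikiNarrative?: string;
}

/** Only ai_generated and curated summaries are shown to readers. */
function hasUsableSummary(row: EntityRow): boolean {
  return row.summary_status === "ai_generated" || row.summary_status === "curated";
}

/** Prefer the DB summary for the locale; otherwise use the wiki narrative. */
function pickSummary(row: EntityRow, locale: Locale): string | undefined {
  if (hasUsableSummary(row)) {
    const text = locale === "pt-br" ? row.summary_pt_br : row.summary_en;
    if (text) return text;
  }
  return row.wikiNarrative;
}

/** Enriched rows first; within each group, higher mention counts first. */
function sortEnrichedFirst(rows: EntityRow[]): EntityRow[] {
  // Copy before sorting so the caller's array is not mutated.
  return [...rows].sort((a, b) => {
    const diff = Number(hasUsableSummary(b)) - Number(hasUsableSummary(a));
    return diff !== 0 ? diff : b.total_mentions - a.total_mentions;
  });
}
```

Treating 'refused' and 'pending' rows the same as bare rows keeps the magazine grid limited to content that actually has a summary to show.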

Smoke results:
  - Kenneth Arnold sighting: "On June 24, 1947, pilot Kenneth Arnold
    reported sighting unidentified objects over the Pacific Northwest,
    and the account spread worldwide. It set off a run of similar
    reports: County Commissioner Crankes saw comparable objects after
    Arnold's account reached the press, and United Airlines pilot
    Emil H. Smith spotted flying discs on July 4 during a routine
    flight out of Boise, Idaho..."
  - Roswell Incident: includes Colonel Corso's 1997 book + the 1995
    GAO finding that radio messages from Oct 46–Feb 47 were destroyed
    + Senator Strom Thurmond's foreword. Real magazine-grade content.

Background bulk run kicked off across all 5 classes (event,
uap_object, person, location, organization); summaries populate live
as the homepage rebuilds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:37:01 -03:00
chat W3.8: Investigation Bureau complete — Poirot, Taleb, Tetlock, Case-Writer 2026-05-23 22:11:39 -03:00
i18n W4: bilingual EN + PT-BR Investigation Bureau (CLAUDE.md §3 contract) 2026-05-24 12:02:59 -03:00
retrieval W5.3 (Phase 3A): entity summaries — sub-pages get magazine-grade prose 2026-05-24 15:37:01 -03:00
supabase baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
chunks.ts add clean LLM reading version of documents (the core goal) 2026-05-21 17:23:36 -03:00
doc-renderer.ts baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
doc-summary.ts baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
entity-index.ts baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
fm-types.ts baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
logger.ts W0+W1+W1.2: security hardening, observability, autocomplete, glitchtip, forgejo CI 2026-05-23 18:18:42 -03:00
wiki.ts baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00