Luiz Gustavo f2b7b116ce
W5.3 (Phase 3A): entity summaries — sub-pages get magazine-grade prose
Today /sightings, /witnesses, /objects, /locations and /operations show
only a name and a mention count. After this change, each row carries a
60-100 word bilingual (EN and PT-BR) narrative summary written from the
chunks where the entity actually appears.

Migration 0008 (apply as supabase_admin):
  public.entities  +summary_en TEXT
                   +summary_pt_br TEXT
                   +summary_generated_at TIMESTAMPTZ
                   +summary_model TEXT
                   +summary_status TEXT
                     CHECK ('pending'|'ai_generated'|'curated'|'refused')
  + index on summary_status
  + GRANT UPDATE (summary_*) ON entities TO investigator
  + new policy entities_investigator_update_summary (RLS UPDATE for
    investigator role)
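
For callers that read these columns from TypeScript, the CHECK constraint can be mirrored as a union type with a runtime guard. This is a minimal sketch, not code from the repository: the names `SUMMARY_STATUSES`, `EntitySummaryColumns` and `isSummaryStatus` are hypothetical, and the nullability of each column is an assumption, since the migration summary above does not state defaults or NOT NULL.

```typescript
// Hypothetical mirror of the columns added by migration 0008.
// The four status values come from the CHECK constraint listed above.
const SUMMARY_STATUSES = ["pending", "ai_generated", "curated", "refused"] as const;
type SummaryStatus = (typeof SUMMARY_STATUSES)[number];

// Assumption: every new column is nullable; the commit does not say otherwise.
interface EntitySummaryColumns {
  summary_en: string | null;
  summary_pt_br: string | null;
  summary_generated_at: string | null; // TIMESTAMPTZ, serialized as an ISO-8601 string
  summary_model: string | null;
  summary_status: SummaryStatus | null;
}

// Runtime guard: true only for the four values the CHECK constraint allows.
function isSummaryStatus(value: unknown): value is SummaryStatus {
  return (
    typeof value === "string" &&
    (SUMMARY_STATUSES as readonly string[]).includes(value)
  );
}
```

A guard like this lets application code reject an unexpected status before it issues an UPDATE that the database would refuse anyway.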

Enrichment script (investigator-runtime/scripts/enrich_entity_summaries.ts):
  - Per-class config (chunk_k, min_mentions, max_per_class)
  - Path A: entity_mentions JOIN chunks (high-precision linker)
  - Path B (fallback): hybridSearch on canonical_name + aliases when
    entity_mentions returns zero. This is what unlocked Kenneth Arnold
    and similar entities — their wiki YAML has high total_mentions
    counted from frontmatter mentioned_in[], but the entity_mentions
    extractor was silent because the matches came from the wiki text,
    not the OCR chunks.
  - Sonnet 4.6 via OAuth Max, ~$0.04 per entity, ~$10 for the full
    260-entity bulk run.
  - INSUFFICIENT skip when chunks can't sustain a 60-word summary —
    refused entries get summary_status='refused' so they're not retried.
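
The retrieval order and the refusal rule above can be sketched as follows. This is illustrative only and is not taken from enrich_entity_summaries.ts: `retrieveChunks` and `statusFor` are hypothetical names, the two retrievers are injected as plain synchronous functions to keep the sketch self-contained (the real ones would be async database and search calls), and the literal INSUFFICIENT marker in the model output is an assumption about how the skip is signalled.

```typescript
interface Chunk {
  id: string;
  text: string;
}

interface Retrieval {
  path: "A" | "B";
  chunks: Chunk[];
}

// Path A first; Path B runs only when the linker returned zero chunks.
function retrieveChunks(
  fromMentions: () => Chunk[], // stand-in for entity_mentions JOIN chunks
  fromHybridSearch: () => Chunk[], // stand-in for hybridSearch on canonical_name + aliases
  chunkK: number, // per-class chunk_k
): Retrieval {
  const linked = fromMentions();
  if (linked.length > 0) {
    return { path: "A", chunks: linked.slice(0, chunkK) };
  }
  return { path: "B", chunks: fromHybridSearch().slice(0, chunkK) };
}

// Assumption: the model answers with a leading INSUFFICIENT marker when the
// chunks cannot sustain a 60-word summary. Refused rows are not retried.
function statusFor(modelOutput: string): "ai_generated" | "refused" {
  return modelOutput.trim().toUpperCase().startsWith("INSUFFICIENT")
    ? "refused"
    : "ai_generated";
}

// Cost check using the figures quoted above: about 10.4 USD for 260 entities,
// consistent with the ~$10 estimate.
const estimatedBulkCostUsd = 0.04 * 260;
```

Keeping the fallback lazy means the hybrid search never runs for entities the linker already covers.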

UI uplift:
  - lib/retrieval/entity-pages.ts: getEntityCore now prefers the DB
    summary (ai_generated or curated) over wiki YAML narrative.
  - components/entity-list-page.tsx:
    * SELECT now pulls summary_en, summary_pt_br, summary_status
    * Sorted with summary-enriched rows first (so the magazine grid
      lands on quality content immediately)
    * MagazineGrid: 4-line summary preview replaces aliases line
    * CompactGrid: enriched rows render as full editorial cards,
      bare rows fall back to a compact table below
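
The two UI rules above (prefer the DB summary, list enriched rows first) reduce to a predicate and a comparator. The sketch below is a hedged illustration rather than the code in entity-pages.ts or entity-list-page.tsx: `EntityRow`, `hasUsableSummary`, `pickNarrative` and `sortEnrichedFirst` are hypothetical names, and the tie-break on mention count is an assumption.

```typescript
interface EntityRow {
  canonical_name: string;
  total_mentions: number;
  summary_en: string | null;
  summary_status: string | null;
}

// A DB summary counts only when its status is ai_generated or curated
// and the text is non-empty.
function hasUsableSummary(row: EntityRow): boolean {
  const statusOk =
    row.summary_status === "ai_generated" || row.summary_status === "curated";
  return statusOk && row.summary_en !== null && row.summary_en.trim().length > 0;
}

// Prefer the DB summary; otherwise fall back to the wiki YAML narrative.
function pickNarrative(row: EntityRow, wikiNarrative: string | null): string | null {
  return hasUsableSummary(row) ? row.summary_en : wikiNarrative;
}

// Enriched rows first. Assumption: ties are broken by mention count, descending.
function sortEnrichedFirst(rows: EntityRow[]): EntityRow[] {
  return [...rows].sort((a, b) => {
    const byEnrichment = Number(hasUsableSummary(b)) - Number(hasUsableSummary(a));
    return byEnrichment !== 0 ? byEnrichment : b.total_mentions - a.total_mentions;
  });
}
```

Because `sortEnrichedFirst` copies the array before sorting, the rows returned by the SELECT are left untouched for other consumers.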

Smoke results:
  - Kenneth Arnold sighting: "On June 24, 1947, pilot Kenneth Arnold
    reported sighting unidentified objects over the Pacific Northwest,
    and the account spread worldwide. It set off a run of similar
    reports: County Commissioner Crankes saw comparable objects after
    Arnold's account reached the press, and United Airlines pilot
    Emil H. Smith spotted flying discs on July 4 during a routine
    flight out of Boise, Idaho..."
  - Roswell Incident: includes Colonel Corso's 1997 book + the 1995
    GAO finding that radio messages from Oct 46–Feb 47 were destroyed
    + Senator Strom Thurmond's foreword. Real magazine-grade content.

Background bulk run kicked off across all 5 classes (event,
uap_object, person, location, organization) — populating live as
the homepage rebuilds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:37:01 -03:00
coolify baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
disclosure-stack W3.9 followup: mount case/ ro into web container 2026-05-23 22:45:00 -03:00
embed-service baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
supabase W5.3 (Phase 3A): entity summaries — sub-pages get magazine-grade prose 2026-05-24 15:37:01 -03:00
DEPLOY-CHECKLIST.md baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
README.md baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
RETRIEVAL.md baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00

Infrastructure — Disclosure Bureau

Self-hosted stack on a single VPS (16 GB / 4 CPU / 200 GB NVMe) managed via Coolify.

                   Internet (443/80)
                          │
                ┌─────────▼─────────┐
                │  Caddy (Coolify)  │ ← auto-TLS Let's Encrypt
                └────┬──────────────┘
                     │
       ┌─────────────┼──────────────────────┬──────────────────┐
       ▼             ▼                      ▼                  ▼
  ┌─────────┐   ┌──────────┐         ┌──────────┐       ┌──────────┐
  │ Next.js │   │ Supabase │         │ Supabase │       │  shared  │
  │   web   │   │disclosure│         │ project-B│       │ services │
  │ :3000   │   │  stack   │         │  stack   │       │ Meili··· │
  └─────────┘   │  ┌─────┐ │         │  ┌─────┐ │       │ Imgproxy │
                │  │PG/GT│ │         │  │PG/GT│ │       │ Dragonfly│
                │  └─────┘ │         │  └─────┘ │       └──────────┘
                └──────────┘         └──────────┘
                disclosure.top       projeto-b.com

Components

| Layer | Service | Notes |
|---|---|---|
| Orchestration | Coolify v4 | Self-hosted PaaS; manages all containers, TLS, backups |
| Database + Auth + Storage | Supabase self-hosted (one per project) | Each project gets its own Postgres + GoTrue + Storage |
| Frontend | Next.js 15 (this repo's /web) | Deployed via Coolify Git integration |
| Search | Meilisearch (shared) | Full-text search across pages + entities |
| Cache + Queue | Dragonfly (shared) | Redis-compatible, multi-threaded |
| Images | Imgproxy (shared) | On-the-fly resize / WebP conversion |
| Backups | restic + Backblaze B2 | Nightly Postgres + Storage dumps |

Quick path

  1. coolify/INSTALL.md — install Coolify on the fresh VPS (~10 min)
  2. coolify/SUPABASE.md — create the disclosure Supabase project (~5 min)
  3. Run supabase/migrations/0001_chat_schema.sql via Supabase Studio SQL editor
  4. coolify/NEXTJS.md — deploy the /web app pointing at the Supabase URL
  5. coolify/SHARED.md — bring up Meilisearch, Dragonfly, Imgproxy

Adding more projects later

For each new project, repeat step 2 (new Supabase project in the Coolify UI) and step 4 (new Next.js app). Each project gets its own subdomain, its own auth, and its own data; only the shared services (Meilisearch, Dragonfly, Imgproxy) are common to all projects.

Local development

For development on macOS/Linux without the VPS, see ../web/README.md, which uses the Supabase CLI to spin up a local stack on localhost:54321.