Commit graph

5 commits

Author SHA1 Message Date
Luiz Gustavo
857dd771d2 W3.8: Schneier red-team detective + /h/[hypothesisId] dossier page
Some checks failed
CI / Web — typecheck + lint + build (push) Failing after 33s
CI / Scripts — Python smoke (push) Failing after 7s
CI / Web — npm audit (push) Failing after 38s
CI / Retrieval — golden set (Recall@5 + MRR) (push) Failing after 4s
Adds the fourth AI detective in the Investigation Bureau runtime: Bruce
Schneier, who attacks an existing hypothesis as a red-team operator.

Runtime:
  - prompts/schneier.md — discipline (don't disprove, just attack;
    structured output with hidden_assumptions, failure_modes,
    alternative_explanations, recommended_tests, verdict_one_sentence;
    severity ∈ {low, medium, high}; emit INSUFFICIENT_HYPOTHESIS when
    the input is too thin)
  - src/detectives/schneier.ts — reads the hypothesis row + evidence
    chain (joined via evidence_refs FK), feeds Claude with the
    arguments + verbatim quotes, parses strict JSON object
  - src/tools/write_red_team_review.ts — UPDATEs hypotheses.reviewed_by
    + updated_at; APPENDS (or replaces if re-reviewed) a structured
    "## Red-team review (Schneier · X severity)" section to
    case/hypotheses/H-NNNN.md. Caps each list at 5 entries × 240 chars,
    validates verdict ≤ 280 chars.
  - orchestrator: new `red_team_review` kind dispatching to runSchneier

Chat + UI:
  - request_investigation gains kind=red_team_review + hypothesis_id arg
    (validated against H-NNNN regex); detective auto-resolves to schneier
  - chat-bubble inline card paints Schneier in red (#ff3344)
  - /jobs/[id] page swaps title/subtitle/tone per detective; the
    "Question" label becomes "Hypothesis under attack" for red_team_review

New /h/[hypothesisId] page (hypothesis dossier):
  - Server-rendered from public.hypotheses + public.evidence (joined
    via evidence_refs FK + chunk lookup)
  - Header: ID + creator + reviewer (highlighted when Schneier has
    visited), position as headline, question subtitle, Tetlock band
  - Prior + posterior bars with Δ-delta indicator
  - Argument grid: argument_for (green) vs argument_against (pink)
    side-by-side with [[wiki-link]] auto-linking to source chunks
  - Evidence chain: each E-NNNN with Grade A/B/C badge, verbatim
    blockquote, link to source page
  - Red-team review panel: parses the markdown section in the case
    file (severity badge, verdict, 4 bullet panels for
    hidden_assumptions / failure_modes / alternative_explanations /
    recommended_tests). Empty state when not yet reviewed.

RedTeamRequestButton client component + POST /api/h/[id]/red-team —
authenticated user can trigger Schneier in one click; UI swaps to
"acompanhar" link to /jobs/[id] once queued.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 21:48:12 -03:00
Luiz Gustavo
5ac53cb3e2 W3.7: Dupin contradiction-scan detective + UI integration
Some checks failed
CI / Web — typecheck + lint + build (push) Failing after 39s
CI / Scripts — Python smoke (push) Failing after 4s
CI / Web — npm audit (push) Failing after 37s
CI / Retrieval — golden set (Recall@5 + MRR) (push) Failing after 4s
Adds the third AI detective in the Investigation Bureau runtime: C. Auguste
Dupin, who scans a corpus shortlist for pairs (or small groups) of chunks
that cannot both be true under any ordinary reading.

Runtime:
  - prompts/dupin.md — discipline (no contradiction without ≥2 distinct
    chunk_ids; reject same-vocabulary near-misses; FEW high-confidence
    over MANY weak ones; emit `NO_CONTRADICTIONS` when corpus is silent)
  - src/detectives/dupin.ts — hybridSearch with k=18 (more chunks than
    Holmes because contradictions emerge from comparing dispersed
    claims), strict JSON-array parsing, AT MOST 3 contradictions per call
  - src/tools/write_contradiction.ts — validates topic + ≥2 positions
    drawn from ≥2 distinct chunks, resolves chunk_pk via DB lookup
    (rejects positions citing unknown chunks), INSERTs into
    public.contradictions + writes case/contradictions/R-NNNN.md
  - orchestrator: new `contradiction_scan` kind dispatching to runDupin;
    payload { topic, doc_id?, lang?, context_chunks? }

Chat + UI:
  - request_investigation gains kind=contradiction_scan + topic arg;
    triggered detective auto-resolves to dupin
  - chat-bubble inline card renders dupin in orange (#ff8a4d) to
    distinguish from holmes (cyan) and locard (green)
  - /jobs/[id] page swaps title + subtitle + tone per detective;
    "Question" label becomes "Topic" for contradiction_scan
  - /api/jobs/[id] hydrates public.contradictions when outputs[] surfaces
    contradiction_ids
  - job-status-poller renders ContradictionCard: topic + N positions
    (verbatim statements quoted, stance label optional, link to source
    chunk) + optional notes panel, with resolution_status badge
    (open/resolved/irreconcilable)

R-NNNN shares the contradiction_id_seq slot with relation per
CLAUDE.md naming — same conceptual class (a connection between two
pieces of evidence in tension).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 21:34:04 -03:00
Luiz Gustavo
b76e81e4b3 W3.6: chat request_investigation tool + /jobs/[id] case-file viewer
Some checks failed
CI / Web — typecheck + lint + build (push) Failing after 44s
CI / Scripts — Python smoke (push) Failing after 3s
CI / Web — npm audit (push) Failing after 43s
CI / Retrieval — golden set (Recall@5 + MRR) (push) Failing after 7s
Closes the loop between the chat UI and the Investigation Bureau runtime.

Chat tool (web/lib/chat/tools.ts):
  - request_investigation { kind, question, doc_id?, chunks?, claim? }
    INSERTs a row in public.investigation_jobs and returns
    { job_id, kind, status, eta_seconds, status_url, detective }.
  - kind=hypothesis_tournament → Holmes (1 question → 2-3 rival hypotheses)
  - kind=evidence_chain → Locard (1 doc → grade-A/B/C evidence with chain
    of custody, default top-5 anomaly chunks)
  - Plumbed user.email through ToolHandlerContext so triggered_by audits
    the requesting user.

Public job viewer:
  - GET /api/jobs/[id] joins investigation_jobs → public.evidence +
    public.hypotheses for the IDs surfaced in outputs[]. Returns one
    payload the page can render without n+1 round-trips. Strips
    triggered_by from the response (it carries the user's email).
  - app/jobs/[id]/page.tsx server-renders the case-file shell:
    detective lore header (Holmes blue or Locard green), question chip,
    scope chip with link back to the document.
  - components/job-status-poller.tsx client island that polls every 3 s
    while non-terminal, then once on terminal to hydrate evidence +
    hypotheses. Renders:
      · Phase tracker (queued → running → complete | failed)
      · Hypothesis cards w/ prior + posterior bars + Δ delta indicator
        + Tetlock band badge (high/medium/low/speculation)
      · Argument-for / argument-against with [[wiki-link]] auto-linking
        to /d/<doc>/p<NNN>#<cNNNN>
      · Evidence cards w/ Grade A/B/C badge + verbatim blockquote +
        bbox crop preview via /api/crop + custody-steps disclosure
      · Empty/in-flight panel ("os detetives estão lendo o corpus")
      · Failure panel surfacing error + partial outputs

Inline chat-bubble card (components/chat-bubble.tsx):
  - ToolTrace.richRender recognises request_investigation results and
    renders a detective banner with status + ETA + link to /jobs/[id]
    (target=_blank). Error case renders a red strip with the message.

UX flow now: user asks Sherlock a question → request_investigation
queues the job → chat card shows "🔎 Holmes · hypothesis_tournament ·
ETA ~60s" → user clicks → /jobs/<id> live-updates → 60 s later, 2-3
rival hypotheses + their arguments + chunk citations are rendered with
Bayesian update visible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 21:26:18 -03:00
guto
7d13f93393 ship: synthesize 158 entities, AG-UI artifacts, chat persistence, auth flow
Fase 3 onda 2 — entity synthesis at scale:
- scripts/synthesize/20_entity_summary.py: queries DB for entities with
  total_mentions ≥ threshold + top-K verbatim chunk snippets via
  entity_mentions JOIN, prompts Sonnet (Holmes-Watson voice, bilingual),
  writes narrative_summary EN+PT-BR + summary_status=synthesized.
  Ran on 187 candidates (mentions ≥ 20) → 158 OK · 1 err · 29 skipped (no
  snippets). Combined with anchor curation: 20 curated + 158 synthesized
  = 178 entities with real narrative (vs 0 a day ago).

Fase 4 — chat with typed artifacts + persistence:
- lib/chat/agui.ts: AG-UI v1 typed Artifact union (citation, crop_image,
  entity_card, evidence_card, hypothesis_card, case_card, navigation_offer)
  alongside the existing event types.
- lib/chat/tools.ts + openrouter.ts: hybrid_search emits up to 6
  citation + crop_image artifacts per query. Provider collects them and
  returns in done.artifacts so the route can persist.
- api/sessions/[id]/messages: persist artifacts to messages.citations.
- components/chat-bubble.tsx: ArtifactCard renders inline cards (citation,
  crop_image, entity_card, navigation_offer) for streamed and persisted
  messages. activeId now persisted in localStorage so navigation between
  pages keeps the same conversation. New sessions are lazy (only when user
  has zero). loadMessages hydrates tools + artifacts from server. CRUD UI:
  rename (✎) + archive (🗑) buttons per session in the list.

Home search:
- doc-list-filters: input now fires hybrid_search (rerank=0 for speed)
  in parallel with the local title filter; chunk hits render above the doc
  grid with snippet + score + classification.
- api/search/hybrid: accept ?rerank=0 to skip the cross-encoder (1.3s vs 60s).

Auth flow:
- infra: SMTP_HOST=mail.spacemail.com:587 + DMARC published; mail now lands
  in inbox. GOTRUE_MAILER_AUTOCONFIRM=false (real email verification).
- kong.yml: proxy /auth/callback on api.disclosure.top → web:3000 so PKCE
  email links don't 404 at the gateway.
- web/app/auth/callback: handle both ?code= (OAuth) and ?token=&type=
  (PKCE); redirect to the public site host before verifyOtp so the session
  cookie lands on the right domain.

Audit deliverables:
- .nirvana/outputs/disclosure-bureau/.../systems-atelier/: 5 docs (code
  analysis, tech debt, discovery brief, system arch, 5 ADRs) authored by
  sa-principal that produced this roadmap. Kept in-tree for traceability.
2026-05-18 03:52:59 -03:00
guto
19d0678e55 baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00