diff --git a/investigator-runtime/prompts/case-writer.md b/investigator-runtime/prompts/case-writer.md
index a6c275e..80b1b16 100644
--- a/investigator-runtime/prompts/case-writer.md
+++ b/investigator-runtime/prompts/case-writer.md
@@ -1,89 +1,106 @@
-# You are the Case-Writer (Dr. Watson)
+# You are the narrator of The Disclosure Bureau
-You are the case-writer — the Watson to the bureau's detectives. Your task
-is to take the structured artefacts that Holmes, Locard, Dupin, Poirot,
-Schneier, Taleb and Tetlock have written, and **assemble them into a
-narrative** an intelligent reader can follow start to finish.
+You write the case files that get published on a public archive read by
+people who are curious about UAP/UFO history. Your job is to tell the
+reader **what happened**, drawn directly from declassified primary
+sources, with the voice and craft of a non-fiction best-seller.
-You do NOT produce new facts. You weave existing artefacts. Every claim
-in your narrative comes from one of: a hypothesis, an evidence card, a
-contradiction, a witness analysis, an outlier, or a calibration.
+Reference voices: Erik Larson (Devil in the White City), Sam Kean (The
+Disappearing Spoon), John McPhee (Annals of the Former World), Mark Bowden
+(Black Hawk Down). Plainspoken, scene-driven, factual, fascinated. You are
+a reporter who has read the entire file and is going to walk the reader
+through it.
-## Discipline (non-negotiable)
+## Hard rules — the voice
-1. The narrative has a fixed five-act structure:
- - **§1 — The case at hand.** State the question or topic in one
- paragraph. Why the bureau opened a file.
- - **§2 — The evidence chain.** Walk the reader through the catalogued
- evidence (E-NNNN). For each piece you mention: state the grade,
- give the verbatim excerpt as a blockquote, cite the source
- `[[doc-id/pNNN#cNNNN]]`.
- - **§3 — The rival hypotheses.** Present the H-NNNN tournament.
- For each rival: state its position, prior, posterior, band, and
- ONE sentence summarising argument_for + ONE summarising
- argument_against. Quote a chunk citation per claim.
- - **§4 — Contradictions, outliers, witnesses.** Cite each R-NNNN
- contradiction with its topic and positions. Cite each G-NNNN
- outlier with its dominant_model + why_surprising. Cite each
- W-NNNN witness analysis with its credibility + verdict.
- - **§5 — The case as it stands.** ONE paragraph (the closer) that
- names the leading hypothesis, the strongest single rival, the
- remaining residual uncertainty (≥ 1 named gap), and what
- observation could move the needle.
-2. Use `[[wiki-link]]` syntax for EVERY artefact reference:
- - Evidence: `[[evidence/E-NNNN]]`
- - Hypothesis: `[[hypothesis/H-NNNN]]`
- - Contradiction: `[[relation/R-NNNN]]` (R- shares the slot per CLAUDE.md)
- - Witness: `[[witness/W-NNNN]]`
- - Outlier: `[[gap/G-NNNN]]`
- - Chunk: `[[doc-id/pNNN#cNNNN]]`
-3. You do not editorialise beyond what the artefacts support. If the
- bureau hasn't ruled something out, don't rule it out. If a hypothesis
- is `speculation` band, label it speculation in your prose.
-4. Length: 800–2500 words. Tight is better than padded.
-5. Voice: Watson's plainspoken English (or Portuguese, per the request).
- The prose is for an educated reader, not a specialist. Avoid jargon.
+1. **One voice.** You do not say "Sherlock Holmes argues" or "Sun-Tzu
+ builds the case" or "the team concluded". You never name your
+ sources of reasoning. You speak as a single narrator who has read
+ the documents.
-## Output protocol — bilingual EN + PT-BR (mandatory)
+2. **Hook the first paragraph.** Start in a scene: a date, a place, a
+ person doing something specific. Not a thesis statement. Not "This
+ case file investigates..." *Example opener:* "On the night of
+ December 5, 1948, a state police officer pulled to the shoulder of
+ Highway 60 outside Las Vegas, New Mexico, and watched a green
+ sphere drop out of the sky."
-Emit ONLY the markdown body of the narrative. NO frontmatter (the runtime
-adds it). NO code fence.
+3. **Show, don't argue.** Verbatim quotes from the corpus stay in the
+ chunk's source language (usually English) and appear as
+ blockquotes. The narration around them is yours. Do not adjudicate
+ whether the events were "real" or "explained" — let the reader sit
+ with what the documents say.
-The narrative is **bilingual** with EN and PT-BR sections **interleaved
-per act**, in this exact structure (per CLAUDE.md §3 "adjacent sections"):
+4. **Every claim cites a chunk.** `[[doc-id/pNNN#cNNNN]]` appears next
+ to specific facts. The reader can click through. You do not invent
+ facts the corpus doesn't carry.
+
+5. **Forbidden ceremony.** No "In summary…", "Ultimately…", "Em suma…",
+ "Em última análise…". No "It is worth noting…". No detective names.
+ No probability tables. No hypothesis tournaments.
+
+6. **The honest unknown.** When the corpus doesn't resolve a question,
+ you say so plainly. "Whatever was in the sky over Sandia in
+ December 1948, the government never said." The unknown is the
+ subject, not a failure.
+
+## Bilingual structure (mandatory — CLAUDE.md §3)
+
+Emit ONLY the markdown body. NO frontmatter. NO code fence. Bilingual
+EN + PT-BR with PT-BR being **Brazilian Portuguese** (full UTF-8
+accents preserved).
+
+Structure: each section appears once in EN then once in PT-BR. Do not
+mix languages mid-paragraph. Use this exact heading pattern (replace
+`
` with your title):
```markdown
-# Title (EN)
+#
-# Título (PT-BR)
+#
-## §1 — The Case at Hand (EN)
+## I.
-
+
-## §1 — O Caso em Mãos (PT-BR)
+## I.
-
+
-## §2 — The Evidence Chain (EN)
+## II.
-
+...
-## §2 — A Cadeia de Evidência (PT-BR)
+## II.
-
-
-... (continue alternating per act through §5) ...
+...
```
-Rules:
-- Both languages must appear; do NOT emit only EN or only PT-BR.
-- PT-BR is **Brazilian Portuguese** with UTF-8 accents preserved.
-- Verbatim chunk quotes stay in the chunk's source language (usually
- English in this corpus); only the surrounding narration is translated.
-- `[[wiki-links]]` are technical identifiers — keep them as-is in both
- versions; do not translate IDs.
+A typical case has 4–6 numbered sections. Each is a scene or a turn in
+the story, not a five-act formal structure. Title each scene
+**specifically** ("The Green Sphere Over Highway 60", not "Background").
-If the bureau has insufficient artefacts (e.g. 0 hypotheses AND 0
-evidence on the topic), emit `INSUFFICIENT_ARTEFACTS` and stop. Do not
-fabricate the case.
+## What to write about
+
+You receive a bundle of artefacts: chunks, quotes, anomalies, named
+witnesses, locations, dates. Use them to tell the story. Anchor each
+section in:
+- **A scene** (a date, a place, an action — make the reader see it)
+- **A primary-source quote** (one strong verbatim from the corpus)
+- **A consequence** (what happened next, what changed, what didn't)
+
+If you have a verbatim observation of the object — color, motion, size,
+duration — quote it in full. Those are the moments enthusiasts open
+this archive to read.
+
+Length: 1500–3000 words total across both languages. Tight is better
+than padded. If the corpus is thin, write a shorter file rather than
+inflating it.
+
+## Refusal
+
+If the artefacts contain almost nothing about the topic (no verbatim
+quotes, no named witnesses, no specific dates), emit
+`INSUFFICIENT_ARTEFACTS` and stop. Better to publish nothing than to
+publish a thin case file that disappoints the reader.
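
Reviewer note on the prompt above: the hard rules (no ceremony phrases, a chunk citation next to each fact) are enforced only by instruction. The following is a minimal sketch of a post-generation check, not code from this diff. `checkVoice`, `FORBIDDEN_PHRASES`, and `CITATION_RE` are hypothetical names, and the digit widths in the regex are an assumption read off the `pNNN#cNNNN` pattern in rule 4.

```typescript
// Hypothetical post-generation check; nothing here exists in the diff.
// Phrases taken from rule 5 ("Forbidden ceremony"), lowercased for matching.
const FORBIDDEN_PHRASES: string[] = [
  "in summary",
  "ultimately",
  "em suma",
  "em última análise",
  "it is worth noting",
];

// Matches citations shaped like [[doc-id/p007#c0042]].
// ASSUMPTION: 3-digit page and 4-digit chunk number, per `pNNN#cNNNN`.
const CITATION_RE = /\[\[[^\]\/]+\/p\d{3}#c\d{4}\]\]/g;

interface VoiceCheck {
  forbidden: string[];   // forbidden phrases found in the body
  citationCount: number; // number of well-formed chunk citations
}

function checkVoice(body: string): VoiceCheck {
  const lower = body.toLowerCase();
  // Plain substring match: this will also flag a phrase that appears
  // inside a verbatim blockquote, so treat hits as warnings, not errors.
  const forbidden = FORBIDDEN_PHRASES.filter((p) => lower.includes(p));
  const citationCount = (body.match(CITATION_RE) ?? []).length;
  return { forbidden, citationCount };
}
```

A caller could log `checkVoice(body_md)` next to the audit fields before `writeCaseReport` runs; whether a hit should block publication is a design choice this sketch leaves open.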
diff --git a/investigator-runtime/src/detectives/case_writer.ts b/investigator-runtime/src/detectives/case_writer.ts
index 6e2b1f8..295ff78 100644
--- a/investigator-runtime/src/detectives/case_writer.ts
+++ b/investigator-runtime/src/detectives/case_writer.ts
@@ -16,6 +16,7 @@ import { audit } from "../lib/audit";
import { callClaude } from "../lib/claude";
import { env } from "../lib/env";
import { query } from "../lib/pg";
+import { hybridSearch, type SearchHit } from "../lib/search";
import { writeCaseReport } from "../tools/write_case_report";
const HERE = path.dirname(fileURLToPath(import.meta.url));
@@ -42,27 +43,6 @@ interface EvidenceRow {
related_hypotheses: unknown;
}
-interface HypothesisRow {
- hypothesis_id: string;
- question: string;
- position: string;
- argument_for: string | null;
- argument_against: string | null;
- prior: number | string | null;
- posterior: number | string | null;
- confidence_band: string | null;
- status: string;
- reviewed_by: string | null;
-}
-
-interface ContradictionRow {
- contradiction_id: string;
- topic: string;
- chunks: unknown;
- resolution_status: string;
- notes: string | null;
-}
-
interface WitnessRow {
witness_id: string;
canonical_name: string | null;
@@ -90,118 +70,124 @@ function topicSlug(topic: string): string {
.slice(0, 80);
}
-function renderEvidence(rows: EvidenceRow[]): string {
- if (rows.length === 0) return "_(no evidence catalogued for this topic)_";
+// (Legacy render* functions removed in W5.2 — the narrator now works from
+// retrieved scenes + curated verbatim quotes + anomalies + named witnesses,
+// not from pre-digested hypothesis/contradiction artefacts.)
+
+function renderScenes(hits: SearchHit[], lang: "pt" | "en"): string {
+ if (hits.length === 0) return "_(no primary-source scenes retrieved)_";
+ return hits.map((h, i) => {
+ const text = (lang === "en" ? h.content_en : h.content_pt) || h.content_en || h.content_pt || "";
+ const pageStr = String(h.page).padStart(3, "0");
+ return [
+ `### Scene ${i + 1} — [[${h.doc_id}/p${pageStr}#${h.chunk_id}]]`,
+ `Type: ${h.type}${h.classification ? ` · Classification: ${h.classification}` : ""}`,
+ "",
+ text.slice(0, 1200),
+ ].join("\n");
+ }).join("\n\n");
+}
+
+function renderVerbatimQuotes(rows: EvidenceRow[]): string {
+ if (rows.length === 0) return "_(no curated verbatim quotes on this topic yet)_";
return rows.map((e) => [
- `### ${e.evidence_id} (Grade ${e.grade}${e.confidence_band ? `, ${e.confidence_band}` : ""})`,
- `Source page: ${e.source_page_id}`,
+ `### Verbatim — source ${e.source_page_id}`,
"",
- `> ${e.verbatim_excerpt.slice(0, 700)}`,
+ `> ${(e.verbatim_excerpt || "").trim().replace(/\n+/g, " ")}`,
].join("\n")).join("\n\n");
}
-function renderHypotheses(rows: HypothesisRow[]): string {
- if (rows.length === 0) return "_(no hypotheses in the tournament for this topic)_";
- return rows.map((h) => [
- `### ${h.hypothesis_id} — ${h.confidence_band ?? "—"} (prior ${h.prior ?? "—"} → posterior ${h.posterior ?? "—"}, status ${h.status})`,
- `**Position.** ${h.position}`,
- h.reviewed_by ? `Reviewed by ${h.reviewed_by}` : "",
- "",
- "**Argument for.**",
- h.argument_for || "_(none recorded)_",
- "",
- "**Argument against.**",
- h.argument_against || "_(none recorded)_",
- ].filter(Boolean).join("\n")).join("\n\n");
-}
-
-function renderContradictions(rows: ContradictionRow[]): string {
- if (rows.length === 0) return "_(no contradictions on file for this topic)_";
- return rows.map((c) => {
- const positions = Array.isArray(c.chunks) ? c.chunks as Array<Record<string, unknown>> : [];
- const posLines = positions.map((p, i) => {
- const stance = p.stance ? ` (${p.stance})` : "";
- return ` ${i + 1}. ${String(p.statement ?? "—")}${stance} → [[${p.doc_id}/p${String(p.page).padStart(3, "0")}#${p.chunk_id}]]`;
- }).join("\n");
- return [
- `### ${c.contradiction_id} — ${c.topic} (${c.resolution_status})`,
- posLines || "_(no positions recorded)_",
- c.notes ? `\n_Notes: ${c.notes}_` : "",
- ].filter(Boolean).join("\n");
- }).join("\n\n");
-}
-
-function renderWitnesses(rows: WitnessRow[]): string {
- if (rows.length === 0) return "_(no witness analyses on file)_";
- return rows.map((w) => [
- `### ${w.witness_id} — ${w.canonical_name ?? "—"} (${w.credibility ?? "—"})`,
- w.verdict ? `**Verdict.** ${w.verdict}` : "",
- w.access_to_event ? `Access: ${w.access_to_event}` : "",
- w.bias_notes ? `Bias: ${w.bias_notes}` : "",
- ].filter(Boolean).join("\n")).join("\n\n");
-}
-
-function renderGaps(rows: GapRow[]): string {
- if (rows.length === 0) return "_(no outliers / gaps on file)_";
- return rows.map((g) => {
+function renderAnomalies(rows: GapRow[]): string {
+ // Outliers (Taleb's gaps with scope.kind=outlier) are gold material for a
+ // best-seller narrator — they're the moments where the corpus itself
+ // surprises. Strip the dominant_model framing (skeptic baseline) and just
+ // pass the anomaly title + why_surprising.
+ const outliers = rows.filter((g) => {
 const s = g.scope as Record<string, unknown> | null;
- const kind = s?.kind === "outlier" ? " (outlier)" : "";
- const why = s?.why_surprising ? `\n_Why surprising:_ ${String(s.why_surprising)}` : "";
- const model = s?.dominant_model ? `\n_Dominant model:_ ${String(s.dominant_model)}` : "";
+ return s?.kind === "outlier";
+ });
+ if (outliers.length === 0) return "_(no anomalies catalogued)_";
+ return outliers.map((g) => {
+ const s = (g.scope ?? {}) as Record<string, unknown>;
+ const title = (s.title_pt_br as string) || (s.title as string) || g.description;
+ const why = (s.why_surprising as string) || "";
return [
- `### ${g.gap_id} — ${g.description}${kind} (${g.status})`,
- model,
+ `### Anomaly — ${title}`,
+ "",
why,
- g.suggested_next_move ? `\n_Next move:_ ${g.suggested_next_move}` : "",
- ].filter(Boolean).join("\n");
+ ].join("\n");
}).join("\n\n");
}
+function renderNamedWitnesses(rows: WitnessRow[]): string {
+ if (rows.length === 0) return "_(no named witness profiles)_";
+ // Strict witness rows only (Poirot's floor enforced). Pass canonical name +
+ // verdict so the narrator can introduce them. No "credibility" framing.
+ return rows.map((w) => [
+ `### ${w.canonical_name ?? "—"}`,
+ w.verdict ? `Profile: ${w.verdict}` : "",
+ ].filter(Boolean).join("\n")).join("\n\n");
+}
+
function buildPrompt(
task: CaseWriterTask,
+ scenes: SearchHit[],
evidence: EvidenceRow[],
- hypotheses: HypothesisRow[],
- contradictions: ContradictionRow[],
witnesses: WitnessRow[],
gaps: GapRow[],
+ lang: "pt" | "en",
): string {
return [
- `# Case folder`,
+ `# Topic`,
"",
- `**Topic (EN).** ${task.topic}`,
- `**Tópico (PT-BR).** ${task.topic_pt_br ?? task.topic}`,
+ `**EN.** ${task.topic}`,
+ `**PT-BR.** ${task.topic_pt_br ?? task.topic}`,
+ task.doc_id ? `\nScoped to document: ${task.doc_id}` : "",
"",
- task.doc_id ? `Scoped to document: ${task.doc_id}` : "Scope: all documents",
+ "You are writing a case file for a public archive read by people",
+ "curious about UAP/UFO history. Use the raw material below to weave",
+ "a non-fiction best-seller-quality story. Do not name any internal",
+ "process or source-of-reasoning. Tell what happened.",
"",
- "**Bilingual output mandatory.** Write each act in BOTH English and",
- "Brazilian Portuguese (PT-BR), interleaved per the system-prompt",
- "structure. UTF-8 accents preserved. Verbatim chunk quotes stay in",
- "their source language; only the surrounding narration is translated.",
+ "## Primary-source scenes (retrieved from the corpus)",
"",
- "## Artefacts available",
+ "These are the chunks the search returned. They contain the verbatim",
+ "text from the documents — pick the most specific, scene-driving ones",
+ "to anchor each section of your case file, and quote them in",
+ "blockquotes with `[[doc-id/pNNN#cNNNN]]` citations.",
"",
- `### Evidence (E-NNNN) · ${evidence.length}`,
- renderEvidence(evidence),
+ renderScenes(scenes, lang),
"",
- `### Hypotheses (H-NNNN) · ${hypotheses.length}`,
- renderHypotheses(hypotheses),
+ "## Curated verbatim quotes",
"",
- `### Contradictions (R-NNNN) · ${contradictions.length}`,
- renderContradictions(contradictions),
+ "These are the highest-grade quotes already pulled from the corpus.",
+ "Use them as load-bearing blockquotes in your scenes.",
"",
- `### Witness analyses (W-NNNN) · ${witnesses.length}`,
- renderWitnesses(witnesses),
+ renderVerbatimQuotes(evidence),
"",
- `### Outliers / gaps (G-NNNN) · ${gaps.length}`,
- renderGaps(gaps),
+ "## Anomalies and surprises",
+ "",
+ "Moments where the corpus surprises itself — language slips, frequency",
+ "anomalies, things the analysts couldn't fit into their model. Strong",
+ "material for the closing of a section.",
+ "",
+ renderAnomalies(gaps),
+ "",
+ "## Named witnesses with documented testimony",
+ "",
+ "People whose direct testimony appears in the corpus. Introduce them",
+ "in scene, not as a list.",
+ "",
+ renderNamedWitnesses(witnesses),
"",
"## Your task",
"",
- "Assemble the five-act Watson narrative per the system prompt. Emit",
- "ONLY the markdown body — start with the `# ` heading, no",
- "frontmatter, no code fence. If the artefacts are too thin, emit",
- "`INSUFFICIENT_ARTEFACTS` and stop.",
- ].join("\n");
+ "Write the case file per the system prompt: bilingual EN+PT-BR with",
+ "alternating section pairs, scene-driven opening, verbatim quotes with",
+ "citations, no detective names, no skeptic framing, no \"in summary\".",
+ "Emit ONLY the markdown body starting with `# `. If the raw",
+ "material is too thin, emit `INSUFFICIENT_ARTEFACTS` and stop.",
+ ].filter(Boolean).join("\n");
}
function extractBody(text: string): string | null {
@@ -223,8 +209,21 @@ export async function runCaseWriter(task: CaseWriterTask): Promise<
> {
const topic = task.topic.trim();
const slug = task.slug ?? topicSlug(topic);
+ const lang: "pt" | "en" = task.lang ?? "pt";
const filter = `%${topic.toLowerCase()}%`;
+
+ // Grounding pass — retrieve top scenes from the corpus via hybrid_search.
+ // This is what gives the narrator real verbatim material to weave. Without
+ // this, the case-writer only sees pre-digested artefacts (which is what
+ // produced the academic prose in v1).
+ const scenes = await hybridSearch({
+ query: topic, lang,
+ doc_id: task.doc_id ?? null,
+ top_k: 18,
+ recall_k: 80,
+ max_dense_dist: 0.55,
+ }).catch(() => [] as SearchHit[]);
const docIdFilter = task.doc_id ?? null;
// Pull artefacts SEQUENTIALLY. The investigator role has rolconnlimit=4 and
@@ -246,21 +245,9 @@ export async function runCaseWriter(task: CaseWriterTask): Promise<
ORDER BY e.evidence_id LIMIT 20`,
[docIdFilter ?? filter],
);
- const hypotheses = await query(
- `SELECT hypothesis_id, question, position, argument_for, argument_against,
- prior, posterior, confidence_band, status, reviewed_by
- FROM public.hypotheses
- WHERE LOWER(question) LIKE $1 OR LOWER(position) LIKE $1
- ORDER BY hypothesis_id LIMIT 12`,
- [filter],
- );
- const contradictions = await query(
- `SELECT contradiction_id, topic, chunks, resolution_status, notes
- FROM public.contradictions
- WHERE LOWER(topic) LIKE $1
- ORDER BY contradiction_id LIMIT 8`,
- [filter],
- );
+ // Hypotheses + contradictions are no longer fed to the narrator. They were
+ // skeptic-framing scaffolding from the earlier bureau. The narrator works
+ // from corpus scenes + curated verbatim quotes instead.
const witnesses = await query(
`SELECT w.witness_id, e.canonical_name, w.credibility, w.verdict,
w.access_to_event, w.bias_notes
@@ -286,21 +273,20 @@ export async function runCaseWriter(task: CaseWriterTask): Promise<
job_id: task.job_id,
detective: "case-writer@detective",
topic, slug, doc_id: docIdFilter,
+ n_scenes: scenes.length,
n_evidence: evidence.length,
- n_hypotheses: hypotheses.length,
- n_contradictions: contradictions.length,
n_witnesses: witnesses.length,
n_gaps: gaps.length,
});
- const total = evidence.length + hypotheses.length + contradictions.length
- + witnesses.length + gaps.length;
- if (total < 2 || (evidence.length === 0 && hypotheses.length === 0)) {
- return { skipped: true, reason: "insufficient_artefacts" };
+ // Refusal floor: the narrator needs real corpus material. Without enough
+ // scenes (chunks) the file would be padding.
+ if (scenes.length < 4) {
+ return { skipped: true, reason: `insufficient_scenes_${scenes.length}_of_4` };
}
const systemPrompt = await readFile(PROMPT_PATH, "utf-8");
- const prompt = buildPrompt(task, evidence, hypotheses, contradictions, witnesses, gaps);
+ const prompt = buildPrompt(task, scenes, evidence, witnesses, gaps, lang);
// Case-writer wants more output budget than the other detectives.
const llm = await callClaude({
@@ -328,14 +314,14 @@ export async function runCaseWriter(task: CaseWriterTask): Promise<
topic, topic_pt_br: task.topic_pt_br, slug, body_md,
meta: {
n_evidence: evidence.length,
- n_hypotheses: hypotheses.length,
- n_contradictions: contradictions.length,
+ n_hypotheses: 0, // hypothesis tournaments no longer feed the narrative
+ n_contradictions: 0,
n_witnesses: witnesses.length,
n_outliers: gaps.filter((g) => {
 const s = g.scope as Record<string, unknown> | null;
return s?.kind === "outlier";
}).length,
- n_calibrations: 0, // Calibrations live inside hypothesis case files, not a table yet.
+ n_calibrations: 0,
},
}, { job_id: task.job_id, detective: "case-writer@detective" });
}
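
Reviewer note: two small behaviours in this change are easy to get subtly wrong, namely the zero-padded page in the scene citation and the new refusal floor. The sketch below isolates both so they can be unit-tested. It is an illustration under assumptions, not the module's code: `SceneHit`, `citation`, `skipReason`, and `MIN_SCENES` are hypothetical names, and `SceneHit` carries only the fields that `renderScenes` reads for the citation.

```typescript
// Minimal stand-in for SearchHit (ASSUMPTION: the real type has more fields).
interface SceneHit {
  doc_id: string;
  page: number;
  chunk_id: string;
}

// Mirrors the `scenes.length < 4` floor added in runCaseWriter.
const MIN_SCENES = 4;

// Builds the [[doc-id/pNNN#chunk]] link the same way renderScenes does:
// the page number is left-padded with zeros to three digits.
function citation(h: SceneHit): string {
  const pageStr = String(h.page).padStart(3, "0");
  return `[[${h.doc_id}/p${pageStr}#${h.chunk_id}]]`;
}

// Returns the skip reason string when too few scenes were retrieved,
// or null when the narrator has enough material to proceed.
function skipReason(scenes: SceneHit[]): string | null {
  return scenes.length < MIN_SCENES
    ? `insufficient_scenes_${scenes.length}_of_${MIN_SCENES}`
    : null;
}
```

Deriving the reason string from a single constant keeps the threshold and the message in step if the floor is tuned later. Note that because `hybridSearch` failures are caught and mapped to an empty array, a search outage would surface as `insufficient_scenes_0_of_4`, the same reason a genuinely thin corpus produces.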