W4 followup: Poirot soft-truncate at sentence boundary

A live PT-BR smoke test on j-edgar-hoover produced a verdict_pt_br of 304
chars (the prompt says ≤ 280). The writer correctly rejected it ("verdict
too long (304 > 280)"), but the job failed instead of trimming.

Fix: the detective now trims each language field at the last sentence
boundary (a period or semicolon followed by a space) that falls above 60%
of the cap (168 of 280 chars, 480 of 800), and falls back to a hard cut at
the cap otherwise. Applied to verdict / verdict_pt_br (≤280), and to
access_to_event*, bias_notes* (≤800) for defense in depth.

The contract with the writer stays strict; the detective just becomes
forgiving about the model going 5-10% over.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 7826710051
commit 0a5c03c29a
1 changed file with 17 additions and 0 deletions
@@ -244,6 +244,23 @@ export async function runPoirot(task: PoirotTask): Promise<
     return { skipped: true, reason: "incomplete_bilingual_analysis" };
   }
 
+  // Soft-truncate before sending to the writer: the prompt asks ≤ 280 chars
+  // per language but the model occasionally goes slightly over (304 chars
+  // observed live with j-edgar-hoover PT-BR). Truncate at sentence boundary
+  // when possible, else at the cap.
+  const trimTo = (s: string, max: number): string => {
+    if (s.length <= max) return s;
+    const cut = s.slice(0, max);
+    const lastPeriod = Math.max(cut.lastIndexOf(". "), cut.lastIndexOf("; "));
+    return (lastPeriod > max * 0.6 ? cut.slice(0, lastPeriod + 1) : cut).trim();
+  };
+  args.verdict = trimTo(args.verdict, 280);
+  args.verdict_pt_br = trimTo(args.verdict_pt_br, 280);
+  args.access_to_event = trimTo(args.access_to_event, 800);
+  args.access_to_event_pt_br = trimTo(args.access_to_event_pt_br, 800);
+  args.bias_notes = trimTo(args.bias_notes, 800);
+  args.bias_notes_pt_br = trimTo(args.bias_notes_pt_br, 800);
+
   // Pass the shortlist's most-represented doc_id as a fallback for chunk_id
   // resolution in case the model emits a bare "c0042" without doc_id.
   const docCount = new Map<string, number>();
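
To make the three branches of the truncation rule concrete, here is a minimal standalone sketch of the helper. It mirrors the logic in the diff but is not the committed code: the function form, the `lastBoundary` name, and all sample strings are hypothetical, chosen only so each branch (boundary past 60% of the cap, boundary too early, no boundary at all) is exercised once.

```typescript
// Standalone copy of the trimTo logic from the diff, for illustration only.
function trimTo(s: string, max: number): string {
  if (s.length <= max) return s;
  const cut = s.slice(0, max);
  // Index of the last ". " or "; " inside the capped slice; -1 if neither occurs.
  const lastBoundary = Math.max(cut.lastIndexOf(". "), cut.lastIndexOf("; "));
  // Keep the boundary character itself (+1) only when it sits past 60% of the cap;
  // otherwise fall back to the hard cut.
  return (lastBoundary > max * 0.6 ? cut.slice(0, lastBoundary + 1) : cut).trim();
}

// Hypothetical inputs, all longer than the 280-char cap.
const lateBoundary = "A".repeat(200) + ". " + "B".repeat(150);  // boundary at index 200 (> 168)
const earlyBoundary = "D".repeat(100) + ". " + "E".repeat(250); // boundary at index 100 (< 168)
const noBoundary = "C".repeat(304);                             // no ". " or "; " at all

const atSentence = trimTo(lateBoundary, 280);  // 200 "A"s plus the period
const hardCutEarly = trimTo(earlyBoundary, 280); // hard cut at the cap
const hardCutNone = trimTo(noBoundary, 280);     // hard cut at the cap
const untouched = trimTo("Short verdict.", 280); // under the cap, returned as-is
```

Note that the boundary search requires a space after the period or semicolon, so a sentence that ends exactly at the cap (no following space inside the slice) is not detected as a boundary and an earlier one is used instead.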