disclosure-bureau/web/lib/retrieval
Luiz Gustavo 4865f974b6 fix search: rerank-gate results so absent terms return nothing
The hybrid_search RPC always returns up to recall_k dense neighbours, so a
query for a term absent from the corpus (e.g. "varginha") returned its 12
nearest vectors — irrelevant chunks like PAGE_NUMBER "1". Two bugs:
the reranker was skipped whenever results <= top_k, and there was no relevance
floor.

Now always run the cross-encoder reranker (BGE-reranker-v2-m3, normalized
sigmoid) and drop hits below 0.02. Verified: "varginha" → 0 results;
"roswell"/"tic tac"/"disco voador" → relevant hits on top (reranker cleanly
separates 0.0001 garbage from 0.03-0.27 matches).
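
The gating step described above can be sketched roughly as follows. This is a minimal illustration, not the code in hybrid.ts: the `Hit` and `ScoredHit` shapes, the `gateByRelevance` name, and the `topK` parameter are hypothetical. It assumes the cross-encoder has already been run on every candidate, regardless of how many came back, and has produced one normalized score in [0, 1] per hit. Only the 0.02 floor is taken from the commit message.

```typescript
// Hypothetical shapes; the real types in hybrid.ts are not shown in this listing.
interface Hit {
  id: string;
  text: string;
}

interface ScoredHit extends Hit {
  relevance: number; // normalized reranker score in [0, 1]
}

// Relevance floor quoted in the commit message.
const RELEVANCE_FLOOR = 0.02;

// Attach reranker scores to hits, drop everything below the floor,
// then return the best topK by relevance. The floor is applied even
// when hits.length <= topK, so a query with no real match yields [].
function gateByRelevance(
  hits: Hit[],
  scores: number[],
  topK: number,
  floor: number = RELEVANCE_FLOOR,
): ScoredHit[] {
  if (hits.length !== scores.length) {
    throw new Error("expected one reranker score per hit");
  }
  return hits
    .map((hit, i) => ({ ...hit, relevance: scores[i] }))
    .filter((hit) => hit.relevance >= floor)
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, topK);
}
```

With scores like those reported above, candidates scoring around 0.0001 are all dropped and the result is empty, while candidates scoring 0.03 and 0.27 survive and come back highest first.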

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 14:46:49 -03:00
db.ts baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
embed.ts baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
entity-pages.ts rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
graph.ts rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
hybrid.ts fix search: rerank-gate results so absent terms return nothing 2026-05-21 14:46:49 -03:00