disclosure-bureau/web/app
Luiz Gustavo e75ca5eda2 add clean LLM reading version of documents (the core goal)
Scanned docs are messy — duplicate transcriptions (typed + handwritten),
two classification variants of the same narrative, OCR noise, repeated
banners. The doc page showed raw chunks, so everything appeared twice.

40_reading_version.py generates ONE clean, deduplicated, well-structured
bilingual Markdown reading version per doc (Sonnet): merges duplicate versions
without losing unique lines, drops page furniture, formats transcripts as
dialogue. Faithful — invents nothing; redactions kept as markers.
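
As a rough illustration of the "merge duplicate versions without losing unique lines" and "drop page furniture" rules above, here is a minimal deterministic sketch. It is not the code in 40_reading_version.py, which delegates this work to Sonnet; the names `FURNITURE`, `normalize`, and `merge_versions`, and the furniture patterns themselves, are hypothetical.

```python
import re

# Hypothetical "page furniture" patterns (page counters, classification banners).
# The real rules used by the pipeline are not shown in the source.
FURNITURE = re.compile(
    r"^(page \d+( of \d+)?|unclassified|secret|confidential)$"
)


def normalize(line: str) -> str:
    """Build a comparison key: drop punctuation, collapse whitespace, lowercase."""
    no_punct = re.sub(r"[^\w\s]", "", line)
    return re.sub(r"\s+", " ", no_punct).strip().lower()


def merge_versions(*versions: str) -> str:
    """Merge several transcriptions of the same text.

    A line is kept the first time its normalized key is seen, so lines that
    exist in only one version survive, while repeats and furniture are dropped.
    """
    seen = set()
    merged = []
    for text in versions:
        for line in text.splitlines():
            key = normalize(line)
            if not key or FURNITURE.match(key) or key in seen:
                continue
            seen.add(key)
            merged.append(line.strip())
    return "\n".join(merged)


typed = "UNCLASSIFIED\nWitness saw a light.\nPage 1 of 2\nIt moved north."
handwritten = "Witness saw a light\nIt moved north.\nNo sound was heard."
print(merge_versions(typed, handwritten))
```

One known limit of this sketch: lines unique to a later version are appended at the end rather than interleaved at their original position, and identical redaction markers would collapse into one. That is part of why an LLM pass is a plausible choice for the real step.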

/d/[docId] now defaults to a "📖 leitura" (reading) tab rendering this clean
version, with "🔍 trechos · scan original" (excerpts · original scan) preserving
the faithful per-chunk + per-page scan view. reading.md lives in
raw/<doc>--subagent/ alongside the chunks.
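
The file layout described above could be resolved along these lines. This is a sketch under assumptions: `<doc>` is taken to be the document ID, `raw_root` stands for the raw/ directory, and returning `None` when the file is missing (so a caller can fall back to the per-chunk view) is a guess at sensible behaviour, not something the commit states. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def reading_path(raw_root: Path, doc_id: str) -> Path:
    """Path of the reading version: <raw_root>/<doc_id>--subagent/reading.md."""
    return raw_root / f"{doc_id}--subagent" / "reading.md"


def load_reading(raw_root: Path, doc_id: str) -> Optional[str]:
    """Return the reading-version Markdown, or None if it has not been generated."""
    path = reading_path(raw_root, doc_id)
    if path.is_file():
        return path.read_text(encoding="utf-8")
    return None
```

A caller would check for `None` and show the chunk and scan view instead when no reading version exists yet.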

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 17:23:36 -03:00
admin baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
api rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
auth ship: synthesize 158 entities, AG-UI artifacts, chat persistence, auth flow 2026-05-18 03:52:59 -03:00
d/[docId] add clean LLM reading version of documents (the core goal) 2026-05-21 17:23:36 -03:00
e/[cls] rebuild entity layer from Sonnet-vision reextract pipeline 2026-05-21 12:20:24 -03:00
graph baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
search baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
timeline baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
globals.css baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
layout.tsx baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00
page.tsx baseline: Disclosure Bureau pipeline + Next.js UI + Supabase stack 2026-05-17 22:44:36 -03:00