disclosure-bureau/case/gaps/G-0002.md
Luiz Gustavo a7e9dce6d2 rebuild entity layer from Sonnet-vision reextract pipeline
Add reextract pipeline (scripts/reextract/) that rebuilds doc-level entity
JSON from Sonnet-vision chunks via Opus, replacing the noisy per-page
extraction. Add synthesize scripts to regenerate wiki/entities from the 116
_reextract.json (30), aggregate missing page.md from chunks (31), and reprocess
805 pages the doc-rebuilder agent dropped on context overflow (32). Add
maintain scripts 43-56 for chunk-page sync, dedup, generic-entity marking, and
typed relation extraction.

Web: wire relations API + entity-relations component; entity/timeline/doc
pages consume the rebuilt layer.

Note: raw/, processing/, wiki/ remain gitignored (bulk data managed
separately); the 116 reextract JSONs and 7,798 rebuilt entity files live on
disk only. The 27 curated anchor events under wiki/entities/events/ are
preserved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 12:20:24 -03:00


---
schema_version: "0.1.0"
type: gap
gap_id: G-0002
canonical_title: Mismatch between internal title (D31) and filename (D54) in DOW-UAP-D54
gap_class: inconsistency
description: >-
  The PDF `DOW-UAP-D54-Mission-Report-Mediterranean-Sea-NA.pdf` carries in its
  PDF metadata 'Title' field the value "DoW-UAP-D31", while its external
  filename (published on war.gov/ufo) uses the identifier "D54". This may
  indicate: (a) editorial renumbering between versions, where the document was
  originally "D31" during preparation and renumbered to "D54" at release;
  (b) copy/paste error in the release template; (c) a separate "D31" document
  exists whose title was reused by mistake.
description_pt_br: >-
  O PDF `DOW-UAP-D54-Mission-Report-Mediterranean-Sea-NA.pdf` carrega no campo
  PDF metadata 'Title' o valor "DoW-UAP-D31", enquanto seu nome externo de
  arquivo (publicado em war.gov/ufo) usa o identificador "D54". Isso pode
  indicar: (a) renumeração editorial entre versões, em que o documento foi
  originalmente "D31" durante a preparação e renumerado para "D54" no release;
  (b) erro de copy/paste no template de release; (c) existe um documento "D31"
  separado cujo título foi reusado por engano.
detected_in:
  - dow-uap-d54-mission-report-mediterranean-sea-na
detected_by: archivist
detected_at: 2026-05-13T08:50:00Z
severity: low
investigative_impact: >-
  Does not affect substantive content (the page-7 UAP observation is
  independent of the report number). But raises doubt about whether a separate
  "DoW-UAP-D31" file exists in the corpus, and about the integrity of the
  release process.
investigative_impact_pt_br: >-
  Não afeta o conteúdo substantivo (a observação UAP da página 7 é independente
  do número do relatório). Mas levanta dúvida sobre se existe um arquivo
  "DoW-UAP-D31" separado no corpus, e sobre a integridade do processo de
  release.
possible_explanations:
  - explanation: "Editorial renumbering: D31 was internal name, D54 is public ID"
    confidence_band: medium
  - explanation: Copy-paste error of title from another template document
    confidence_band: medium
  - explanation: A separate D31 exists and this D54 inherited its title by mistake
    confidence_band: low
recommended_actions:
  - Check whether a separate DOW-UAP-D31 exists in the war.gov/ufo corpus
  - Cross-check internal titles of other DOW-UAP-D* documents to detect a pattern
  - Compare against AARO's official index if available
related_gaps:
  - gap/G-0001
wiki_version: "0.1.0"
---

Gap G-0002 — Internal identifier vs filename mismatch

See description / description_pt_br.
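
The cross-check across other DOW-UAP-D* documents could be sketched roughly as follows. This is a minimal illustration and not part of the repository's pipeline. It assumes the PDF 'Title' values have already been read by some other tool and collected into a mapping from filename to title. The names `extract_doc_number` and `find_id_mismatches` are hypothetical.

```python
import re
from typing import Optional

# Matches identifiers such as "DOW-UAP-D54" or "DoW-UAP-D31" (case-insensitive).
_ID_PATTERN = re.compile(r"DOW-UAP-D(\d+)", re.IGNORECASE)


def extract_doc_number(text: str) -> Optional[int]:
    """Return the numeric part of the first DOW-UAP-D<n> identifier, or None."""
    match = _ID_PATTERN.search(text)
    return int(match.group(1)) if match else None


def find_id_mismatches(titles_by_filename: dict[str, str]) -> list[tuple[str, int, int]]:
    """List (filename, filename_number, title_number) where the two numbers differ.

    Entries where either identifier cannot be parsed are skipped, not flagged.
    """
    mismatches = []
    for filename, title in titles_by_filename.items():
        file_no = extract_doc_number(filename)
        title_no = extract_doc_number(title)
        if file_no is not None and title_no is not None and file_no != title_no:
            mismatches.append((filename, file_no, title_no))
    return mismatches


# Hypothetical input: only the first entry comes from this gap record.
sample = {
    "DOW-UAP-D54-Mission-Report-Mediterranean-Sea-NA.pdf": "DoW-UAP-D31",
    "DOW-UAP-D07-Example.pdf": "DoW-UAP-D07",  # made-up consistent entry
}
print(find_id_mismatches(sample))
# [('DOW-UAP-D54-Mission-Report-Mediterranean-Sea-NA.pdf', 54, 31)]
```

If many files showed the same offset between the two numbers, that would lean toward the editorial-renumbering explanation. An isolated mismatch would be more consistent with a template copy/paste error.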