Add reextract pipeline (scripts/reextract/) that rebuilds doc-level entity JSON from Sonnet-vision chunks via Opus, replacing the noisy per-page extraction.

Add synthesize scripts to regenerate wiki/entities from the 116 _reextract.json (30), aggregate missing page.md from chunks (31), and reprocess 805 pages the doc-rebuilder agent dropped on context overflow (32).

Add maintain scripts 43-56 for chunk-page sync, dedup, generic-entity marking, and typed relation extraction.

Web: wire relations API + entity-relations component; entity/timeline/doc pages consume the rebuilt layer.

Note: raw/, processing/, wiki/ remain gitignored (bulk data managed separately); the 116 reextract JSONs and 7,798 rebuilt entity files live on disk only. The 27 curated anchor events under wiki/entities/events/ are preserved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
schema_version: "0.1.0"
type: gap
gap_id: "G-0002"
canonical_title: "Mismatch between internal title (D31) and filename (D54) in DOW-UAP-D54"
gap_class: inconsistency

description: |
  The PDF `DOW-UAP-D54-Mission-Report-Mediterranean-Sea-NA.pdf` carries the
  value "DoW-UAP-D31" in its PDF metadata 'Title' field, while its external
  filename (published on war.gov/ufo) uses the identifier "D54".

  This may indicate:
  (a) editorial renumbering between versions: the document was originally
      "D31" during preparation and was renumbered to "D54" at release;
  (b) a copy/paste error in the release template;
  (c) a separate "D31" document exists whose title was reused by mistake.

description_pt_br: |
  O PDF `DOW-UAP-D54-Mission-Report-Mediterranean-Sea-NA.pdf` carrega no campo
  PDF metadata 'Title' o valor "DoW-UAP-D31", enquanto seu nome externo de
  arquivo (publicado em war.gov/ufo) usa o identificador "D54".

  Isso pode indicar:
  (a) renumeração editorial entre versões: o documento foi originalmente
      "D31" durante a preparação e renumerado para "D54" no release;
  (b) erro de copy/paste no template de release;
  (c) existe um documento "D31" separado cujo título foi reusado por engano.

detected_in:
  - "[[dow-uap-d54-mission-report-mediterranean-sea-na]]"
detected_by: archivist
detected_at: "2026-05-13T08:50:00Z"

severity: low
investigative_impact: |
  Does not affect substantive content (the page-7 UAP observation is independent
  of the report number). It does, however, raise doubt about whether a separate
  "DoW-UAP-D31" file exists in the corpus, and about the integrity of the
  release process.
investigative_impact_pt_br: |
  Não afeta o conteúdo substantivo (a observação UAP da página 7 é independente
  do número do relatório). Mas levanta dúvida sobre se existe um arquivo
  "DoW-UAP-D31" separado no corpus, e sobre a integridade do processo de release.

possible_explanations:
  - { explanation: "Editorial renumbering: D31 was the internal name, D54 is the public ID", confidence_band: medium }
  - { explanation: "Copy-paste error: title carried over from another template document", confidence_band: medium }
  - { explanation: "A separate D31 exists and this D54 inherited its title by mistake", confidence_band: low }

recommended_actions:
  - "Check whether a separate DOW-UAP-D31 exists in the war.gov/ufo corpus"
  - "Cross-check internal titles of other DOW-UAP-D* documents to detect a pattern"
  - "Compare against AARO's official index if available"

related_gaps: ["[[gap/G-0001]]"]
wiki_version: "0.1.0"
---

# Gap G-0002 — Internal identifier vs filename mismatch

See the `description` and `description_pt_br` front-matter fields for details.
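
The cross-check listed under `recommended_actions` (comparing the internal titles of other DOW-UAP-D* documents against their filenames) could be sketched roughly as follows. This is a minimal illustration, not part of the repository's scripts: the names `ID_PATTERN`, `extract_id`, and `find_title_mismatches` are hypothetical, and the sketch assumes the PDF metadata 'Title' values have already been read by some other tool and are passed in as plain strings.

```python
from __future__ import annotations

import re

# Assumed identifier shape, based on "DOW-UAP-D54" / "DoW-UAP-D31" in this gap.
ID_PATTERN = re.compile(r"dow-uap-(d\d+)", re.IGNORECASE)


def extract_id(text: str) -> str | None:
    """Return the normalized identifier (e.g. "D54"), or None if absent."""
    match = ID_PATTERN.search(text)
    return match.group(1).upper() if match else None


def find_title_mismatches(docs: dict[str, str]) -> list[tuple[str, str, str]]:
    """Compare each filename's identifier with the one in its metadata title.

    `docs` maps filename -> PDF metadata 'Title' value (obtained elsewhere).
    Returns (filename, filename_id, title_id) for every disagreement.
    """
    mismatches = []
    for filename, title in sorted(docs.items()):
        file_id = extract_id(filename)
        title_id = extract_id(title)
        # Only report entries where both identifiers parse and differ.
        if file_id and title_id and file_id != title_id:
            mismatches.append((filename, file_id, title_id))
    return mismatches


if __name__ == "__main__":
    # The single pair documented in this gap record.
    docs = {"DOW-UAP-D54-Mission-Report-Mediterranean-Sea-NA.pdf": "DoW-UAP-D31"}
    for filename, file_id, title_id in find_title_mismatches(docs):
        print(f"{filename}: filename says {file_id}, title says {title_id}")
```

Entries whose filename or title carries no parseable identifier are skipped here rather than reported, so a run over the full corpus would need a separate pass to list those.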