# CLAUDE.md: Binding Contract of the UFO/UAP Wiki

> Version `0.1.0` · Last updated `2026-05-13` · Canonical schema in [`CLAUDE-schema-full.md`](CLAUDE-schema-full.md)

Every agent that touches this project **reads this file at boot**. Reading this contract alone is sufficient for routine tasks; schema details live in `CLAUDE-schema-full.md`.

## 1. Philosophy in one sentence

Investigative wiki in the style of the **Karpathy LLM Wiki** + **Investigation Bureau** (8 detectives: Holmes/Poirot/Dupin/Locard + Schneier/Tetlock/Taleb). Pure markdown, no RAG, with absolute provenance for every claim.

## 2. Layout

```
/Users/guto/ufo/
├── CLAUDE.md               ← this file (contract)
├── CLAUDE-schema-full.md   ← full schema of the 24 types
├── raw/                    ← IMMUTABLE (115 PDFs + 14 JPG/PNG)
├── processing/             ← intermediate (PNGs, OCR, raw vision output)
├── wiki/                   ← GENERATED (documents, pages, entities, tables, images)
├── case/                   ← Investigation Bureau (evidence, witnesses, hypotheses, ...)
└── scripts/                ← ingest, dedup, and lint pipelines
```

**Golden rule:** nothing writes to `raw/`. References use the relative path `../raw/.pdf`.

## 3. Language: bilingual EN + PT-BR (Brazilian Portuguese)

The wiki is **bilingual** from ingest onward. The same Haiku vision call generates EN and PT-BR together (a single pass, which preserves the visual context of the image).

| Field category | Language |
|---|---|
| YAML keys | **English** (international standard) |
| OCR raw text | **Source language only** (verbatim, no translation) |
| `verbatim_excerpt` (evidence), `verbatim_quotes` (person), `caption_ocr` (image) | **Source language only** |
| Enums (`page_type`, `content_classification`, `evidence_grade`, `confidence_band`, redaction codes, classification markings) | **English** (universal) |
| `canonical_name`, technical IDs | **Source language**; aliases array can hold PT-BR forms |
| Narrative descriptions (`vision_description`, `narrative_summary`, `executive_summary`, `description` in gaps, `definition_short` in concepts, `verdict_rationale` in witnesses) | **Both EN and PT-BR** via sibling fields `vision_description` + `vision_description_pt_br` etc. |
| Markdown body sections (headings + commentary) | **Both EN and PT-BR** in adjacent sections: `## Vision Description (EN)` then `## Descrição Vision (PT-BR)` |

**PT-BR rules:**

- Must be **Brazilian Portuguese** (`pt-br`), NOT European Portuguese. Use Brazilian vocabulary and spelling.
- Preserve UTF-8 accents correctly: `ç`, `ã`, `á`, `é`, `í`, `ó`, `ú`, `â`, `ê`, `ô`, `à`. Never strip accents.
- When a verbatim quote from the document appears inside a narrative paragraph, keep the **quote** in the source language and translate only the surrounding narration.
- IDs are always ASCII-folded (kebab-case without accents). Display fields (`canonical_name`) preserve accents when applicable.

Encoding: **always UTF-8**.

## 4. The 24 markdown types

| Type | Path | Owner |
|---|---|---|
| `document` | `wiki/documents/.md` | archivist |
| `page` | `wiki/pages//p.md` | archivist + evidence-officer |
| `person` | `wiki/entities/people/.md` | profiler |
| `organization` | `wiki/entities/organizations/.md` | profiler |
| `location` | `wiki/entities/locations/.md` | archivist |
| `event` | `wiki/entities/events/.md` | timeline-analyst |
| `uap_object` | `wiki/entities/uap-objects/.md` | evidence-officer |
| `vehicle` | `wiki/entities/vehicles/.md` | archivist |
| `operation` | `wiki/entities/operations/.md` | archivist |
| `concept` | `wiki/entities/concepts/.md` | archivist |
| `table` | `wiki/tables/.md` | archivist |
| `image` | `wiki/images/.md` | evidence-officer |
| `evidence` | `case/evidence/.md` | evidence-officer |
| `witness_analysis` | `case/witnesses/.md` | witness-officer |
| `timeline` | `case/timelines/.md` | timeline-analyst |
| `hypothesis` | `case/hypotheses/.md` | hypothesis-lead |
| `actor_profile` | `case/profiles/.md` | profiler |
| `gap` | `case/gaps/.md` | archivist + chief-detective |
| `relation` | `case/connect-the-dots/.md` | chief-detective |
| `case_report` | `case/case-report.md` | case-writer |
| `residual_uncertainty` | `case/residual-uncertainty.md` | chief-detective |
| `index` | `wiki/index.md` | archivist |
| `log` | `wiki/log.md` | archivist (append-only) |
| (this file) | `CLAUDE.md` | chief-detective |

Detailed frontmatter schemas are in [`CLAUDE-schema-full.md`](CLAUDE-schema-full.md).

## 5. Universal required frontmatter

Every `.md` file in `wiki/` and `case/` has:

```yaml
---
schema_version: "0.1.0"
type:                                 # document | page | person | ... (24 types)
canonical_title: "..."                # OR canonical_name (entities)
wiki_version: "0.1.0"
last_ingest: "2026-05-13T14:22:11Z"   # OR last_revised
---
```

## 6. Canonical naming (regex)

| Type | Regex | Example |
|---|---|---|
| `doc_id` | `^[a-z0-9][a-z0-9-]*$` | `dow-uap-d54-mission-report-mediterranean-sea-na` |
| `page_id` | `^[a-z0-9-]+/p\d{3}$` | `dow-uap-d54-.../p007` |
| `person_id` | `^[a-z][a-z0-9-]*$` (ASCII-fold) | `j-edgar-hoover` |
| `event_id` | `^EV-\d{4}-(\d{2}\|XX)-(\d{2}\|XX)-[a-z0-9-]+$` | `EV-2004-11-14-tic-tac-nimitz` |
| `uap_object_id` | `^OBJ-[A-Z0-9-]+-\d{2}$` | `OBJ-EV2004-NIMITZ-01` |
| `evidence_id` | `^E-\d{4}$` | `E-0042` |
| `witness_id` | `^W-\d{4}$` | `W-0007` |
| `hypothesis_id` | `^H-\d{4}$` | `H-0003` |
| `table_id` | `^TBL-[A-Z0-9]+-\d{4}$` | `TBL-DOWD54-0003` |
| `image_id` | `^IMG-[A-Z0-9]+-p\d{3}-\d{2}$` | `IMG-DOWD54-p007-01` |
| `gap_id` | `^G-\d{4}$` | `G-0012` |
| `relation_id` | `^R-\d{4}$` | `R-0028` |
| `actor_profile_id` | `^AP-\d{4}$` | `AP-0001` |

### Algorithm `filename → doc_id`

```
1. Strip extension (.pdf, .jpg, .png)
2. NFD + remove combining marks (ASCII fold)
3. Lowercase
4. Replace whitespace/underscore/non-[a-z0-9-] with "-"
5. Collapse repeated "-"
6. Trim leading/trailing "-"
7. If it starts with a digit, prefix "doc-"
```

## 7. Wiki-links: 18 namespaces

```
[[doc-id]]                 → wiki/documents/.md
[[doc-id/pNNN]]            → wiki/pages//p.md
[[people/]]                → wiki/entities/people/.md
[[org/]]                   → wiki/entities/organizations/.md
[[loc/]]                   → wiki/entities/locations/.md
[[event/]]                 → wiki/entities/events/.md
[[uap/]]                   → wiki/entities/uap-objects/.md
[[vehicle/]]               → wiki/entities/vehicles/.md
[[op/]]                    → wiki/entities/operations/.md
[[concept/]]               → wiki/entities/concepts/.md
[[table/]] [[image/]]      → wiki/tables|images/.md
[[evidence/]] [[witness/]] [[hypothesis/]] [[profile/]] [[gap/]] [[relation/]] → case/...
[[people/...|Grusch]]      → custom display text
```

**Backlinks** (`mentioned_in[]` on entities) are **materialized by Lint, NOT written by hand**.

## 8. Confidence calibration (Tetlock)

| Band | Range | Permitted language |
|---|---|---|
| `high` | ≥0.90 | "demonstrates" (*demonstra*), "establishes" (*estabelece*) |
| `medium` | 0.60–0.89 | "strongly suggests" (*sugere fortemente*), "indicates" (*indica*) |
| `low` | 0.30–0.59 | "possibly" (*possivelmente*), "may" (*pode*) |
| `speculation` | <0.30 | "hypothesis" (*hipótese*), "speculation" (*especulação*); always labeled |

Every claim in an executive summary carries a `confidence_band`.

## 9. Content classification (`content_classification`)

Enum array on `document` and `page`:

- `text-only` · `contains-photos` · `contains-sketches` · `contains-diagrams` · `contains-maps` · `contains-tables` · `contains-signatures` · `contains-stamps` · `redaction-heavy` (>30% redacted) · `mixed` · `blank`

Doc-level value = union of the page-level values.

## 10. Provenance (Locard)

- Every `evidence` points to a `source_page` + `bbox` (optional).
- Every claim on an entity has `mentioned_in[]` with a `page_ref`.
- `chain_of_custody[]` is required on evidence; `custody_gaps[]` are stated explicitly.
- Grade A → ≥3 custody steps · Grade B → ≥2 · Grade C → ≥1

## 11. Canonical operations

1. **INGEST**: PDF → PNG per page → Haiku vision → `page.md` + entity upsert
2. **LINT**: reverse scan, materializes `mentioned_in[]`, validates wiki-links, reports orphans
3. **QUERY**: read by wiki-link traversal; never via embeddings

Log every operation in `wiki/log.md` (append-only, fixed format).

## 12. Quality gates (enforced by chief-detective)

Global threshold of **0.85** on 6 rubrics in `case-report.md`:

1. `chain_of_custody_completeness`
2. `confidence_calibration_match`
3. `hypothesis_tournament_discipline` (≥3 hypotheses)
4. `residual_uncertainty_presence`
5. `audit_trail_per_claim`
6. `red_team_pass`

Additional **blocking** lint checks:

- Wiki-links resolve 100%
- `entity.mentioned_in` ↔ `page.entities_extracted` are consistent
- No duplicate `canonical_name` without a `disambiguation_note`
- `pages[]` is contiguous over `1..page_count` for each document

## 13. External enrichment triggers

- **≥3 mentions OR central claim** → `enrichment_status: deep` (WebSearch + ≥2 `external_sources`)
- **1-2 mentions** → `enrichment_status: shallow` (1 query + internal knowledge)
- **0 mentions** (inferred) → `enrichment_status: none`

## 14. Idempotency

Re-ingesting the same PDF (same `sha256`) updates `last_ingest` and preserves `created_at`. Re-lint overwrites `mentioned_in[]` but does not create duplicates.

## 15. Escalation

When an agent encounters:

- **Contradiction between grade A/B evidence** → escalate to `chief-detective`
- **Surviving hypothesis with posterior >0.70** → multi-detective review
- **Critical gap** → create `[[gap/G-NNNN]]` + link it in `case-report`

## 16. Model

Default for ingest, vision, dedup, lint, enrichment, and markdown generation: **`claude-haiku-4-5`**.

`case-writer` (final Holmes-Watson narrative) and `chief-detective` (red team review) may optionally use Sonnet for final quality.

## 17. Execution stack

- **PDF → PNG**: `pdftoppm -r 200` (Poppler)
- **PDF → text**: `pdftotext -layout`
- **Vision**: Anthropic SDK Python + Haiku, with prompt caching and the `pdf-2025-03-04` beta header if applicable
- **Linting**: Python (PyYAML + regex)

Scripts live in `/Users/guto/ufo/scripts/`.
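
As an illustration of what a script in that directory might look like, here is a minimal sketch of the `filename → doc_id` algorithm from section 6, using only the Python standard library. The function name `filename_to_doc_id`, the constant names, the case-insensitive extension match, and the sample filenames are all assumptions made for this example; they are not taken from the project's actual scripts.

```python
import os
import re
import unicodedata

# Regex for doc_id as listed in section 6.
DOC_ID_RE = re.compile(r"^[a-z0-9][a-z0-9-]*$")

# Extensions named in step 1; matching them case-insensitively is an assumption.
KNOWN_EXTENSIONS = {".pdf", ".jpg", ".png"}


def filename_to_doc_id(filename: str) -> str:
    """Hypothetical helper that follows the seven steps of section 6."""
    # 1. Strip the extension, but only if it is one of the known ones.
    stem, ext = os.path.splitext(filename)
    name = stem if ext.lower() in KNOWN_EXTENSIONS else filename

    # 2. NFD-decompose, then drop combining marks (ASCII fold).
    decomposed = unicodedata.normalize("NFD", name)
    name = "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    # 3. Lowercase.
    name = name.lower()

    # 4. Replace whitespace, underscores, and anything outside [a-z0-9-] with "-".
    name = re.sub(r"[^a-z0-9-]", "-", name)

    # 5. Collapse runs of "-".
    name = re.sub(r"-{2,}", "-", name)

    # 6. Trim leading and trailing "-".
    name = name.strip("-")

    # 7. Prefix "doc-" when the result starts with a digit.
    if name[:1].isdigit():
        name = "doc-" + name

    return name


if __name__ == "__main__":
    # Hypothetical filenames, chosen only to exercise the steps above.
    for sample in (
        "DOW UAP D54 Mission Report_Mediterranean Sea NA.pdf",
        "1947 Relatório Final.PDF",
    ):
        doc_id = filename_to_doc_id(sample)
        print(sample, "->", doc_id, "| valid:", bool(DOC_ID_RE.match(doc_id)))
```

Because every step is deterministic, running the helper twice on the same filename yields the same `doc_id`, which fits the idempotency requirement in section 14. A lint pass could additionally reject any result that fails `DOC_ID_RE`, for instance the empty string produced by a filename made only of punctuation.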