CLAUDE.md: Binding Contract for the UFO/UAP Wiki
Version 0.1.0 · Last updated 2026-05-13 · Canonical schema in `CLAUDE-schema-full.md`
Every agent that touches this project reads this file at boot. Reading this contract alone is enough for routine tasks; schema details live in CLAUDE-schema-full.md.
1. Philosophy in one sentence
An investigative wiki in the style of Karpathy's LLM Wiki plus an Investigation Bureau (8 detectives: Holmes/Poirot/Dupin/Locard + Schneier/Tetlock/Taleb). Pure Markdown, no RAG, with complete provenance for every claim.
2. Layout
/Users/guto/ufo/
├── CLAUDE.md ← this file (the contract)
├── CLAUDE-schema-full.md ← full schema for the 24 types
├── raw/ ← IMMUTABLE (115 PDFs + 14 JPG/PNG)
├── processing/ ← intermediate (PNGs, OCR, raw vision output)
├── wiki/ ← GENERATED (documents, pages, entities, tables, images)
├── case/ ← Investigation Bureau (evidence, witnesses, hypotheses, ...)
└── scripts/ ← ingest, dedup, and lint pipelines
Golden rule: nothing writes to raw/. References use the relative path ../raw/<file>.pdf.
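The golden rule can be enforced mechanically before any write. The sketch below makes three assumptions. The project root is the path shown in the layout. Both helper names are hypothetical. Paths are resolved so that tricks like `wiki/../raw/x.pdf` are still caught.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: project root taken from the layout above; adjust per machine.
RAW_DIR = Path("/Users/guto/ufo/raw")


def is_inside_raw(target: str | Path) -> bool:
    """True if `target` resolves to raw/ itself or to anything beneath it."""
    resolved = Path(target).resolve()
    raw = RAW_DIR.resolve()
    return resolved == raw or raw in resolved.parents


def guard_write(target: str | Path) -> Path:
    """Hypothetical pre-write hook: refuse any write into the immutable raw/ tree."""
    if is_inside_raw(target):
        raise PermissionError(f"raw/ is immutable, refusing to write {target}")
    return Path(target)
```

Comparing against `resolved.parents` rather than using a string prefix keeps a sibling such as `raw-backup/` from being mistaken for `raw/`.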
3. Language: bilingual EN + PT-BR (Brazilian Portuguese)
The wiki is bilingual from ingest onward. A single Haiku vision call generates EN and PT-BR together (single pass, which preserves the visual context of the image).
| Field category | Language |
|---|---|
| YAML keys | English (international standard) |
| OCR raw text | Source language only (verbatim, no translation) |
| `verbatim_excerpt` (evidence), `verbatim_quotes` (person), `caption_ocr` (image) | Source language only |
| Enums (`page_type`, `content_classification`, `evidence_grade`, `confidence_band`, redaction codes, classification markings) | English (universal) |
| `canonical_name`, technical IDs | Source language; the `aliases` array can hold PT-BR forms |
| Narrative descriptions (`vision_description`, `narrative_summary`, `executive_summary`, `description` in gaps, `definition_short` in concepts, `verdict_rationale` in witnesses) | Both EN and PT-BR via sibling fields, e.g. `vision_description` + `vision_description_pt_br` |
| Markdown body sections (headings + commentary) | Both EN and PT-BR in adjacent sections: `## Vision Description (EN)` followed by `## Descrição Vision (PT-BR)` |
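The sibling-field rule lends itself to a lint check. This is a minimal sketch under two assumptions: every narrative field listed in the table follows the `<field>_pt_br` pattern of the `vision_description` / `vision_description_pt_br` pair, and frontmatter has already been parsed into a dict. The function name is hypothetical.

```python
from __future__ import annotations

# Narrative fields named in the table; the "<field>_pt_br" sibling pattern is
# generalized from vision_description / vision_description_pt_br (assumption).
NARRATIVE_FIELDS = (
    "vision_description",
    "narrative_summary",
    "executive_summary",
    "description",
    "definition_short",
    "verdict_rationale",
)


def missing_pt_br_siblings(frontmatter: dict) -> list[str]:
    """Return narrative fields that have EN text but no non-empty PT-BR sibling."""
    missing = []
    for field in NARRATIVE_FIELDS:
        if frontmatter.get(field) and not frontmatter.get(f"{field}_pt_br"):
            missing.append(field)
    return missing
```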
PT-BR rules:
- Must be Brazilian Portuguese (`pt-br`), NOT European Portuguese. Use Brazilian vocabulary and spelling.
- Preserve UTF-8 accents correctly: ç, ã, á, é, í, ó, ú, â, ê, ô, à. Never strip accents.
- When a verbatim quote from the document appears inside a narrative paragraph, keep the quote in the source language and translate only the surrounding narration.
- IDs are always ASCII-folded (kebab-case without accents). Display fields (`canonical_name`) preserve accents when applicable.
Encoding: always UTF-8.
4. The 24 markdown types
| Type | Path | Owner |
|---|---|---|
| `document` | `wiki/documents/<doc-id>.md` | archivist |
| `page` | `wiki/pages/<doc-id>/p<NNN>.md` | archivist + evidence-officer |
| `person` | `wiki/entities/people/<id>.md` | profiler |
| `organization` | `wiki/entities/organizations/<id>.md` | profiler |
| `location` | `wiki/entities/locations/<id>.md` | archivist |
| `event` | `wiki/entities/events/<id>.md` | timeline-analyst |
| `uap_object` | `wiki/entities/uap-objects/<id>.md` | evidence-officer |
| `vehicle` | `wiki/entities/vehicles/<id>.md` | archivist |
| `operation` | `wiki/entities/operations/<id>.md` | archivist |
| `concept` | `wiki/entities/concepts/<id>.md` | archivist |
| `table` | `wiki/tables/<table-id>.md` | archivist |
| `image` | `wiki/images/<image-id>.md` | evidence-officer |
| `evidence` | `case/evidence/<E-NNNN>.md` | evidence-officer |
| `witness_analysis` | `case/witnesses/<W-NNNN>.md` | witness-officer |
| `timeline` | `case/timelines/<scope>.md` | timeline-analyst |
| `hypothesis` | `case/hypotheses/<H-NNNN>.md` | hypothesis-lead |
| `actor_profile` | `case/profiles/<AP-NNNN>.md` | profiler |
| `gap` | `case/gaps/<G-NNNN>.md` | archivist + chief-detective |
| `relation` | `case/connect-the-dots/<R-NNNN>.md` | chief-detective |
| `case_report` | `case/case-report.md` | case-writer |
| `residual_uncertainty` | `case/residual-uncertainty.md` | chief-detective |
| `index` | `wiki/index.md` | archivist |
| `log` | `wiki/log.md` | archivist (append-only) |
| (this file) | `CLAUDE.md` | chief-detective |
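A single lookup table keeps scripts from hard-coding these paths in several places. The sketch below transcribes the 23 generated types from the table (the contract file itself is left out). The placeholder names `{doc_id}`, `{id}`, `{nnn}` and `{scope}` and the helper `path_for` are illustrative assumptions, not part of the schema.

```python
from __future__ import annotations

# Transcribed from the type table; placeholder names are illustrative only.
PATH_TEMPLATES = {
    "document": "wiki/documents/{doc_id}.md",
    "page": "wiki/pages/{doc_id}/p{nnn:03d}.md",  # nnn is an int, zero-padded to 3
    "person": "wiki/entities/people/{id}.md",
    "organization": "wiki/entities/organizations/{id}.md",
    "location": "wiki/entities/locations/{id}.md",
    "event": "wiki/entities/events/{id}.md",
    "uap_object": "wiki/entities/uap-objects/{id}.md",
    "vehicle": "wiki/entities/vehicles/{id}.md",
    "operation": "wiki/entities/operations/{id}.md",
    "concept": "wiki/entities/concepts/{id}.md",
    "table": "wiki/tables/{id}.md",
    "image": "wiki/images/{id}.md",
    "evidence": "case/evidence/{id}.md",
    "witness_analysis": "case/witnesses/{id}.md",
    "timeline": "case/timelines/{scope}.md",
    "hypothesis": "case/hypotheses/{id}.md",
    "actor_profile": "case/profiles/{id}.md",
    "gap": "case/gaps/{id}.md",
    "relation": "case/connect-the-dots/{id}.md",
    "case_report": "case/case-report.md",
    "residual_uncertainty": "case/residual-uncertainty.md",
    "index": "wiki/index.md",
    "log": "wiki/log.md",
}


def path_for(md_type: str, **ids) -> str:
    """Build the repository-relative path for a file of the given `type`."""
    return PATH_TEMPLATES[md_type].format(**ids)
```

For example, `path_for("page", doc_id="dow-uap-d54-mission-report-mediterranean-sea-na", nnn=7)` yields `wiki/pages/dow-uap-d54-mission-report-mediterranean-sea-na/p007.md`.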
Detailed frontmatter schemas are in CLAUDE-schema-full.md.
5. Universal required frontmatter
Every .md file in wiki/ and case/ has:
---
schema_version: "0.1.0"
type: <enum> # document | page | person | ... (24 types)
canonical_title: "..." # OR canonical_name (entities)
wiki_version: "0.1.0"
last_ingest: "2026-05-13T14:22:11Z" # OR last_revised
---
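The universal block can be checked by splitting off the leading `---` section and testing for the required keys. The following sketch encodes two alternatives as "one of": `canonical_title` or `canonical_name`, and `last_ingest` or `last_revised`. It assumes PyYAML (named in the execution stack) for parsing. PyYAML is imported lazily so that the key checks run without it. All function names are hypothetical.

```python
from __future__ import annotations

import re

REQUIRED_KEYS = ("schema_version", "type", "wiki_version")
# Each pair means "at least one of these must be present".
EITHER_OF = (("canonical_title", "canonical_name"), ("last_ingest", "last_revised"))

# Frontmatter = the first block delimited by '---' lines at the very top of the file.
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---[ \t]*(?:\n|\Z)", re.DOTALL)


def extract_frontmatter(text: str) -> str:
    match = FRONTMATTER_RE.match(text)
    if match is None:
        raise ValueError("file does not start with a '---' frontmatter block")
    return match.group(1)


def parse_frontmatter(text: str) -> dict:
    import yaml  # PyYAML; imported here so the checks below work without it

    data = yaml.safe_load(extract_frontmatter(text)) or {}
    if not isinstance(data, dict):
        raise ValueError("frontmatter is not a YAML mapping")
    return data


def frontmatter_errors(data: dict) -> list[str]:
    errors = [f"missing {key}" for key in REQUIRED_KEYS if key not in data]
    for first, second in EITHER_OF:
        if first not in data and second not in data:
            errors.append(f"missing {first} or {second}")
    return errors
```

Quoting `schema_version: "0.1.0"` in the YAML, as the template does, matters here: `yaml.safe_load` keeps it as a string instead of guessing a numeric type.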
6. Canonical naming (regex)
| Type | Regex | Example |
|---|---|---|
| `doc_id` | `^[a-z0-9][a-z0-9-]*$` | `dow-uap-d54-mission-report-mediterranean-sea-na` |
| `page_id` | `^[a-z0-9-]+/p\d{3}$` | `dow-uap-d54-.../p007` |
| `person_id` | `^[a-z][a-z0-9-]*$` (ASCII-fold) | `j-edgar-hoover` |
| `event_id` | `^EV-\d{4}-(\d{2}\|XX)-(\d{2}\|XX)-[a-z0-9-]+$` | `EV-2004-11-14-tic-tac-nimitz` |
| `uap_object_id` | `^OBJ-[A-Z0-9-]+-\d{2}$` | `OBJ-EV2004-NIMITZ-01` |
| `evidence_id` | `^E-\d{4}$` | `E-0042` |
| `witness_id` | `^W-\d{4}$` | `W-0007` |
| `hypothesis_id` | `^H-\d{4}$` | `H-0003` |
| `table_id` | `^TBL-[A-Z0-9]+-\d{4}$` | `TBL-DOWD54-0003` |
| `image_id` | `^IMG-[A-Z0-9]+-p\d{3}-\d{2}$` | `IMG-DOWD54-p007-01` |
| `gap_id` | `^G-\d{4}$` | `G-0012` |
| `relation_id` | `^R-\d{4}$` | `R-0028` |
| `actor_profile_id` | `^AP-\d{4}$` | `AP-0001` |
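A validator can reuse these patterns verbatim. Two details in the sketch below are choices rather than requirements of the contract. It uses `re.ASCII` because Python's `\d` otherwise also matches non-ASCII digits. It uses `fullmatch` because `$` alone accepts a trailing newline. `is_valid_id` is a hypothetical name.

```python
from __future__ import annotations

import re

# Patterns copied from the naming table above.
ID_PATTERNS = {
    "doc_id": r"^[a-z0-9][a-z0-9-]*$",
    "page_id": r"^[a-z0-9-]+/p\d{3}$",
    "person_id": r"^[a-z][a-z0-9-]*$",
    "event_id": r"^EV-\d{4}-(\d{2}|XX)-(\d{2}|XX)-[a-z0-9-]+$",
    "uap_object_id": r"^OBJ-[A-Z0-9-]+-\d{2}$",
    "evidence_id": r"^E-\d{4}$",
    "witness_id": r"^W-\d{4}$",
    "hypothesis_id": r"^H-\d{4}$",
    "table_id": r"^TBL-[A-Z0-9]+-\d{4}$",
    "image_id": r"^IMG-[A-Z0-9]+-p\d{3}-\d{2}$",
    "gap_id": r"^G-\d{4}$",
    "relation_id": r"^R-\d{4}$",
    "actor_profile_id": r"^AP-\d{4}$",
}

# re.ASCII keeps \d to [0-9]; fullmatch rejects a trailing "\n" that $ would allow.
COMPILED = {kind: re.compile(p, re.ASCII) for kind, p in ID_PATTERNS.items()}


def is_valid_id(kind: str, value: str) -> bool:
    return COMPILED[kind].fullmatch(value) is not None
```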
Filename → doc_id algorithm
1. Strip extension (.pdf, .jpg, .png)
2. NFD + remove combining marks (ASCII fold)
3. Lowercase
4. Replace whitespace/underscore/non-[a-z0-9-] with "-"
5. Collapse repeated "-"
6. Trim leading/trailing "-"
7. If it starts with a digit, prefix "doc-"
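The seven steps map almost one-to-one onto the Python standard library: `unicodedata` handles the NFD fold and `re` handles the rest. This is a sketch, not the project's script. Matching the extension case-insensitively is an assumption, since the contract only lists `.pdf`, `.jpg` and `.png`.

```python
from __future__ import annotations

import re
import unicodedata


def filename_to_doc_id(filename: str) -> str:
    # 1. Strip the extension (case-insensitive match is an assumption).
    name = re.sub(r"\.(pdf|jpg|png)$", "", filename, flags=re.IGNORECASE)
    # 2. NFD-decompose, then drop combining marks: "ã" -> "a" + U+0303 -> "a".
    name = "".join(
        ch for ch in unicodedata.normalize("NFD", name) if not unicodedata.combining(ch)
    )
    # 3. Lowercase.
    name = name.lower()
    # 4. Whitespace, underscores and anything else outside [a-z0-9-] become "-".
    name = re.sub(r"[^a-z0-9-]", "-", name)
    # 5. Collapse runs of "-".
    name = re.sub(r"-{2,}", "-", name)
    # 6. Trim leading/trailing "-".
    name = name.strip("-")
    # 7. IDs must not start with a digit.
    if name[:1].isdigit():
        name = "doc-" + name
    return name
```

For a hypothetical input, `"2004 Nimitz (FLIR).PDF"` becomes `"doc-2004-nimitz-flir"`. Step 5 runs after step 4 so that `" ("` collapses to a single hyphen.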
7. Wiki-links: 18 namespaces
[[doc-id]] → wiki/documents/<doc-id>.md
[[doc-id/pNNN]] → wiki/pages/<doc-id>/p<NNN>.md
[[people/<id>]] → wiki/entities/people/<id>.md
[[org/<id>]] → wiki/entities/organizations/<id>.md
[[loc/<id>]] → wiki/entities/locations/<id>.md
[[event/<id>]] → wiki/entities/events/<id>.md
[[uap/<id>]] → wiki/entities/uap-objects/<id>.md
[[vehicle/<id>]] → wiki/entities/vehicles/<id>.md
[[op/<id>]] → wiki/entities/operations/<id>.md
[[concept/<id>]] → wiki/entities/concepts/<id>.md
[[table/<id>]] [[image/<id>]] → wiki/tables|images/<id>.md
[[evidence/<id>]] [[witness/<id>]]
[[hypothesis/<id>]] [[profile/<id>]]
[[gap/<id>]] [[relation/<id>]] → case/...
[[people/...|Grusch]] → custom display text
Backlinks (mentioned_in[] on entities) are materialized by Lint, NOT written by hand.
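The 100% resolution gate needs a resolver that turns each link target into a file path. This is a minimal sketch with two assumptions. The `case/` directories for the last six namespaces come from the type table in section 4. Prefixed namespaces are tried before the bare `doc-id` and `doc-id/pNNN` forms. The resolver and extractor names are hypothetical.

```python
from __future__ import annotations

import re

# 16 prefixed namespaces; bare [[doc-id]] and [[doc-id/pNNN]] make 18 forms in total.
NAMESPACE_DIRS = {
    "people": "wiki/entities/people",
    "org": "wiki/entities/organizations",
    "loc": "wiki/entities/locations",
    "event": "wiki/entities/events",
    "uap": "wiki/entities/uap-objects",
    "vehicle": "wiki/entities/vehicles",
    "op": "wiki/entities/operations",
    "concept": "wiki/entities/concepts",
    "table": "wiki/tables",
    "image": "wiki/images",
    "evidence": "case/evidence",
    "witness": "case/witnesses",
    "hypothesis": "case/hypotheses",
    "profile": "case/profiles",
    "gap": "case/gaps",
    "relation": "case/connect-the-dots",
}
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")  # [[target|display]]
PAGE_REF = re.compile(r"^([a-z0-9][a-z0-9-]*)/(p\d{3})$")


def resolve(target: str) -> str:
    namespace, sep, rest = target.partition("/")
    if sep and namespace in NAMESPACE_DIRS:
        return f"{NAMESPACE_DIRS[namespace]}/{rest}.md"
    page = PAGE_REF.match(target)
    if page:
        return f"wiki/pages/{page.group(1)}/{page.group(2)}.md"
    if not sep:
        return f"wiki/documents/{target}.md"
    raise ValueError(f"unresolvable wiki-link target: {target}")


def extract_links(markdown: str) -> list[tuple[str, str | None]]:
    """Return (target, display_text_or_None) for every [[...]] in the text."""
    return [(m.group(1).strip(), m.group(2)) for m in WIKI_LINK.finditer(markdown)]
```

A lint pass could then call `resolve` on every extracted target and check that each resulting path exists.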
8. Confidence calibration (Tetlock)
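| Band | Range | Permitted language |
|---|---|---|
| `high` | ≥0.90 | "demonstrates", "establishes" (PT-BR: "demonstra", "estabelece") |
| `medium` | 0.60–0.89 | "strongly suggests", "indicates" (PT-BR: "sugere fortemente", "indica") |
| `low` | 0.30–0.59 | "possibly", "may" (PT-BR: "possivelmente", "pode") |
| `speculation` | <0.30 | "hypothesis", "speculation" (PT-BR: "hipótese", "especulação"); always labeled |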
Every claim in an executive summary carries a confidence_band.
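Mapping a probability to its band is a short cascade of thresholds. The table lists ranges such as 0.60–0.89, so it does not say where a value like 0.895 belongs. The sketch below assumes half-open intervals, so such a value falls into `medium`. `confidence_band` is a hypothetical helper name.

```python
from __future__ import annotations


def confidence_band(p: float) -> str:
    """Map a probability to its band; intervals are assumed half-open [low, high)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    if p >= 0.90:
        return "high"
    if p >= 0.60:
        return "medium"
    if p >= 0.30:
        return "low"
    return "speculation"
```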
9. Content classification (content_classification)
Enum array on `document` and `page`:
`text-only` · `contains-photos` · `contains-sketches` · `contains-diagrams` · `contains-maps` · `contains-tables` · `contains-signatures` · `contains-stamps` · `redaction-heavy` (>30% redacted) · `mixed` · `blank`
Document level = union of the values across its pages.
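The document-level value can be derived rather than written by hand. The following sketch validates each page value against the enum and then returns the union in enum order, which keeps diffs stable across re-runs. The ordering choice and the function names are assumptions.

```python
from __future__ import annotations

CONTENT_CLASSES = (
    "text-only", "contains-photos", "contains-sketches", "contains-diagrams",
    "contains-maps", "contains-tables", "contains-signatures", "contains-stamps",
    "redaction-heavy", "mixed", "blank",
)


def is_redaction_heavy(redacted_fraction: float) -> bool:
    """redaction-heavy means strictly more than 30% of the page is redacted."""
    return redacted_fraction > 0.30


def document_classification(page_classes: list[list[str]]) -> list[str]:
    """Union of all page values, validated and returned in enum order."""
    seen = set()
    for classes in page_classes:
        for value in classes:
            if value not in CONTENT_CLASSES:
                raise ValueError(f"unknown content_classification: {value}")
            seen.add(value)
    return [c for c in CONTENT_CLASSES if c in seen]
```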
10. Provenance (Locard)
- Every `evidence` points to `source_page` + `bbox` (optional).
- Every claim on an entity has `mentioned_in[]` with a `page_ref`.
- `chain_of_custody[]` is mandatory on evidence; `custody_gaps[]` are recorded explicitly.
- Grade A → ≥3 custody steps · Grade B → ≥2 · Grade C → ≥1
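The grade-versus-custody rule can be checked per evidence file. This sketch makes three assumptions. "Explicit" `custody_gaps[]` means the key must be present, even if empty. Grades outside A to C carry no minimum. Frontmatter is already parsed into a dict. The function name is hypothetical.

```python
from __future__ import annotations

MIN_CUSTODY_STEPS = {"A": 3, "B": 2, "C": 1}


def custody_errors(evidence: dict) -> list[str]:
    errors = []
    if not evidence.get("source_page"):
        errors.append("missing source_page")
    chain = evidence.get("chain_of_custody") or []
    if not chain:
        errors.append("chain_of_custody[] is required")
    # Assumption: an empty list is the explicit way to say "no known gaps".
    if "custody_gaps" not in evidence:
        errors.append("custody_gaps[] must be present")
    grade = evidence.get("evidence_grade")
    minimum = MIN_CUSTODY_STEPS.get(grade)
    if minimum is not None and len(chain) < minimum:
        errors.append(f"grade {grade} requires >={minimum} custody steps, found {len(chain)}")
    return errors
```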
11. Canonical operations
- INGEST: PDF → one PNG per page → Haiku vision → `page.md` + entity upsert
- LINT: reverse scan; materializes `mentioned_in[]`, validates wiki-links, reports orphans
- QUERY: reads by wiki-link traversal; never via embeddings
Log every operation in wiki/log.md (append-only, fixed format).
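The contract fixes the log format but does not spell it out. The following sketch therefore uses a hypothetical one-line layout, `- <UTC timestamp> | <OPERATION> | <target>[ | <note>]`, as a stand-in. The point it illustrates is opening the file in append mode, so earlier entries are never rewritten. The helper name is hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timezone
from pathlib import Path

OPERATIONS = ("INGEST", "LINT", "QUERY")


def append_log(log_path: str | Path, operation: str, target: str,
               note: str = "", now: datetime | None = None) -> str:
    """Append one entry; the line layout is a hypothetical stand-in for the fixed format."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    line = f"- {stamp} | {operation} | {target}" + (f" | {note}" if note else "") + "\n"
    # Mode "a" only ever appends; existing lines are never touched.
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(line)
    return line
```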
12. Quality gates (enforced by chief-detective)
Global threshold of 0.85 across 6 rubrics in case-report.md:
- `chain_of_custody_completeness`
- `confidence_calibration_match`
- `hypothesis_tournament_discipline` (≥3 hypotheses)
- `residual_uncertainty_presence`
- `audit_trail_per_claim`
- `red_team_pass`
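"Global threshold of 0.85" can be read in two ways: each rubric must reach 0.85, or their average must. This sketch assumes the stricter per-rubric reading. It also folds in the ≥3-hypotheses condition from the tournament rubric. Rubric names are copied from the list. The function name and return shape are hypothetical.

```python
from __future__ import annotations

RUBRICS = (
    "chain_of_custody_completeness",
    "confidence_calibration_match",
    "hypothesis_tournament_discipline",
    "residual_uncertainty_presence",
    "audit_trail_per_claim",
    "red_team_pass",
)
THRESHOLD = 0.85
MIN_HYPOTHESES = 3


def quality_gate(scores: dict[str, float], hypothesis_count: int) -> tuple[bool, list[str]]:
    """Assumes every rubric must individually reach THRESHOLD."""
    problems = [f"{r}: missing score" for r in RUBRICS if r not in scores]
    problems += [
        f"{r}: {scores[r]:.2f} < {THRESHOLD}"
        for r in RUBRICS
        if r in scores and scores[r] < THRESHOLD
    ]
    if hypothesis_count < MIN_HYPOTHESES:
        problems.append(f"hypothesis_tournament_discipline: only {hypothesis_count} hypotheses")
    return (not problems, problems)
```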
Additional blocking lint:
- Wiki-links resolve 100%
- `entity.mentioned_in` ↔ `page.entities_extracted` are consistent
- No duplicate `canonical_name` without a `disambiguation_note`
- `pages[]` is contiguous from 1 to `page_count` for each document
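The last three blocking checks reduce to set and list comparisons. The sketch rests on several assumptions. Mentions are modelled as sets of page refs keyed by entity ID, and vice versa. A shared `canonical_name` passes only if every entity carrying it has a `disambiguation_note`. The function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def pages_contiguous(page_numbers: list[int], page_count: int) -> bool:
    """pages[] must be exactly 1..page_count: no gaps and no duplicates."""
    return sorted(page_numbers) == list(range(1, page_count + 1))


def mention_mismatches(entity_mentions: dict[str, set[str]],
                       page_entities: dict[str, set[str]]) -> set[tuple[str, str]]:
    """(entity, page) pairs recorded on only one side of mentioned_in <-> entities_extracted."""
    forward = {(e, p) for e, pages in entity_mentions.items() for p in pages}
    backward = {(e, p) for p, ents in page_entities.items() for e in ents}
    return forward ^ backward  # symmetric difference = one-sided pairs


def undisambiguated_duplicates(entities: list[dict]) -> list[str]:
    """canonical_name values shared by several entities where any lacks a disambiguation_note."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for entity in entities:
        groups[entity["canonical_name"]].append(entity)
    return sorted(
        name for name, group in groups.items()
        if len(group) > 1 and not all(e.get("disambiguation_note") for e in group)
    )
```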
13. External enrichment triggers
- ≥3 mentions OR central claim → `enrichment_status: deep` (WebSearch + ≥2 `external_sources`)
- 1-2 mentions → `enrichment_status: shallow` (1 query + internal knowledge)
- 0 mentions (inferred) → `enrichment_status: none`
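The trigger rules amount to a small decision function. In the sketch below, a central claim takes precedence over the mention count, following the "OR" in the first rule. A companion check enforces the ≥2 `external_sources` floor for `deep`. Both function names are hypothetical.

```python
from __future__ import annotations


def enrichment_status(mention_count: int, central_claim: bool = False) -> str:
    if mention_count < 0:
        raise ValueError("mention_count cannot be negative")
    if central_claim or mention_count >= 3:
        return "deep"
    if mention_count >= 1:
        return "shallow"
    return "none"


def enrichment_errors(entity: dict) -> list[str]:
    """A 'deep' entity must cite at least two external_sources."""
    if entity.get("enrichment_status") == "deep" and len(entity.get("external_sources", [])) < 2:
        return ["deep enrichment requires >=2 external_sources"]
    return []
```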
14. Idempotency
Re-ingesting the same PDF (same sha256) updates last_ingest and preserves created_at. Re-lint overwrites mentioned_in[] but never duplicates entries.
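Idempotency comes down to three small rules: hash the source, keep the first `created_at`, and rebuild `mentioned_in[]` from scratch without duplicates. The sketch below assumes two things. `mentioned_in[]` entries are plain page-ref strings. The hash is stored under a hypothetical field name, `source_sha256`. All helper names are likewise hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: str | Path) -> str:
    """Stream the file in 1 MiB chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reingest_frontmatter(existing: dict | None, sha256: str, now: str) -> dict:
    """Re-running with the same file only moves last_ingest; created_at is kept."""
    fm = dict(existing or {})          # copy: never mutate the caller's dict
    fm.setdefault("created_at", now)   # set once, preserved on every re-ingest
    fm["last_ingest"] = now
    fm["source_sha256"] = sha256       # hypothetical field name
    return fm


def relint_mentions(computed: list[str]) -> list[str]:
    """Overwrite mentioned_in[] with freshly computed refs, de-duplicated in order."""
    return list(dict.fromkeys(computed))
```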
15. Escalation
When an agent finds:
- A contradiction between grade A/B evidence → escalate to chief-detective
- A surviving hypothesis with posterior >0.70 → multi-detective review
- A critical gap → create `[[gap/G-NNNN]]` and link it in `case-report`
16. Model
Default for ingest, vision, dedup, lint, enrichment, and markdown generation: claude-haiku-4-5.
case-writer (final Holmes-Watson narrative) and chief-detective (red-team review) may optionally use Sonnet for final-pass quality.
17. Execution stack
- PDF → PNG: `pdftoppm -r 200` (Poppler)
- PDF → text: `pdftotext -layout`
- Vision: Anthropic Python SDK + Haiku, with prompt caching and the `pdf-2025-03-04` beta header if applicable
- Linting: Python (PyYAML + regex)
Scripts live in /Users/guto/ufo/scripts/.
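The two Poppler steps can be wrapped so that the exact command lines are testable without running them. This is a sketch under stated assumptions. `-png` is added because `pdftoppm` writes PPM files by default, while the contract only names `-r 200`. The output prefix `page` and the file `text.txt` under a per-document work directory (for example `processing/<doc-id>/`) are hypothetical choices.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def pdftoppm_cmd(pdf: Path, out_prefix: Path, dpi: int = 200) -> list[str]:
    # -png: pdftoppm emits PPM unless told otherwise; -r sets the resolution in DPI.
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(out_prefix)]


def pdftotext_cmd(pdf: Path, out_txt: Path) -> list[str]:
    # -layout keeps the physical page layout (columns, table alignment).
    return ["pdftotext", "-layout", str(pdf), str(out_txt)]


def rasterize_and_extract(pdf: Path, work_dir: Path) -> None:
    """Hypothetical driver: page PNGs and layout text go under work_dir, never under raw/."""
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdftoppm_cmd(pdf, work_dir / "page"), check=True)
    subprocess.run(pdftotext_cmd(pdf, work_dir / "text.txt"), check=True)
```

Building argument lists instead of shell strings avoids quoting problems with filenames that contain spaces or accents.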