disclosure-bureau/CLAUDE.md


CLAUDE.md: Binding Contract for the UFO/UAP Wiki

Version 0.1.0 · Last updated 2026-05-13 · Canonical schema in CLAUDE-schema-full.md

Every agent that touches this project reads this file at boot. Reading this contract alone is sufficient for routine tasks; schema details live in CLAUDE-schema-full.md.

1. Philosophy in one sentence

An investigative wiki in the style of Karpathy's LLM Wiki plus an Investigation Bureau (8 detectives Holmes/Poirot/Dupin/Locard + Schneier/Tetlock/Taleb). Pure Markdown, no RAG, with absolute provenance for every claim.

2. Layout

/Users/guto/ufo/
├── CLAUDE.md                 ← this file (contract)
├── CLAUDE-schema-full.md     ← full schema for the 24 types
├── raw/                      ← IMMUTABLE (115 PDFs + 14 JPG/PNG)
├── processing/               ← intermediate (PNGs, OCR, raw vision output)
├── wiki/                     ← GENERATED (documents, pages, entities, tables, images)
├── case/                     ← Investigation Bureau (evidence, witnesses, hypotheses, ...)
└── scripts/                  ← ingest, dedup, and lint pipelines

Golden rule: nothing writes to raw/. References use the relative path ../raw/<file>.pdf.
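One way a pipeline script could enforce the golden rule is to check every output path before writing. The sketch below is illustrative only: the helper name `is_inside_raw` and the idea of a pre-write guard are assumptions, not part of the existing scripts.

```python
from pathlib import Path


def is_inside_raw(target, root):
    """Return True if `target` resolves to raw/ or anything beneath it.

    `target` may be relative to `root` or absolute. Paths are resolved first,
    so traversal tricks such as "wiki/../raw/x.pdf" are caught as well.
    """
    root_path = Path(root).resolve()
    raw_dir = root_path / "raw"
    resolved = (root_path / target).resolve()
    return resolved == raw_dir or raw_dir in resolved.parents


def assert_writable(target, root):
    """Hypothetical guard: raise before any write that would land in raw/."""
    if is_inside_raw(target, root):
        raise PermissionError(f"raw/ is immutable, refusing to write: {target}")
```

A script would call `assert_writable(path, root)` immediately before opening any file for writing.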

3. Language: bilingual EN + PT-BR (Brazilian Portuguese)

The wiki is bilingual from ingest onward. The same Haiku vision call generates EN and PT-BR together (single pass, preserving the visual context of the image).

| Field category | Language |
|---|---|
| YAML keys | English (international standard) |
| OCR raw text | Source language only (verbatim, no translation) |
| `verbatim_excerpt` (evidence), `verbatim_quotes` (person), `caption_ocr` (image) | Source language only |
| Enums (`page_type`, `content_classification`, `evidence_grade`, `confidence_band`, redaction codes, classification markings) | English (universal) |
| `canonical_name`, technical IDs | Source language; `aliases` array can hold PT-BR forms |
| Narrative descriptions (`vision_description`, `narrative_summary`, `executive_summary`, `description` in gaps, `definition_short` in concepts, `verdict_rationale` in witnesses) | Both EN and PT-BR via sibling fields `vision_description` + `vision_description_pt_br` etc. |
| Markdown body sections (headings + commentary) | Both EN and PT-BR in adjacent sections: `## Vision Description (EN)` then `## Descrição Vision (PT-BR)` |

PT-BR rules:

  • Must be Brazilian Portuguese (pt-br), NOT European Portuguese. Use Brazilian vocabulary and spelling.
  • Preserve UTF-8 accents correctly: ç, ã, á, é, í, ó, ú, â, ê, ô, à. Never strip accents.
  • When a verbatim quote from the document appears inside a narrative paragraph, keep the quote in source language and translate only the surrounding narration.
  • IDs always ASCII-fold (kebab-case without accents). Display fields (canonical_name) preserve accents when applicable.

Encoding: always UTF-8.

4. The 24 Markdown types

| Type | Path | Owner |
|---|---|---|
| document | `wiki/documents/<doc-id>.md` | archivist |
| page | `wiki/pages/<doc-id>/p<NNN>.md` | archivist + evidence-officer |
| person | `wiki/entities/people/<id>.md` | profiler |
| organization | `wiki/entities/organizations/<id>.md` | profiler |
| location | `wiki/entities/locations/<id>.md` | archivist |
| event | `wiki/entities/events/<id>.md` | timeline-analyst |
| uap_object | `wiki/entities/uap-objects/<id>.md` | evidence-officer |
| vehicle | `wiki/entities/vehicles/<id>.md` | archivist |
| operation | `wiki/entities/operations/<id>.md` | archivist |
| concept | `wiki/entities/concepts/<id>.md` | archivist |
| table | `wiki/tables/<table-id>.md` | archivist |
| image | `wiki/images/<image-id>.md` | evidence-officer |
| evidence | `case/evidence/<E-NNNN>.md` | evidence-officer |
| witness_analysis | `case/witnesses/<W-NNNN>.md` | witness-officer |
| timeline | `case/timelines/<scope>.md` | timeline-analyst |
| hypothesis | `case/hypotheses/<H-NNNN>.md` | hypothesis-lead |
| actor_profile | `case/profiles/<AP-NNNN>.md` | profiler |
| gap | `case/gaps/<G-NNNN>.md` | archivist + chief-detective |
| relation | `case/connect-the-dots/<R-NNNN>.md` | chief-detective |
| case_report | `case/case-report.md` | case-writer |
| residual_uncertainty | `case/residual-uncertainty.md` | chief-detective |
| index | `wiki/index.md` | archivist |
| log | `wiki/log.md` | archivist (append-only) |
| (this file) | `CLAUDE.md` | chief-detective |

Detailed frontmatter schemas are in CLAUDE-schema-full.md.

5. Universal required frontmatter

Every .md file in wiki/ and case/ has:

---
schema_version: "0.1.0"
type: <enum>                         # document | page | person | ... (24 types)
canonical_title: "..."               # OR canonical_name (entities)
wiki_version: "0.1.0"
last_ingest: "2026-05-13T14:22:11Z"  # OR last_revised
---
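A lint pass could check these universal keys roughly as follows. This is a minimal sketch using only the standard library and a naive top-level `key:` scan; a real linter would more likely parse the block with PyYAML. The function names are hypothetical.

```python
import re

# Each tuple lists interchangeable keys; at least one key per tuple must appear.
REQUIRED_ALTERNATIVES = [
    ("schema_version",),
    ("type",),
    ("canonical_title", "canonical_name"),
    ("wiki_version",),
    ("last_ingest", "last_revised"),
]

KEY_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*):")


def read_frontmatter_keys(markdown_text):
    """Collect top-level keys between the opening and closing '---' lines."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        match = KEY_RE.match(line)
        if match:
            keys.add(match.group(1))
    return set()  # no closing '---': treat as having no frontmatter


def missing_frontmatter(markdown_text):
    """Return the required key groups that are absent from the frontmatter."""
    keys = read_frontmatter_keys(markdown_text)
    return [" | ".join(group) for group in REQUIRED_ALTERNATIVES
            if not keys & set(group)]
```

An empty list means the file satisfies the universal requirement; type-specific fields would still need to be checked against CLAUDE-schema-full.md.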

6. Canonical naming (regex)

| Type | Regex | Example |
|---|---|---|
| doc_id | `^[a-z0-9][a-z0-9-]*$` | dow-uap-d54-mission-report-mediterranean-sea-na |
| page_id | `^[a-z0-9-]+/p\d{3}$` | dow-uap-d54-.../p007 |
| person_id | `^[a-z][a-z0-9-]*$` (ASCII-fold) | j-edgar-hoover |
| event_id | `^EV-\d{4}-(\d{2}\|XX)-(\d{2}\|XX)-[a-z0-9-]+$` | EV-2004-11-14-tic-tac-nimitz |
| uap_object_id | `^OBJ-[A-Z0-9-]+-\d{2}$` | OBJ-EV2004-NIMITZ-01 |
| evidence_id | `^E-\d{4}$` | E-0042 |
| witness_id | `^W-\d{4}$` | W-0007 |
| hypothesis_id | `^H-\d{4}$` | H-0003 |
| table_id | `^TBL-[A-Z0-9]+-\d{4}$` | TBL-DOWD54-0003 |
| image_id | `^IMG-[A-Z0-9]+-p\d{3}-\d{2}$` | IMG-DOWD54-p007-01 |
| gap_id | `^G-\d{4}$` | G-0012 |
| relation_id | `^R-\d{4}$` | R-0028 |
| actor_profile_id | `^AP-\d{4}$` | AP-0001 |
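The patterns in the naming table can be collected into a single lookup for validation. The sketch below copies the regexes verbatim; the `is_valid_id` helper and the use of `re.ASCII` (so that `\d` only matches 0-9) are assumptions for illustration.

```python
import re

ID_PATTERNS = {
    "doc_id": r"^[a-z0-9][a-z0-9-]*$",
    "page_id": r"^[a-z0-9-]+/p\d{3}$",
    "person_id": r"^[a-z][a-z0-9-]*$",
    "event_id": r"^EV-\d{4}-(\d{2}|XX)-(\d{2}|XX)-[a-z0-9-]+$",
    "uap_object_id": r"^OBJ-[A-Z0-9-]+-\d{2}$",
    "evidence_id": r"^E-\d{4}$",
    "witness_id": r"^W-\d{4}$",
    "hypothesis_id": r"^H-\d{4}$",
    "table_id": r"^TBL-[A-Z0-9]+-\d{4}$",
    "image_id": r"^IMG-[A-Z0-9]+-p\d{3}-\d{2}$",
    "gap_id": r"^G-\d{4}$",
    "relation_id": r"^R-\d{4}$",
    "actor_profile_id": r"^AP-\d{4}$",
}

# re.ASCII restricts \d to 0-9, which keeps IDs strictly ASCII.
COMPILED = {kind: re.compile(pattern, re.ASCII)
            for kind, pattern in ID_PATTERNS.items()}


def is_valid_id(kind, value):
    """Return True if `value` fully matches the canonical pattern for `kind`."""
    return COMPILED[kind].fullmatch(value) is not None
```

Using `fullmatch` rather than `match` also rejects a value with a trailing newline, which `$` alone would tolerate.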

Algorithm: filename → doc_id

1. Strip extension (.pdf, .jpg, .png)
2. NFD + remove combining marks (ASCII fold)
3. Lowercase
4. Replace whitespace, underscores, and any character outside [a-z0-9-] with "-"
5. Collapse repeated "-"
6. Trim leading/trailing "-"
7. If the result starts with a digit, prefix "doc-"
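The seven steps above can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes it receives a bare filename (no directory part) and that extension matching is case-insensitive.

```python
import re
import unicodedata


def filename_to_doc_id(filename):
    """Derive a doc_id from a raw filename, following steps 1-7 in order."""
    # 1. Strip a known extension (case-insensitive match is an assumption).
    stem = re.sub(r"\.(pdf|jpg|png)$", "", filename, flags=re.IGNORECASE)
    # 2. NFD-decompose, then drop combining marks (ASCII fold of accents).
    decomposed = unicodedata.normalize("NFD", stem)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 3. Lowercase.
    lowered = folded.lower()
    # 4. Anything outside [a-z0-9-] (whitespace and underscores included) becomes "-".
    dashed = re.sub(r"[^a-z0-9-]", "-", lowered)
    # 5. Collapse runs of "-".
    collapsed = re.sub(r"-{2,}", "-", dashed)
    # 6. Trim leading and trailing "-".
    trimmed = collapsed.strip("-")
    # 7. Prefix "doc-" when the result starts with a digit.
    if trimmed[:1].isdigit():
        trimmed = "doc-" + trimmed
    return trimmed
```

For instance, a hypothetical file named `Relatório de Missão_2004.PDF` would map to `relatorio-de-missao-2004`.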

7. Wiki-links

[[doc-id]]                       → wiki/documents/<doc-id>.md
[[doc-id/pNNN]]                  → wiki/pages/<doc-id>/p<NNN>.md
[[people/<id>]]                  → wiki/entities/people/<id>.md
[[org/<id>]]                     → wiki/entities/organizations/<id>.md
[[loc/<id>]]                     → wiki/entities/locations/<id>.md
[[event/<id>]]                   → wiki/entities/events/<id>.md
[[uap/<id>]]                     → wiki/entities/uap-objects/<id>.md
[[vehicle/<id>]]                 → wiki/entities/vehicles/<id>.md
[[op/<id>]]                      → wiki/entities/operations/<id>.md
[[concept/<id>]]                 → wiki/entities/concepts/<id>.md
[[table/<id>]] [[image/<id>]]    → wiki/tables|images/<id>.md
[[evidence/<id>]] [[witness/<id>]]
[[hypothesis/<id>]] [[profile/<id>]]
[[gap/<id>]] [[relation/<id>]]   → case/...
[[people/...|Grusch]]            → custom display text

Backlinks (mentioned_in[] on entities) are materialized by Lint, NOT written by hand.
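A resolver for this link syntax might look like the sketch below. The prefix-to-directory mapping comes from the list above; the `case/` directories for evidence, witness, hypothesis, profile, gap, and relation links are taken from the type table in section 4. The function names and the link regex are assumptions.

```python
import re

PREFIX_DIRS = {
    "people": "wiki/entities/people",
    "org": "wiki/entities/organizations",
    "loc": "wiki/entities/locations",
    "event": "wiki/entities/events",
    "uap": "wiki/entities/uap-objects",
    "vehicle": "wiki/entities/vehicles",
    "op": "wiki/entities/operations",
    "concept": "wiki/entities/concepts",
    "table": "wiki/tables",
    "image": "wiki/images",
    "evidence": "case/evidence",
    "witness": "case/witnesses",
    "hypothesis": "case/hypotheses",
    "profile": "case/profiles",
    "gap": "case/gaps",
    "relation": "case/connect-the-dots",
}

# [[target]] or [[target|display text]]
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def extract_links(text):
    """Return (target, display_text_or_None) for every wiki-link in `text`."""
    return [(m.group(1).strip(), m.group(2)) for m in LINK_RE.finditer(text)]


def resolve_target(target):
    """Map a link target to a repo-relative path, or None if unrecognized.

    Caveat of this sketch: a doc-id equal to a prefix name (e.g. "event")
    would be shadowed by that prefix.
    """
    head, sep, rest = target.partition("/")
    if sep and head in PREFIX_DIRS:
        return f"{PREFIX_DIRS[head]}/{rest}.md"
    if sep and re.fullmatch(r"p\d{3}", rest):
        return f"wiki/pages/{head}/{rest}.md"
    if not sep:
        return f"wiki/documents/{target}.md"
    return None
```

Lint could then flag any link whose resolved path is `None` or does not exist on disk.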

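8. Confidence calibration (Tetlock)

| Band | Range | Permitted language |
|---|---|---|
| high | ≥0.90 | "demonstrates", "establishes" (PT-BR: "demonstra", "estabelece") |
| medium | 0.60-0.89 | "strongly suggests", "indicates" (PT-BR: "sugere fortemente", "indica") |
| low | 0.30-0.59 | "possibly", "may" (PT-BR: "possivelmente", "pode") |
| speculation | <0.30 | "hypothesis", "speculation" (PT-BR: "hipótese", "especulação"), always labeled |

Every claim in an executive summary carries a confidence_band.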
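The band table translates into a small threshold function. The sketch below assumes the bands are half-open intervals (for example, medium covers 0.60 up to but not including 0.90), since the table does not say where a value such as 0.895 falls.

```python
def confidence_band(probability):
    """Map a probability in [0, 1] to its confidence_band enum value."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    if probability >= 0.90:
        return "high"
    if probability >= 0.60:  # assumption: medium is [0.60, 0.90)
        return "medium"
    if probability >= 0.30:  # assumption: low is [0.30, 0.60)
        return "low"
    return "speculation"
```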

9. Content classification (content_classification)

Enum array on document and page:

  • text-only · contains-photos · contains-sketches · contains-diagrams · contains-maps · contains-tables · contains-signatures · contains-stamps · redaction-heavy (>30% redacted) · mixed · blank

Doc-level = union of the page values.
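The doc-level rule is a plain set union over the page arrays. In the sketch below, returning the values in enum order and rejecting unknown values are assumptions made for deterministic output; the function name is hypothetical.

```python
CONTENT_CLASSES = [
    "text-only", "contains-photos", "contains-sketches", "contains-diagrams",
    "contains-maps", "contains-tables", "contains-signatures",
    "contains-stamps", "redaction-heavy", "mixed", "blank",
]


def document_classification(page_classifications):
    """Union the per-page content_classification arrays into the doc-level array."""
    seen = set()
    for page_values in page_classifications:
        seen.update(page_values)
    unknown = seen - set(CONTENT_CLASSES)
    if unknown:
        raise ValueError(f"unknown content_classification values: {sorted(unknown)}")
    # Emit in enum order so repeated runs produce identical frontmatter.
    return [value for value in CONTENT_CLASSES if value in seen]
```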

10. Provenance (Locard)

  • Every evidence item points to source_page + bbox (optional).
  • Every claim on an entity has mentioned_in[] with a page_ref.
  • chain_of_custody[] is required on evidence; custody_gaps[] are stated explicitly.
  • Grade A → ≥3 custody steps · Grade B → ≥2 · Grade C → ≥1
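The grade thresholds can be checked mechanically. This is a minimal sketch in which `chain_of_custody` is assumed to be a list with one element per custody step; the helper name is hypothetical.

```python
MIN_CUSTODY_STEPS = {"A": 3, "B": 2, "C": 1}


def custody_supports_grade(grade, chain_of_custody):
    """Return True if the chain has enough steps for the claimed evidence grade."""
    if grade not in MIN_CUSTODY_STEPS:
        raise ValueError(f"unknown evidence grade: {grade}")
    return len(chain_of_custody) >= MIN_CUSTODY_STEPS[grade]
```

Step count is only the floor stated above; the content of each step and any `custody_gaps[]` still need review.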

11. Canonical operations

  1. INGEST: PDF → PNG per page → Haiku vision → page.md + entity upsert
  2. LINT: reverse scan, materializes mentioned_in[], validates wiki-links, reports orphans
  3. QUERY: reads by wiki-link traversal; never via embeddings

Log every operation in wiki/log.md (append-only, fixed format).
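The reverse scan in LINT amounts to inverting the page → entities relation. The sketch below assumes each page exposes its entity ids under `entities_extracted` and represents the pages as a simple dict; the function name and data shape are illustrative.

```python
from collections import defaultdict


def materialize_mentioned_in(entities_by_page):
    """Invert {page_id: [entity_id, ...]} into {entity_id: [page_id, ...]}.

    The index is rebuilt from scratch on every run and uses sets internally,
    so re-running lint overwrites backlinks without ever duplicating them.
    """
    index = defaultdict(set)
    for page_id, entity_ids in entities_by_page.items():
        for entity_id in entity_ids:
            index[entity_id].add(page_id)
    return {entity_id: sorted(pages) for entity_id, pages in sorted(index.items())}
```

Entities that exist on disk but never appear as keys in the result are the orphans that LINT reports.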

12. Quality gates (enforced by chief-detective)

Global threshold of 0.85 across 6 rubrics in case-report.md:

  1. chain_of_custody_completeness
  2. confidence_calibration_match
  3. hypothesis_tournament_discipline (≥3 hypotheses)
  4. residual_uncertainty_presence
  5. audit_trail_per_claim
  6. red_team_pass

Additional blocking lint:

  • Wiki-links resolve 100%
  • entity.mentioned_in ↔ page.entities_extracted consistent
  • No duplicated canonical_name without a disambiguation_note
  • pages[] contiguous 1..page_count per document
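Two of these gates are easy to express in code. The sketch below assumes the 0.85 threshold applies to each rubric individually rather than to an average, and treats a missing score as a failure; both function names are hypothetical.

```python
RUBRICS = (
    "chain_of_custody_completeness",
    "confidence_calibration_match",
    "hypothesis_tournament_discipline",
    "residual_uncertainty_presence",
    "audit_trail_per_claim",
    "red_team_pass",
)
THRESHOLD = 0.85


def failing_rubrics(scores):
    """Return the rubrics scoring below the threshold (missing counts as 0.0)."""
    return [name for name in RUBRICS if scores.get(name, 0.0) < THRESHOLD]


def pages_are_contiguous(page_numbers, page_count):
    """Check that pages[] covers exactly 1..page_count with no gaps or repeats."""
    return sorted(page_numbers) == list(range(1, page_count + 1))
```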

13. External enrichment triggers

  • ≥3 mentions OR central claim → enrichment_status: deep (WebSearch + ≥2 external_sources)
  • 1-2 mentions → enrichment_status: shallow (1 query + internal knowledge)
  • 0 mentions (inferred) → enrichment_status: none
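The trigger rules reduce to a small decision function. In this sketch, `central_claim` is a hypothetical boolean flag standing in for however the pipeline marks an entity as central.

```python
def enrichment_status(mention_count, central_claim=False):
    """Pick the enrichment_status value from the mention count and centrality."""
    if mention_count < 0:
        raise ValueError("mention_count cannot be negative")
    if central_claim or mention_count >= 3:
        return "deep"
    if mention_count >= 1:
        return "shallow"
    return "none"
```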

14. Idempotency

Re-ingesting the same PDF (same sha256) updates last_ingest and preserves created_at. Re-lint overwrites mentioned_in[] but does not duplicate entries.
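The re-ingest rule can be illustrated with a registry keyed by content hash. This is a sketch under assumptions: the in-memory `records` dict and the `upsert_ingest` name are hypothetical, and the real pipeline presumably stores these fields in document frontmatter.

```python
import hashlib


def sha256_of(data):
    """Hex digest of the file bytes; identical PDFs always hash identically."""
    return hashlib.sha256(data).hexdigest()


def upsert_ingest(records, data, now):
    """Record an ingest keyed by sha256.

    First sighting sets created_at and last_ingest; a repeat ingest of the
    same bytes only bumps last_ingest and leaves created_at untouched.
    """
    digest = sha256_of(data)
    record = records.get(digest)
    if record is None:
        record = {"sha256": digest, "created_at": now, "last_ingest": now}
        records[digest] = record
    else:
        record["last_ingest"] = now
    return record
```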

15. Escalation

When an agent encounters:

  • A contradiction between grade A/B evidence → escalate to chief-detective
  • A surviving hypothesis with posterior >0.70 → multi-detective review
  • A critical gap → create [[gap/G-NNNN]] + link it in case-report

16. Model

Default for ingest, vision, dedup, lint, enrichment, and Markdown generation: claude-haiku-4-5.

case-writer (final Holmes-Watson narrative) and chief-detective (red team review) may optionally use Sonnet for final quality.

17. Execution stack

  • PDF → PNG: pdftoppm -r 200 (Poppler)
  • PDF → text: pdftotext -layout
  • Vision: Anthropic SDK Python + Haiku, with prompt caching and the pdf-2025-03-04 beta header if applicable
  • Linting: Python (PyYAML + regex)

Scripts live in /Users/guto/ufo/scripts/.
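The two Poppler steps could be driven from Python roughly as below. This is a sketch, not the project's actual script: the function names and output paths are hypothetical, and the `-png` flag is an assumption added so that pdftoppm emits PNG files (the contract itself only specifies `-r 200`).

```python
def build_pdftoppm_cmd(pdf_path, output_prefix, dpi=200):
    """Argument list to rasterize every page of `pdf_path` to PNG at `dpi`."""
    # -png is an assumption; without it pdftoppm writes PPM files.
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(output_prefix)]


def build_pdftotext_cmd(pdf_path, txt_path):
    """Argument list to extract layout-preserving text from `pdf_path`."""
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]
```

Either list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a shell string avoids quoting problems with filenames that contain spaces.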