disclosure-bureau/scripts/synthesize/run_reading_parallel.sh
Luiz Gustavo 55cac8a395
W0+W1+W1.2: security hardening, observability, autocomplete, glitchtip, forgejo CI
W0 — security hardening (5 fixes verified live on disclosure.top)
- middleware: gate /api/admin/* same as /admin/* (F1)
- imgproxy: tighten LOCAL_FILESYSTEM_ROOT from / to /var/lib/storage (F2)
- studio: real basic-auth label (bcrypt hash, middleware reference) (F3)
- relations: ENABLE ROW LEVEL SECURITY + public SELECT policy (F4)
- migration 0003: fold is_searchable + hybrid_search update into canonical (TD#2)

W1 — observability + resilience + autocomplete
- studio: HOSTNAME=0.0.0.0 so Next.js binds on all interfaces (including loopback) for healthcheck
- compose: PG_POOL_MAX=20, CLAUDE_CODE_OAUTH_TOKEN gated by separate env
- claude-code.ts: subprocess timeout configurable (CLAUDE_CODE_TIMEOUT_MS)
- openrouter.ts: retry with exponential backoff + Retry-After + in-memory
  circuit breaker (promotes FALLBACK after CB_THRESHOLD failures)
- lib/logger.ts: pino logger (NDJSON prod / pretty dev) + withRequest helper
- middleware: mints correlation_id, stamps x-correlation-id response header,
  emits structured http_request log per /api/* call
- messages/route.ts: switch to structured logger
- 60_meili_index.py: push documents + chunks into Meilisearch
- /api/search/autocomplete: parallel meili search (docs + chunks), 5-8ms p50
- search-autocomplete.tsx: debounced dropdown wired into search-panel

W1.2 — Glitchtip + Forgejo self-hosted
- compose: glitchtip-redis + glitchtip-web + glitchtip-worker (v4.2)
- compose: forgejo + forgejo-runner (server v9, runner v6) with group_add=988
- @sentry/nextjs SDK wired (instrumentation.ts + sentry.{client,server}.config.ts)
- /api/admin/throw smoke endpoint (gated by W0-F1 middleware)
- Synthetic event ingestion verified at glitchtip.disclosure.top
- forgejo.disclosure.top up, repo discadmin/disclosure-bureau created,
  runner registered (labels: ubuntu-latest, docker)
- .forgejo/workflows/ci.yml: typecheck + lint + build + npm audit + python
  syntax + compose validation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:18:42 -03:00

69 lines
1.9 KiB
Bash
Executable file

#!/usr/bin/env bash
# Generate the clean LLM reading version for every document, in parallel.
#
# - One doc per `claude -p` (Sonnet) via 40_reading_version.py
# - Skips docs that already have reading.md (idempotent — safe to re-run)
# - mkdir-based per-doc lock prevents two workers racing the same doc
# - WORKERS parallel workers (default 2)
#
# Run:
#   ./run_reading_parallel.sh                # all docs, 2 workers
#   WORKERS=3 ./run_reading_parallel.sh      # 3 workers
#   ./run_reading_parallel.sh DOC1 DOC2      # specific docs only
set -uo pipefail
UFO="/Users/guto/ufo"
RAW="$UFO/raw"
GEN="$UFO/scripts/synthesize/40_reading_version.py"
WORKERS="${WORKERS:-2}"
if [ "$#" -gt 0 ]; then
    DOCS=("$@")
else
    DOCS=()
    for d in "$RAW"/*--subagent; do
        [ -f "$d/_index.json" ] || continue
        DOCS+=("$(basename "$d" | sed 's/--subagent$//')")
    done
fi
echo "=== reading-version generator ==="
echo " docs queued: ${#DOCS[@]}"
echo " workers: $WORKERS"
echo ""
process_one() {
    local doc_id="$1"
    local sub="$RAW/$doc_id--subagent"
    local out="$sub/reading.md"
    local log="$sub/_reading.log"
    local lock="$sub/.reading.lock"
    # Idempotence: a doc that already has reading.md is never regenerated.
    if [ -f "$out" ]; then
        echo "[SKIP] $doc_id (already has reading.md)"
        return 0
    fi
    # mkdir is atomic, so only one worker can create the lock directory.
    if ! mkdir "$lock" 2>/dev/null; then
        echo "[LOCK] $doc_id (another worker)"
        return 0
    fi
    # Release the lock even if this worker's shell exits early.
    trap "rmdir '$lock' 2>/dev/null || true" EXIT
    # Declare and assign separately so `local` does not mask the exit status.
    local t0
    t0=$(date +%s)
    echo "[BEGIN] $doc_id"
    if python3 "$GEN" "$doc_id" > "$log" 2>&1; then
        echo "[OK] $doc_id ($(($(date +%s) - t0))s)"
    else
        echo "[FAIL] $doc_id ($(($(date +%s) - t0))s) - see $log"
    fi
    rmdir "$lock" 2>/dev/null || true
    trap - EXIT
}
export -f process_one
export RAW GEN
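# -I {} already runs one command per input line, so -n 1 is dropped
# (GNU xargs warns that the two options are mutually exclusive).
# The guard avoids expanding an empty array, which older bash rejects under set -u.
if [ "${#DOCS[@]}" -gt 0 ]; then
    printf '%s\n' "${DOCS[@]}" | xargs -P "$WORKERS" -I {} bash -c 'process_one "$@"' _ {}
fi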
echo ""
echo "=== Done. reading.md count: ==="
ls "$RAW"/*--subagent/reading.md 2>/dev/null | wc -l
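
The mkdir-based per-doc lock inside process_one can be isolated into a small reusable helper. The block below is a minimal sketch rather than part of the script: the function name `with_lock` and the exit code 75 for "lock already held" are hypothetical choices made for illustration. It relies only on `mkdir` failing when the directory already exists, which is what makes the lock safe across parallel workers.

```shell
#!/usr/bin/env bash
# Sketch only: with_lock and exit code 75 are illustrative assumptions,
# not names taken from run_reading_parallel.sh.
set -uo pipefail

# Usage: with_lock LOCK_DIR COMMAND [ARGS...]
# Runs COMMAND while holding LOCK_DIR; returns 75 if the lock is already held,
# otherwise returns COMMAND's own exit status.
with_lock() {
    local lock="$1"
    shift
    # mkdir either creates the directory or fails, atomically, so exactly
    # one concurrent caller acquires the lock.
    if ! mkdir "$lock" 2>/dev/null; then
        return 75
    fi
    local status=0
    "$@" || status=$?
    # Release the lock whether or not the command succeeded.
    rmdir "$lock" 2>/dev/null || true
    return "$status"
}
```

A caller could wrap the generator as `with_lock "$sub/.reading.lock" python3 "$GEN" "$doc_id"` and treat status 75 as "another worker has this doc". Unlike the trap in process_one, this sketch does not release the lock if the shell is killed mid-command, so a stale lock directory would have to be removed by hand.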