fix: keep _index.json total_pages in sync after recovering pages
The reprocess pass added chunks for pages beyond the original total_pages but never updated the field, so doc-page navigation thought docs ended early (jumped to the next document mid-doc) and the page counter was wrong.

Now bump total_pages to the real max chunk page on each integration.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent fe19bb9c57
commit ebc6fa41e9
1 changed file with 4 additions and 0 deletions
@@ -321,6 +321,10 @@ def process_one_page(doc_id: str, page_num: int) -> tuple[bool, int]:
     except Exception as e:
         print(f" [ERR ] {doc_id} p{page_num:03d} — integrate: {e}", flush=True)
         return (False, 0)
+    # Keep total_pages in sync with the real max page (recovered pages extend it)
+    max_page = max((c.get("page", 0) for c in idx.get("chunks") or []), default=0)
+    if max_page > idx.get("total_pages", 0):
+        idx["total_pages"] = max_page
     idx_path.write_text(json.dumps(idx, indent=2, ensure_ascii=False), encoding="utf-8")
     print(f" [OK ] {doc_id} p{page_num:03d} — {n} chunks", flush=True)
     return (True, n)
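The sync step in this change can be tried in isolation. The following is a minimal sketch, not the project's actual code: the helper name `sync_total_pages`, the `demo` function, and the sample index contents are hypothetical, and it assumes an index is a dict with an optional `chunks` list whose entries carry an integer `page`.

```python
import json
import tempfile
from pathlib import Path


def sync_total_pages(idx: dict) -> bool:
    """Raise idx["total_pages"] to the highest chunk page.

    Hypothetical helper mirroring the four added lines in the diff.
    Returns True if the field was changed. It never lowers the value.
    """
    # `or []` covers both a missing "chunks" key and an explicit null.
    chunks = idx.get("chunks") or []
    # default=0 keeps max() from raising on an empty chunk list.
    max_page = max((c.get("page", 0) for c in chunks), default=0)
    if max_page > idx.get("total_pages", 0):
        idx["total_pages"] = max_page
        return True
    return False


def demo() -> dict:
    # Made-up index: total_pages says 2, but a recovered chunk sits on page 4.
    idx = {"total_pages": 2, "chunks": [{"page": 1}, {"page": 2}, {"page": 4}]}
    with tempfile.TemporaryDirectory() as tmp:
        idx_path = Path(tmp) / "_index.json"
        sync_total_pages(idx)
        # Same serialization call as in the diff.
        idx_path.write_text(
            json.dumps(idx, indent=2, ensure_ascii=False), encoding="utf-8"
        )
        return json.loads(idx_path.read_text(encoding="utf-8"))
```

Because the comparison is strictly greater-than, an index whose total_pages already exceeds the highest chunk page is left alone, so pages that produced no chunks are still counted. The sketch assumes every `page` value is an integer; a null page would make `max()` raise a `TypeError`.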