ML serialization security default
Only resume-from-checkpoint fails; fresh training still works.
Agent Quick Fix
Repair the loader against the current torch.load contract, and keep the change narrow and backed by the PyTorch documentation.
Product: PyTorch
Current-contract area: Distributed checkpoint wrapper calls torch.load without an explicit weights_only argument
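Likely root cause: PyTorch 2.6 changed the default of torch.load's weights_only argument from False to True. The wrapper passes no explicit value, so checkpoints that contain custom (non-tensor) objects are now rejected by the restricted unpickler. Fresh training never loads a checkpoint, which is why only the resume path fails.
Repair direction: Either allow-list the specific trusted classes (for example with torch.serialization.add_safe_globals) or migrate the checkpoint to a weights-only-compatible layout of tensors and plain containers. Do not switch to weights_only=False globally. After the fix, verify that the loaded tensor values match the ones that were saved.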
Symptom
Only resume-from-checkpoint fails; fresh training still works.
Why This Happens
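The checkpoint is opaque to local inspection and the loader call is unchanged, so nothing in the project explains the failure. What changed is the library: PyTorch 2.6 switched the default of torch.load's weights_only argument from False to True, and a call that passes no explicit value now uses a restricted unpickler that rejects custom objects, even trusted ones, that the earlier default deserialized.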
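The mechanism can be illustrated with the standard library alone. The sketch below is an analogy, not PyTorch's implementation: AllowListUnpickler and restricted_loads are hypothetical names, and argparse.Namespace again stands in for a trusted custom object. It shows how an unpickler that resolves only allow-listed globals accepts plain containers but refuses an unlisted class.

```python
import argparse
import io
import pickle


class AllowListUnpickler(pickle.Unpickler):
    """Resolve only globals that were explicitly allow-listed."""

    def __init__(self, file, allowed=()):
        super().__init__(file)
        self._allowed = {(cls.__module__, cls.__qualname__): cls for cls in allowed}

    def find_class(self, module, name):
        # Called for every class or function reference in the pickle stream.
        try:
            return self._allowed[(module, name)]
        except KeyError:
            raise pickle.UnpicklingError(
                f"global {module}.{name} is not allow-listed"
            ) from None


def restricted_loads(data, allowed=()):
    return AllowListUnpickler(io.BytesIO(data), allowed).load()


# Plain containers need no globals, so they load under any allow-list.
plain = restricted_loads(pickle.dumps({"step": 10}))

# A payload with a custom object fails until its class is allow-listed.
payload = pickle.dumps({"step": 10, "args": argparse.Namespace(lr=0.1)})
try:
    restricted_loads(payload)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

allowed_result = restricted_loads(payload, allowed=[argparse.Namespace])
```

The payload bytes are identical in both attempts; only the loader's policy differs, which mirrors an unchanged checkpoint that stops loading after a default change.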
Common Wrong Fixes
- Changing local code without checking the current external contract.
- Retrying the same install, build, or API call with no version/source change.
- Applying a broad unsafe bypass, such as passing weights_only=False on every torch.load call, when a narrow compatibility fix is available.
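One way to catch the broad bypass in review is a small static check. The sketch below is an assumption about how such a check could look, not an existing tool: find_unsafe_torch_loads is a hypothetical function that uses the standard ast module to flag calls written literally as torch.load(...) that either omit weights_only or set it to anything other than the literal True. It does not follow aliases such as "from torch import load".

```python
import ast


def find_unsafe_torch_loads(source: str) -> list[tuple[int, str]]:
    """Return (line, reason) for torch.load calls lacking weights_only=True."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if not is_torch_load:
            continue
        policy = next((kw for kw in node.keywords if kw.arg == "weights_only"), None)
        if policy is None:
            findings.append((node.lineno, "no explicit weights_only policy"))
        elif not (isinstance(policy.value, ast.Constant) and policy.value.value is True):
            findings.append((node.lineno, "weights_only is not the literal True"))
    return sorted(findings)


# Sample source to scan; it is parsed, never executed.
SAMPLE = '''
import torch
a = torch.load("ckpt.pt")
b = torch.load("ckpt.pt", weights_only=False)
c = torch.load("ckpt.pt", map_location="cpu", weights_only=True)
'''

findings = find_unsafe_torch_loads(SAMPLE)
for line, reason in findings:
    print(f"line {line}: {reason}")
```

A check like this only flags candidates. A call that legitimately needs weights_only=False for a fully trusted file would still need a human decision.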
Codex Search Keywords
These are the search terms observed in a neutral Codex validation run for this failure shape.
PyTorch torch.load weights_only default 2.6 serialization security documentation
site:pytorch.org torch.load weights_only True default 2.6
https://pytorch.org/docs/stable/generated/torch.load.html
https://docs.pytorch.org/docs/2.12/generated/torch.load.html